GGR274 Lab 3: Introduction to Programming, Part 2

Logistics

As in last week's lab, your lab grade will be based on attendance and on submitting a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).

Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):

  1. Download this file (Lab_3.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)

  2. Submit this file to MarkUs under the lab3 assignment. (See our MarkUs Guide for detailed instructions.)

Running example: Canadian Elections

Today we’ll work with a small dataset from Elections Canada that contains the names and populations of Canada’s 338 electoral districts (also called ridings).

We’ve placed a copy of this dataset, ED-Canada_2016.csv, in the same folder as this lab notebook (you can see it in JupyterHub by going to File -> Open).

Reading file data: open and readlines

Now that we’ve seen the file, let’s learn how to read the file’s contents into Python.

We do this in two steps:

  1. Open the file.

  2. Read the file data into Python, line by line.

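# Step 1: open the file (this gives us a file object, not yet the data)
district_file = open("ED-Canada_2016.csv", encoding="utf-8")

district_file

# Step 2: read the file's contents into a list of strings, one per line
district_data = district_file.readlines()

district_data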
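To see the shape of what readlines returns without relying on the real dataset, here is a minimal sketch that writes a tiny hypothetical two-line file to a temporary folder and reads it back with the same two steps. The file name sample_districts.csv and its riding names and numbers are made up for illustration. The sketch also closes the file once it is no longer needed, which is good practice.

```python
import os
import tempfile

# Made-up contents shaped like "name,population" lines (not real data).
sample_text = "Example Riding A,100000\nExample Riding B,95000\n"

# A temporary folder is deleted automatically when the with-block ends.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_districts.csv")  # hypothetical name

    # Create the file so there is something to read back.
    out_file = open(path, "w", encoding="utf-8")
    out_file.write(sample_text)
    out_file.close()

    # Step 1: open the file.
    sample_file = open(path, encoding="utf-8")
    # Step 2: read it line by line into a list of strings.
    sample_lines = sample_file.readlines()
    # Close the file now that we have its contents.
    sample_file.close()

print(sample_lines)
# ['Example Riding A,100000\n', 'Example Riding B,95000\n']
```

Each element of the list is one line of the file, still ending with its "\n" line break.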

Data processing

Let’s look at just the first line from the file:

district_data[0]

There are two annoying things about this line:

  1. It’s a single string, but it really stores two pieces of data, separated by a comma.

  2. There’s a newline character \n at the end of the string, which represents a line break.
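Both problems can be seen on a single line. The sketch below assumes each line has the form name,population and uses a made-up line rather than one from district_data:

```python
# A hypothetical line with made-up values, assumed to look like "name,population\n".
line = "Example Riding,12345\n"

# Problem 1: one string holds two values, so split it at the comma.
entries = line.split(",")
print(entries)  # ['Example Riding', '12345\n']

# Problem 2: the second entry still ends with "\n"; strip() removes
# whitespace, including newline characters, from both ends of a string.
population_entry = entries[1].strip()
print(population_entry)  # 12345

# The cleaned-up text can now be converted to a number.
population_int = int(population_entry)
print(population_int + 1)  # 12346
```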

Goal: take the list district_data, extract just the population counts, and convert each one to an int. We’ll develop this one together!

Tip: to ensure that you understand what this code does, write a comment that explains each line of code.

populations = []

for line in district_data:
    entries = line.split(",")
    
    population_entry = entries[1].strip()
    
    population_int = int(population_entry)
    
    populations.append(population_int)
    
populations

Now we can compute!

num_populations = len(populations)
total_population = sum(populations)
max_population = max(populations)
min_population = min(populations)
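Each of these built-in functions takes a list and returns a single value. A minimal sketch on a small made-up list (sample_populations is hypothetical, not from the dataset) makes the results easy to check by hand:

```python
# A small made-up list standing in for populations.
sample_populations = [120, 45, 300, 90]

print(len(sample_populations))  # 4   (how many values there are)
print(sum(sample_populations))  # 555 (all values added together)
print(max(sample_populations))  # 300 (the largest value)
print(min(sample_populations))  # 45  (the smallest value)
```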

Task 1a: Let’s see what this data looks like

Use f-strings to print the value of each of the variables above so that it is clear what each one means. For example, for the number of population entries: f"Number of population entries: {num_populations}."
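As a reminder of how f-strings behave, here is a short sketch using made-up variables (region, count, and total are hypothetical, not the lab's variables). It also shows the optional "," format spec, a standard Python feature that inserts thousands separators, which can make large population numbers easier to read:

```python
# Made-up values for illustration; these are not the lab's variables.
region = "Example Region"
count = 4
total = 1234567

# Text outside the braces is kept as-is; each {expression} is evaluated
# and its value is inserted into the string.
count_message = f"{region} has {count} districts."
print(count_message)  # Example Region has 4 districts.

# A format spec after a colon controls how a value is displayed;
# "," inserts thousands separators.
total_message = f"Total population: {total:,}."
print(total_message)  # Total population: 1,234,567.

# Any expression works inside the braces, including arithmetic.
double_message = f"Twice the count: {count * 2}."
print(double_message)  # Twice the count: 8.
```

An f-string on its own only builds a string; wrap it in print(...) to display it.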

# Write your code here

Task 1b: Calculate average

Compute the average population of an electoral district and store it in a variable called avg_population. Use the variables above to compute it.

# Write your code here

Task 2: Extracting district names

Next, create an empty list called district_names. Then, use a loop and the str.split(",") method to extract just the name of each district, and add each name to district_names.

(This again closely follows what we did in lecture, so you might want to take a few minutes to review that.)

# Write your code here

Comprehension Question: after you complete this task, try changing str.split(",") to str.split() and re-run the cell. What happens in this case, and why does this occur?

Task 3: Filtering names

Compute a list of the names of electoral districts that contain the string "Toronto" and store the list in a variable called toronto_districts.

To do so, use a for loop that iterates over district_names and whose body contains an if statement.
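The general filtering pattern (start with an empty list, loop, and append only the items that pass an if test) can be sketched on a made-up list of words; the names words and words_with_an are hypothetical:

```python
# Made-up words for illustration; the lab itself uses district_names.
words = ["banana", "cherry", "mango", "plum"]

# Start with an empty list and add only the items that pass the test.
words_with_an = []
for word in words:
    # `in` checks whether one string appears inside another.
    if "an" in word:
        words_with_an.append(word)

print(words_with_an)  # ['banana', 'mango']
```

Note that `in` is case-sensitive: "An" would not match "an".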

# Write your code here

Finally, compute a list of the names of electoral districts that are at least 30 characters long. Store this list in a variable called long_districts.

# Write your code here