GGR274 Lab 3: Introduction to Programming, Part 2


Logistics

Like last week, your lab grade will be based on attendance and on submitting a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).

Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):

1. Download this file (Lab_3.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
2. Submit this file to MarkUs under the lab3 assignment. (See our MarkUs Guide for detailed instructions.)

Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.

Running example: Canadian Elections

Today we’ll work with a small dataset, sourced from Elections Canada, that contains the names and populations of Canada’s 338 electoral districts (also called ridings).

We’ve placed a copy of this dataset in the same folder as this lab notebook (you can see this in JupyterHub by going to File -> Open).

Reading file data: `open` and `readlines`

Now that we’ve seen the file, let’s learn how to read the file’s contents into Python.

Formally, we do this in two steps:

1. Open the file.
2. Read the file data into Python, line by line.

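# Step 1: open the file (encoding tells Python how to decode the file's text)
district_file = open("ED-Canada_2016.csv", encoding="utf-8")

district_file

# Step 2: read every line of the file into a list of strings
district_data = district_file.readlines()

district_data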
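The two steps above open the file but never close it. A common alternative is Python’s `with` statement, which closes the file automatically when its indented block ends. Because the real ED-Canada_2016.csv may not be available everywhere, this sketch first writes a small hypothetical file. The names `sample_text`, `sample_path` and `sample_file`, and the made-up districts, are illustrative assumptions and are not the real Elections Canada data.

```python
import os
import tempfile

# HYPOTHETICAL sample data in the same "name,population" shape as
# ED-Canada_2016.csv (made-up values, not real Elections Canada figures).
sample_text = "District A,100000\nDistrict B,85000\nDistrict C,120500\n"

# Write the sample to a temporary file so this example runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_districts.csv")
with open(sample_path, "w", encoding="utf-8") as out_file:
    out_file.write(sample_text)

# Steps 1 and 2 together: `with` opens the file, and Python closes it
# automatically as soon as the indented block ends.
with open(sample_path, encoding="utf-8") as sample_file:
    sample_data = sample_file.readlines()

print(sample_data)         # ['District A,100000\n', 'District B,85000\n', 'District C,120500\n']
print(sample_file.closed)  # True
```

With `with`, the file is also closed if an error occurs partway through reading.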

Data processing

Let’s look at just the first line from the file:

district_data[0]

There are two annoying things about this line:

1. It’s a single string, but it really stores two pieces of data (the district name and its population).
2. There’s a newline character \n at the end of the string, representing a line break.
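Both problems can be seen on a single line of text. The line below is hypothetical, but it has the same shape as a line from the file. `split(",")` handles the first problem, and `strip()` handles the second.

```python
# A HYPOTHETICAL line with the same shape as district_data[0].
line = "District A,100000\n"

# Problem 1: split on the comma to separate the two pieces of data.
entries = line.split(",")
print(entries)                   # ['District A', '100000\n']

# Problem 2: strip() removes leading/trailing whitespace, including '\n'.
population_text = entries[1].strip()
print(repr(population_text))     # '100000'

# Once the text is clean, int() turns it into a number we can do math with.
print(int(population_text) + 1)  # 100001
```

`int()` itself ignores surrounding whitespace, so `int("100000\n")` would also return `100000`. Calling `strip()` still makes the intent explicit, and it matters whenever the value is kept as text.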

Goal: take the list district_data and extract just the population counts, converting each one to an int. We’ll develop this one together!

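populations = []

for line in district_data:
    # Split "name,population\n" into ["name", "population\n"]
    entries = line.split(",")

    # Remove the trailing "\n" (and any other surrounding whitespace)
    population_entry = entries[1].strip()

    # Convert the text to a number so we can do arithmetic with it
    population_int = int(population_entry)

    populations.append(population_int)

populations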
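The loop above can be wrapped in a function so it can be reused, for example on a different file in the same format. This sketch uses a hypothetical helper name, `extract_populations`. It also adds one assumption of its own: blank lines are skipped. On a line containing only "\n", `line.split(",")` returns a single piece, so `entries[1]` would raise an IndexError.

```python
def extract_populations(lines):
    """Return the population (as an int) from each "name,population" line.

    HYPOTHETICAL helper: the same loop as above, wrapped in a function.
    Skipping blank lines is an added assumption, not part of the lab code.
    """
    result = []
    for line in lines:
        if line.strip() == "":
            continue  # a blank line has no population to convert
        entries = line.split(",")
        result.append(int(entries[1].strip()))
    return result


# HYPOTHETICAL lines, including a trailing blank line.
sample_lines = ["District A,100000\n", "District B,85000\n", "\n"]
print(extract_populations(sample_lines))  # [100000, 85000]
```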

Now we can compute!

num_populations = len(populations)
total_population = sum(populations)
max_population = max(populations)
min_population = min(populations)
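Each of these built-ins is easy to check by hand on a tiny list. The values below are hypothetical and were chosen only so that the results can be verified mentally.

```python
# HYPOTHETICAL populations (made-up values) to check the built-ins by hand.
sample_populations = [100000, 85000, 120500]

print(len(sample_populations))  # 3       (number of entries)
print(sum(sample_populations))  # 305500  (100000 + 85000 + 120500)
print(max(sample_populations))  # 120500
print(min(sample_populations))  # 85000
```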

Task 1: Calculations

Compute the average population. Then print the total, maximum, minimum, and average populations, including the relevant variable in each f-string. For example, here is an f-string for the number of entries: f"Number of population entries: {num_populations}."

avg_population = total_population / num_populations

print(f"Number of population entries: {num_populations}.")
print(f"Sum of populations: {total_population}.")
print(f"Maximum district population: {max_population}.")
print(f"Minimum district population: {min_population}.")
print(f"Average district population: {avg_population}.")
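The average is computed with `/`, which always produces a float, so printing it directly can show many decimal places. One option is a format spec after a colon inside the f-string braces. This sketch uses hypothetical values in place of the real totals.

```python
# HYPOTHETICAL values standing in for total_population and num_populations.
sample_total = 305500
sample_count = 3
sample_avg = sample_total / sample_count  # true division always gives a float

# Format specs go after a colon inside the braces:
#   ,    inserts thousands separators
#   .1f  rounds to one digit after the decimal point
print(f"Sum of populations: {sample_total:,}.")            # Sum of populations: 305,500.
print(f"Average district population: {sample_avg:,.1f}.")  # Average district population: 101,833.3.
```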

Task 2: Extracting district names

Next, create an empty list called district_names. Then use a loop and the str.split(",") method to extract just the name of each district, and add each name to district_names.

(This again follows what we did in lecture quite closely, so you might want to take a few minutes to review that.)

# Write your code here
district_names = []

for line in district_data:
    entries = line.split(",")
    name = entries[0]
    district_names.append(name)

district_names
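A list of names and a list of populations built from the same lines, in the same order, line up position by position. This means they can be used together as "parallel lists". As one illustration, this sketch uses `list.index` to find the name of the most populous district. It runs on hypothetical sample lines rather than the real file.

```python
# HYPOTHETICAL lines (made-up names and values) in the file's format.
sample_lines = ["District A,100000\n", "District B,85000\n", "District C,120500\n"]

sample_names = []
sample_pops = []
for line in sample_lines:
    entries = line.split(",")
    sample_names.append(entries[0])              # the name has no trailing '\n'
    sample_pops.append(int(entries[1].strip()))

# Both lists were filled in the same order, so position i in one list
# matches position i in the other.
i = sample_pops.index(max(sample_pops))
print(sample_names[i], sample_pops[i])  # District C 120500
```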

Comprehension Question: after you complete this task, try changing str.split(",") to str.split() and re-run the cell. What happens in this case, and why does this occur?

Task 3: Filtering names

Compute a list of the names of electoral districts that contain the string "Toronto".

To do so, you should use a for loop that iterates over district_names, and whose body contains an if statement.

# Write your code here
toronto_names = []

for name in district_names:
    if "Toronto" in name:
        toronto_names.append(name)

toronto_names
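The `in` check on strings looks for an exact substring, so capitalization matters. The names below are hypothetical and were made up only to show the difference. Whether this matters for the real data depends on how its names are capitalized.

```python
# HYPOTHETICAL names, used only to show how `in` matches substrings.
example_names = ["Toronto Example", "East toronto Example", "Elsewhere"]

case_sensitive = []
case_insensitive = []
for name in example_names:
    # `in` looks for an exact substring, so "toronto" does not match "Toronto".
    if "Toronto" in name:
        case_sensitive.append(name)
    # Lower-casing the name first makes the match ignore capitalization.
    if "toronto" in name.lower():
        case_insensitive.append(name)

print(case_sensitive)    # ['Toronto Example']
print(case_insensitive)  # ['Toronto Example', 'East toronto Example']
```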

Finally, compute a list of the names of electoral districts that are at least 30 characters long.

# Write your code here
long_names = []

for name in district_names:
    if len(name) >= 30:
        long_names.append(name)

long_names
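Two details are worth checking when filtering by length:

* `>=` includes names of exactly 30 characters.
* `len` counts characters, not bytes, which matters for names containing accented letters.

The strings below are hypothetical. The accented letter is written as an escape so the count is unambiguous.

```python
# HYPOTHETICAL strings built to sit on either side of the 30-character cutoff.
exactly_30 = "x" * 30
just_29 = "x" * 29
print(len(exactly_30) >= 30)  # True: >= includes the boundary itself
print(len(just_29) >= 30)     # False

# len() counts characters, so spaces, hyphens, and accented letters
# each count once, even though "\u00e9" takes two bytes in UTF-8.
city = "Montr\u00e9al"
print(len(city))                  # 8
print(len(city.encode("utf-8")))  # 9
```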