GGR274 Lab 3: Introduction to Programming, Part 2
Logistics
Like last week, your lab grade will be based on attendance and submission of a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).
Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):
Download this file (Lab_3.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
Submit this file to MarkUs under the lab3 assignment. (See our MarkUs Guide for detailed instructions.)
Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.
Running example: Canadian Elections
Today we'll work with a small dataset from Elections Canada that contains the names and populations of Canada's 338 electoral districts (also called ridings).
We've placed a copy of this dataset, ED-Canada_2016.csv, in the same folder as this lab notebook (you can see it in JupyterHub by going to File -> Open).
Reading file data: open and readlines
Now that we’ve seen the file, let’s learn how to read the file’s contents into Python.
We do this in two steps:
Open the file.
Read the file data into Python, line by line.
# Step 1
district_file = open("ED-Canada_2016.csv", encoding="utf-8")
district_file
# Step 2
district_data = district_file.readlines()
district_data
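The cells above open the file but never close it. A common alternative is a with block, which closes the file automatically once the indented block ends. The sketch below is self-contained, so it first writes a few made-up rows (not real district data) to a temporary file; in the lab itself you would pass "ED-Canada_2016.csv" to open instead. The names sample_text, sample_path, sample_file, and sample_data are hypothetical and chosen so they don't overwrite the lab's variables.

```python
import os
import tempfile

# Hypothetical rows in the same "name,population" shape as
# ED-Canada_2016.csv; these are made up, NOT real district data.
sample_text = "District A,100000\nDistrict B,95000\n"

# Write the sample to a temporary file so this example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

# Steps 1 and 2 inside a with block: Python closes the file
# automatically as soon as the indented block ends.
with open(sample_path, encoding="utf-8") as sample_file:
    sample_data = sample_file.readlines()

print(sample_data)         # ['District A,100000\n', 'District B,95000\n']
print(sample_file.closed)  # True

os.remove(sample_path)  # delete the temporary file
```

Note that readlines keeps the \n at the end of each line, exactly as in the lab's district_data.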
Data processing
Let’s look at just the first line from the file:
district_data[0]
There are two annoying things about this line:
It's a single string, but it really stores two pieces of data (the district name and its population, separated by a comma).
There's a strange \n at the end of the string, representing a line break.
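Here is a sketch of how both problems can be handled for a single line. It uses a made-up line, sample_line, in the same "name,population" shape as district_data[0]; the name and number are hypothetical, not taken from the real file.

```python
# A made-up line in the same "name,population" shape as district_data[0].
sample_line = "District A,100000\n"

# Problem 1: split the single string into its two pieces at the comma.
sample_entries = sample_line.split(",")
print(sample_entries)           # ['District A', '100000\n']

# Problem 2: strip() removes surrounding whitespace, including the \n.
population_text = sample_entries[1].strip()
print(repr(population_text))    # '100000'

# Now the text can be converted to an int and used in arithmetic.
population = int(population_text)
print(population + 1)           # 100001
```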
Goal: take the list district_data and extract just the population counts, converting to int. We'll develop this one together!
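populations = []
for line in district_data:
    # Split "name,population\n" into its two pieces at the comma.
    entries = line.split(",")
    # Remove the trailing \n (and any other surrounding whitespace).
    population_entry = entries[1].strip()
    # Convert the text to an int so we can do arithmetic with it.
    population_int = int(population_entry)
    populations.append(population_int)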
populations
Now we can compute!
num_populations = len(populations)
total_population = sum(populations)
max_population = max(populations)
min_population = min(populations)
Task 1: Calculations
Compute the average population.
Print the total, maximum, and minimum populations, including the relevant variable in each f-string. For example, an f-string for the number of entries is f"Number of population entries: {num_populations}."
avg_population = total_population / num_populations
print(f"Number of population entries: {num_populations}.")
print(f"Sum of populations: {total_population}.")
print(f"Maximum district population: {max_population}.")
print(f"Minimum district population: {min_population}.")
print(f"Average district population: {avg_population}.")
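Because avg_population comes from a division, it is a float and can print with many decimal places. One option, sketched below, is an f-string format specification. The values sample_total and sample_count are made-up stand-ins for total_population and num_populations, so running this won't overwrite the lab's variables; in your own cell you would apply the same format specs to those variables directly.

```python
# Made-up values, standing in for total_population and num_populations.
sample_total = 1234567
sample_count = 3

sample_avg = sample_total / sample_count  # a float: 411522.333...

# ":," adds thousands separators; ":,.1f" also rounds to 1 decimal place.
print(f"Sum of populations: {sample_total:,}.")
print(f"Average district population: {sample_avg:,.1f}.")
# Sum of populations: 1,234,567.
# Average district population: 411,522.3.
```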
Task 2: Extracting district names
Next, create an empty list called district_names. Then, use a loop and the str.split(",") method to extract just the name of each district, and add each name to district_names.
(This again follows what we did in lecture quite closely, so you might want to take a few minutes to review that.)
# Write your code here
district_names = []
for line in district_data:
    entries = line.split(",")
    name = entries[0]
    district_names.append(name)
district_names
Comprehension Question: after you complete this task, try changing str.split(",") to str.split() and re-run the cell. What happens in this case, and why does this occur?
Task 3: Filtering names
Compute a list of the names of electoral districts that contain the string "Toronto".
To do so, you should use a for loop that iterates over district_names, and whose body contains an if statement.
# Write your code here
toronto_names = []
for name in district_names:
    if "Toronto" in name:
        toronto_names.append(name)
toronto_names
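The in check above is case-sensitive, so "Toronto" would not match a name written as "TORONTO". If you ever need to match regardless of capitalization, one approach is to lower-case the name before checking, as in this sketch. The list sample_names is hypothetical, for illustration only, and does not come from the real file.

```python
# Hypothetical names, for illustration only (not from the real file).
sample_names = ["Toronto Example North", "TORONTO EXAMPLE SOUTH", "Example West"]

# "in" is case-sensitive: "Toronto" does not match "TORONTO".
exact_matches = []
for name in sample_names:
    if "Toronto" in name:
        exact_matches.append(name)

# Lower-casing the name first makes the check ignore capitalization.
any_case_matches = []
for name in sample_names:
    if "toronto" in name.lower():
        any_case_matches.append(name)

print(exact_matches)     # ['Toronto Example North']
print(any_case_matches)  # ['Toronto Example North', 'TORONTO EXAMPLE SOUTH']
```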
Finally, compute a list of the names of electoral districts that are at least 30 characters long.
# Write your code here
long_names = []
for name in district_names:
    if len(name) >= 30:
        long_names.append(name)
long_names
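The two filters from this task can also be combined in a single if statement using and, which keeps a name only when both conditions hold. The sketch below uses a hypothetical list, sample_names, for illustration only; note that len() counts every character, spaces included.

```python
# Hypothetical names, for illustration only (not from the real file).
sample_names = [
    "Toronto Example Riding With A Long Name",  # 39 characters
    "Toronto Short",                            # 13 characters
    "Another Example Riding With Long Name",    # 37 characters
]

long_toronto_names = []
for name in sample_names:
    # "and" keeps a name only if BOTH conditions are true.
    # len() counts every character, including spaces.
    if "Toronto" in name and len(name) >= 30:
        long_toronto_names.append(name)

print(long_toronto_names)  # ['Toronto Example Riding With A Long Name']
```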