GGR274 Lab 4: Introduction to Data Wrangling, Part 1

Logistics

Like last week, your lab grade will be based on attendance and submission of a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).

Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):

  1. Download this file (Lab_4.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)

  2. Submit this file to MarkUs under the lab4 assignment. (See our MarkUs Guide for detailed instructions.)

Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.

Task 1: Read the CSV file into a DataFrame

Read the CSV file ArrestsStripSearches.csv into a pandas DataFrame called police_df. The file contains the Toronto Police Race and Identity Based Data on arrests and strip searches.

The file is located in the same folder as the notebook.
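As a rough illustration of how pd.read_csv behaves, the sketch below reads a tiny made-up CSV from an in-memory string instead of a file on disk. The names toy_csv and toy_df and the columns year and city are hypothetical and are not part of the lab data. For a file in the same folder as the notebook, the file name alone is passed as the first argument.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; the values are made up for illustration.
toy_csv = io.StringIO(
    "year,city\n"
    "2020,Toronto\n"
    "2021,Ottawa\n"
)

# pd.read_csv accepts a file path (such as "some_file.csv") or a file-like object.
toy_df = pd.read_csv(toy_csv)

# The first row of the CSV becomes the column names.
print(toy_df.shape)
print(toy_df.head())
```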

import pandas as pd

# Write your code here

Task 2: Create a DataFrame of arrests in 2021

a. Subset the tabular data by rows

  • Create a boolean Series named Arrests_2021, computed from police_df, that is True for each row where Arrest_Year is 2021 and False otherwise.

  • Use Arrests_2021 to select rows of police_df that correspond to arrests in 2021. Save this new DataFrame in a variable called police_2021_df. Examine the head() of this DataFrame.
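The two steps above follow a common pandas pattern: compare a column to a value to get a boolean Series, then use that Series to filter rows. Here is a minimal sketch on made-up data, assuming the year column holds integers. The names toy_df, is_2021 and toy_2021_df are hypothetical.

```python
import pandas as pd

# Made-up data for illustration only.
toy_df = pd.DataFrame(
    {
        "year": [2020, 2021, 2021],
        "city": ["Toronto", "Ottawa", "Toronto"],
    }
)

# Comparing a column to a value gives one True/False entry per row.
is_2021 = toy_df["year"] == 2021

# Indexing with the boolean Series keeps only the rows marked True.
toy_2021_df = toy_df[is_2021]

print(toy_2021_df.head())
```

If the year column were stored as text rather than as integers, the comparison would need to use the string "2021" instead.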

# Write your code here

b. Inspect columns of the tabular data

  • Create a variable called Arrest_column_names that stores a list of the column names of police_2021_df and print the list.
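One way to get column names as a plain Python list is to convert the DataFrame's columns attribute, as in this sketch on made-up data. The names toy_df and toy_column_names are hypothetical.

```python
import pandas as pd

# Made-up data for illustration only.
toy_df = pd.DataFrame({"year": [2020, 2021], "city": ["Toronto", "Ottawa"]})

# .columns is a pandas Index; list() turns it into an ordinary Python list.
toy_column_names = list(toy_df.columns)

print(toy_column_names)
```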

# Write your code here

Task 3: Select columns from police_2021_df and examine the distributions of each column

  • Create a new DataFrame from police_2021_df that contains only the following columns: _id, Sex, Perceived_Race, and SearchReason_PossessWeapons. Assign it to the variable police_2021_raceweapons. Examine the head() of this DataFrame.

  • Use .value_counts() to compute the distributions of Sex, Perceived_Race, and SearchReason_PossessWeapons. You do not need to save these values in variables, though you may do so if you want.
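Selecting several columns and counting values can be sketched as follows on made-up data. The names toy_df, toy_subset and colour_counts, and the columns id, colour, size and weight, are hypothetical. Note the double square brackets: the inner pair is a list of column names.

```python
import pandas as pd

# Made-up data for illustration only.
toy_df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "colour": ["red", "blue", "red", "green"],
        "size": ["S", "M", "M", "M"],
        "weight": [1.2, 3.4, 2.2, 0.9],
    }
)

# Passing a list of column names returns a DataFrame with only those columns.
toy_subset = toy_df[["id", "colour", "size"]]
print(toy_subset.head())

# value_counts() counts how often each distinct value appears in a column.
colour_counts = toy_subset["colour"].value_counts()
print(colour_counts)
print(toy_subset["size"].value_counts())
```

Passing normalize=True to value_counts() returns proportions instead of raw counts, which can make columns easier to compare.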

# Write your code here