{ "cells": [ { "cell_type": "markdown", "id": "204a9e40", "metadata": {}, "source": [ "# GGR274 Lab 4: Introduction to Data Wrangling, Part 1\n", "\n", "## Logistics\n", "\n", "Like last week, our lab grade will be based on attendance and submission of a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).\n", "\n", "Complete the tasks in this Jupyter notebook and submit your completed file to [MarkUs](https://markus-ds.teach.cs.toronto.edu).\n", "Here are the instructions for submitting to MarkUs (same as last week):\n", "\n", "1. Download this file (`Lab_4.ipynb`) from JupyterHub. (See [our JupyterHub Guide](../../../guides/jupyterhub_guide.ipynb) for detailed instructions.)\n", "2. Submit this file to MarkUs under the **lab4** assignment. (See [our MarkUs Guide](../../../guides/markus_guide.ipynb) for detailed instructions.)\n", "\n", "Note: there's no autograding set up for this week's lab, but your TA will be checking that your submitted lab file is complete as part of your \"lab attendance\" grade." 
] }, { "cell_type": "markdown", "id": "535ac824", "metadata": {}, "source": [ "## Task 1: Read the CSV file into a `DataFrame`\n", "\n", "The file `ArrestsStripSearches.csv` holds the Toronto Police Race and Identity Based Data - Arrests and Strip Searches dataset. Read it into a pandas `DataFrame` called `police_df`.\n", "\n", "_The file is located in the same folder as the notebook._" ] }, { "cell_type": "code", "execution_count": 1, "id": "da33b4b3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/0j/ybsv4ncn5w50v40vdh5jjlww0000gn/T/ipykernel_79053/1181357276.py:1: DeprecationWarning: \n", "Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),\n", "(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)\n", "but was not found to be installed on your system.\n", "If this would cause problems for you,\n", "please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466\n", " \n", " import pandas as pd\n" ] }, { "data": { "text/html": [ "

|  | _id | Arrest_Year | Arrest_Month | EventID | ArrestID | PersonID | Perceived_Race | Sex | Age_group__at_arrest_ | Youth_at_arrest__under_18_years | ... | Actions_at_arrest___Resisted__d | Actions_at_arrest___Mental_inst | Actions_at_arrest___Assaulted_o | Actions_at_arrest___Cooperative | SearchReason_CauseInjury | SearchReason_AssistEscape | SearchReason_PossessWeapons | SearchReason_PossessEvidence | ItemsFound | ObjectId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020 | July-Sept | 1005907 | 6017884.0 | 326622 | White | M | Aged 35 to 44 years | Not a youth | ... | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1 |
| 1 | 2 | 2020 | July-Sept | 1014562 | 6056669.0 | 326622 | White | M | Aged 35 to 44 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 |
| 2 | 3 | 2020 | Oct-Dec | 1029922 | 6057065.0 | 326622 | Unknown or Legacy | M | Aged 35 to 44 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 3 | 4 | 2021 | Jan-Mar | 1052190 | 6029059.0 | 327535 | Black | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 |
| 4 | 5 | 2021 | Jan-Mar | 1015512 | 6040372.0 | 327535 | South Asian | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 |

5 rows × 26 columns
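
A minimal sketch of Task 1, assuming pandas is installed. In the lab itself, `pd.read_csv("ArrestsStripSearches.csv")` would read the file from the notebook's folder. So that the example runs on its own, it instead reads a small in-memory stand-in (the hypothetical `csv_text`) holding a few columns and rows copied from the preview above.

```python
import io

import pandas as pd

# Hypothetical stand-in for ArrestsStripSearches.csv: five of the 26 columns,
# with values copied from the preview table. In the lab, pass the file name
# to pd.read_csv instead of this in-memory text.
csv_text = """_id,Arrest_Year,Arrest_Month,Perceived_Race,Sex
1,2020,July-Sept,White,M
2,2020,July-Sept,White,M
3,2020,Oct-Dec,Unknown or Legacy,M
4,2021,Jan-Mar,Black,M
5,2021,Jan-Mar,South Asian,M
6,2021,Apr-June,South Asian,M
"""

# Read the CSV text into a DataFrame; read_csv accepts a path or a file-like object.
police_df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default.
preview = police_df.head()
print(preview)
```

Because `read_csv` assigns a default integer index starting at 0, the index column in the preview is one less than `_id` in every row.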

|  | _id | Arrest_Year | Arrest_Month | EventID | ArrestID | PersonID | Perceived_Race | Sex | Age_group__at_arrest_ | Youth_at_arrest__under_18_years | ... | Actions_at_arrest___Resisted__d | Actions_at_arrest___Mental_inst | Actions_at_arrest___Assaulted_o | Actions_at_arrest___Cooperative | SearchReason_CauseInjury | SearchReason_AssistEscape | SearchReason_PossessWeapons | SearchReason_PossessEvidence | ItemsFound | ObjectId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 4 | 2021 | Jan-Mar | 1052190 | 6029059.0 | 327535 | Black | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 |
| 4 | 5 | 2021 | Jan-Mar | 1015512 | 6040372.0 | 327535 | South Asian | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 |
| 5 | 6 | 2021 | Apr-June | 1019145 | 6060688.0 | 327535 | South Asian | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 6 |
| 6 | 7 | 2021 | Jan-Mar | 1035445 | 6053833.0 | 330778 | Black | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 |
| 7 | 8 | 2021 | Jan-Mar | 1050464 | 6063477.0 | 330778 | Black | M | Aged 25 to 34 years | Not a youth | ... | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 8 |

5 rows × 26 columns

|  | _id | Sex | Perceived_Race | SearchReason_PossessWeapons |
|---|---|---|---|---|
| 3 | 4 | M | Black | NaN |
| 4 | 5 | M | South Asian | NaN |
| 5 | 6 | M | South Asian | NaN |
| 6 | 7 | M | Black | NaN |
| 7 | 8 | M | Black | NaN |
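
The last two outputs are consistent with taking the rows at positions 3 through 7 and then keeping four columns. The cells that produced them are not shown here, so the following is only one way such a result might be obtained, not necessarily the lab's intended solution. It assumes pandas is installed and uses a hypothetical in-memory stand-in (`csv_text`) with values copied from the outputs above, in place of `ArrestsStripSearches.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lab's CSV file, restricted to the four columns
# shown in the last output. The empty final field is read as NaN.
csv_text = """_id,Sex,Perceived_Race,SearchReason_PossessWeapons
1,M,White,
2,M,White,
3,M,Unknown or Legacy,
4,M,Black,
5,M,South Asian,
6,M,South Asian,
7,M,Black,
8,M,Black,
"""
police_df = pd.read_csv(io.StringIO(csv_text))

# Select rows by position: iloc[3:8] keeps positions 3, 4, 5, 6, 7
# (the end of the slice is excluded).
rows_3_to_7 = police_df.iloc[3:8]

# Select columns by name: a list of labels inside [] returns a DataFrame
# with the columns in the order listed.
columns_of_interest = ["_id", "Sex", "Perceived_Race", "SearchReason_PossessWeapons"]
subset = rows_3_to_7[columns_of_interest]
print(subset)
```

Position-based slicing with `iloc` keeps the original index labels, which is why the output rows are labelled 3 to 7 rather than renumbered from 0.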