{ "cells": [ { "cell_type": "markdown", "id": "1fb91614", "metadata": {}, "source": [ "# GGR274 Lab 5: Data Transformations, Grouped Data, and Data Visualization\n", "\n", "## Logistics\n", "\n", "Like last week, your lab grade will be based on attendance and submission of a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).\n", "\n", "Complete the tasks in this Jupyter notebook and submit your completed file to [MarkUs](https://markus-ds.teach.cs.toronto.edu).\n", "Here are the instructions for submitting to MarkUs (same as last week):\n", "\n", "1. Download this file (`Lab_5.ipynb`) from JupyterHub. (See [our JupyterHub Guide](../../../guides/jupyterhub_guide.ipynb) for detailed instructions.)\n", "2. Submit this file to MarkUs under the **lab5** assignment. (See [our MarkUs Guide](../../../guides/markus_guide.ipynb) for detailed instructions.)\n", "\n", "Note: there's no autograding set up for this week's lab, but your TA will be checking that your submitted lab file is complete as part of your \"lab attendance\" grade." ] }, { "cell_type": "markdown", "id": "5b4c7de0", "metadata": {}, "source": [ "## Lab 5 Introduction" ] }, { "cell_type": "markdown", "id": "e9e3aec0", "metadata": {}, "source": [ "In this lab, you will work with a data set called `time_use_prov`. This data set is derived from the Statistics Canada General Social Survey's (GSS) Time Use (TU) Survey Main File and from a data set containing aggregated provincial data. This week you will create box plots and bar graphs, and use the logical operators from the Week 4 material to build subsets of the data to visualize.\n", "\n", "As usual, these labs are meant to facilitate your understanding of the material from lectures in a low-stakes environment. Please feel free to refer to your lecture content, collaborate with your peers, and seek out help from your TAs." 
] }, { "cell_type": "markdown", "id": "1e6017c0", "metadata": {}, "source": [ "## Task 1\n", "\n", "Read the CSV file `'time_use_prov.csv'` into a pandas `DataFrame` named `prov_data`." ] }, { "cell_type": "code", "execution_count": 1, "id": "7f393e0c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | Unnamed: 0 | Participant ID | Urban/Rural | Age Group | Marital Status | sex | Kids under 14 | Feeling Rushed | Sleep duration | Work duration | Prov_ab | Employment Rate | Pct house over 30 | region | Income |\n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "
| 0 | 0 | 10000 | 1 | 5 | 5 | 1 | 0 | 1 | 510 | 0 | MB | 61.7 | 11.4 | Prairies | 68147.0 |\n", "
| 1 | 1 | 10009 | 1 | 6 | 3 | 1 | 0 | 6 | 540 | 0 | MB | 61.7 | 11.4 | Prairies | 68147.0 |\n", "
| 2 | 2 | 10016 | 2 | 7 | 1 | 1 | 0 | 6 | 660 | 0 | MB | 61.7 | 11.4 | Prairies | 68147.0 |\n", "
| 3 | 3 | 10023 | 1 | 6 | 1 | 2 | 0 | 3 | 330 | 0 | MB | 61.7 | 11.4 | Prairies | 68147.0 |\n", "
| 4 | 4 | 10047 | 2 | 7 | 1 | 1 | 0 | 3 | 510 | 0 | MB | 61.7 | 11.4 | Prairies | 68147.0 |\n", "