{ "cells": [ { "cell_type": "markdown", "id": "bd13e811", "metadata": {}, "source": [ "# GG274 Homework 9: Bootstrap Confidence Intervals\n", "\n", "## Logistics\n", "\n", "**Due date**: The homework is due 23:59 on Monday, March 18.\n", "\n", "You will submit your work on [MarkUs](https://markus-ds.teach.cs.toronto.edu).\n", "To submit your work:\n", "\n", "1. Download this file (`Homework_9.ipynb`) from JupyterHub. (See [our JupyterHub Guide](../../../guides/jupyterhub_guide.ipynb) for detailed instructions.)\n", "2. Submit this file to MarkUs under the **hw9** assignment. (See [our MarkUs Guide](../../../guides/markus_guide.ipynb) for detailed instructions.)\n", "All homeworks will take place in a Jupyter notebook (like this one). When you are done, you will download this notebook and submit it to MarkUs.\n" ] }, { "cell_type": "markdown", "id": "8442f7a6", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this homework you will construct a bootstrap confidence interval around a sample mean of time spent driving, for those people in the survey who reported ***more*** than 0 minutes of driving. " ] }, { "cell_type": "code", "execution_count": 1, "id": "498ebc38", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt \n", "import numpy as np\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "id": "ba9654ca", "metadata": {}, "source": [ "``` \n", "durl313 Duration - Travel - Car - Driver\n", "\n", " VALUE LABEL\n", " 0 No time spent doing this activity\n", " 9996 Valid skip\n", " 9997 Don't know\n", " 9998 Refusal\n", " 9999 Not stated\n", "\n", " Data type: numeric\n", " Missing-data codes: 9996-9999\n", " Record/columns: 1/362-364\n", "```" ] }, { "cell_type": "markdown", "id": "dd16cb9a", "metadata": {}, "source": [ "## Step 1 - Read the time use survey data into a `pandas` `DataFrame`\n", "\n", "a) The data is stored in `gss_tu2016_main_file.csv` .\n", "\n", "Use the `pandas` function `read_csv` to read the data into a `pandas` `DataFrame` named `time_use_df`. \n" ] }, { "cell_type": "code", "execution_count": 2, "id": "0e70a55c", "metadata": {}, "outputs": [], "source": [ "# Write your code below\n", "time_use_df = pd.read_csv('gss_tu2016_main_file.csv')" ] }, { "cell_type": "markdown", "id": "a9b80a4d", "metadata": {}, "source": [ "b) Use `time_use_df` to create a another `DataFrame` called `drive_time_df` that has two columns: `'CASEID', 'durl313'` (in that order)." ] }, { "cell_type": "code", "execution_count": 3, "id": "260275b5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | CASEID | \n", "durl313 | \n", "
---|---|---|
0 | \n", "10000 | \n", "90 | \n", "
1 | \n", "10001 | \n", "0 | \n", "
2 | \n", "10002 | \n", "30 | \n", "
3 | \n", "10003 | \n", "80 | \n", "
4 | \n", "10004 | \n", "0 | \n", "