GG274 Homework 9: Bootstrap Confidence Intervals
Logistics
Due date: The homework is due at 23:59 on Monday, March 17.
You will submit your work on MarkUs. To submit your work:
1. Download this file (Homework_9.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
2. Submit this file to MarkUs under the hw9 assignment. (See our MarkUs Guide for detailed instructions.)

All homeworks will take place in a Jupyter notebook (like this one). When you are done, you will download this notebook and submit it to MarkUs.
Introduction
In this homework you will construct a bootstrap confidence interval around a sample mean of time spent driving, for those people in the survey who reported more than 0 minutes of driving.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
durl313 Duration - Travel - Car - Driver
VALUE LABEL
0 No time spent doing this activity
9996 Valid skip
9997 Don't know
9998 Refusal
9999 Not stated
Data type: numeric
Missing-data codes: 9996-9999
Record/columns: 1/362-364
Step 1 - Read the time use survey data into a pandas DataFrame
a) The data is stored in gss_tu2016_week9.csv.
Use the pandas function read_csv to read the data into a pandas DataFrame named time_use_df.
# Write your code below
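As a minimal sketch of how read_csv behaves, the example below reads a tiny in-memory CSV rather than the real survey file. The name toy_csv and its three rows are made up for illustration; for the homework you would pass the file name gss_tu2016_week9.csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the survey file: three made-up rows
toy_csv = io.StringIO("CASEID,durl313,other\n1,0,5\n2,45,7\n3,9996,2\n")

# read_csv accepts a file path or a file-like object and returns a DataFrame
toy_df = pd.read_csv(toy_csv)
print(toy_df.shape)  # (3, 3)
```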
b) Use time_use_df to create another DataFrame called drive_time_df that has two columns: 'CASEID', 'durl313' (in that order).
# Write your code below
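One common way to keep only some columns is to index a DataFrame with a list of column names. The sketch below uses a made-up toy_df (an assumption, not the survey data) to show that the resulting columns follow the order of the list.

```python
import pandas as pd

# Made-up data with one extra column that we do not want to keep
toy_df = pd.DataFrame(
    {"other": [5, 7, 2], "durl313": [0, 45, 9996], "CASEID": [1, 2, 3]}
)

# A list of names inside the brackets returns a DataFrame
# containing those columns in the order listed
toy_subset = toy_df[["CASEID", "durl313"]]
print(list(toy_subset.columns))  # ['CASEID', 'durl313']
```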
c) Rename the columns of drive_time_df using the following table. The DataFrame with the new column names should still be called drive_time_df (i.e., don’t change the name of the DataFrame).
| Original column name | New column name |
|---|---|
| CASEID | ID |
| durl313 | drv_time |
# Write your code below
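Renaming is usually done with the DataFrame method rename and a dictionary that maps old names to new names. The sketch below applies the mapping from the table to a made-up DataFrame called toy_subset (a hypothetical name) and assigns the result back to the same variable.

```python
import pandas as pd

# Made-up data using the original column names
toy_subset = pd.DataFrame({"CASEID": [1, 2, 3], "durl313": [0, 45, 9996]})

# rename returns a new DataFrame, so assign it back to keep the same variable name
toy_subset = toy_subset.rename(columns={"CASEID": "ID", "durl313": "drv_time"})
print(list(toy_subset.columns))  # ['ID', 'drv_time']
```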
Step 2 - Select only those participants who drove and create a new pandas DataFrame
a) In this step you will select only those survey participants who drove (i.e., their drv_time value is greater than 0 and not 9996, 9997, 9998, or 9999).
First, create a pandas Series called driver where a value is True if the person drove and False if they did not.
# Write your code below
b) Create a new DataFrame subset_drive_time_df by selecting the rows in drive_time_df where the person drove.
# Write your code below
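The two parts of this step can be sketched on made-up data as follows. The names toy_drive_df, toy_driver, and toy_drivers_df are hypothetical, and combining a comparison with isin is only one of several ways to express the condition.

```python
import pandas as pd

# Made-up driving times, including a zero and two missing-data codes
toy_drive_df = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5], "drv_time": [0, 45, 9996, 120, 9999]}
)

missing_codes = [9996, 9997, 9998, 9999]

# Boolean Series: True when the time is positive and not a missing-data code
toy_driver = (toy_drive_df["drv_time"] > 0) & ~toy_drive_df["drv_time"].isin(missing_codes)

# Indexing with the boolean Series keeps only the rows marked True
toy_drivers_df = toy_drive_df[toy_driver]
print(toy_drivers_df["drv_time"].tolist())  # [45, 120]
```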
Step 3 - Calculate the mean of how much time drivers spent driving
In this step you will compute the mean of how much time drivers spent driving and store it in a variable called drive_time_avg.
# Write your code below
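For reference, the mean of a single column can be computed with the Series method mean. The values below are made up, and toy_avg is a hypothetical name.

```python
import pandas as pd

# Made-up driving times for three drivers
toy_drivers_df = pd.DataFrame({"ID": [2, 4, 6], "drv_time": [45, 120, 30]})

# Select the column, then average it
toy_avg = toy_drivers_df["drv_time"].mean()
print(toy_avg)  # 65.0
```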
Step 4 - Create a function that generates a bootstrap sample from subset_drive_time_df
In the cell below, create a function called one_bs_mean that calculates a bootstrap sample mean, called dt_bsmean_sample, from subset_drive_time_df.
# Write your code below
# test your function
one_bs_mean()
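A bootstrap sample is drawn with replacement and has the same number of rows as the original data. The sketch below shows one way to do this with the DataFrame method sample on made-up data; toy_drivers_df and toy_one_bs_mean are hypothetical names, and returning the mean from the function is an assumption about how it will be used.

```python
import numpy as np
import pandas as pd

# Made-up driving times for four drivers
toy_drivers_df = pd.DataFrame({"ID": [1, 2, 3, 4], "drv_time": [45, 120, 30, 60]})

def toy_one_bs_mean():
    # frac=1 draws as many rows as the original; replace=True allows repeats
    bs_sample = toy_drivers_df.sample(frac=1, replace=True)
    # Mean of the resampled driving times
    return bs_sample["drv_time"].mean()

np.random.seed(0)  # arbitrary seed, only so the toy example is repeatable
toy_mean = toy_one_bs_mean()
print(toy_mean)
```

Because the rows are drawn with replacement, each call can return a different mean, but the value always lies between the smallest and largest observed times.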
Step 5 - Compute a distribution of bootstrap sample means
a) Create an empty list called bootstrap_means and then a loop that populates this list with 10,000 bootstrap sample means (generated from calling your one_bs_mean function).
# This code sets a random seed so the code below generates the same results
# Don't change this!
np.random.seed(274)
# Write your code below
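The pattern of starting from an empty list and appending inside a loop can be sketched as below. The example uses made-up data, hypothetical names, and only 1,000 repetitions so it runs quickly; the homework asks for 10,000.

```python
import numpy as np
import pandas as pd

# Made-up driving times for four drivers
toy_drivers_df = pd.DataFrame({"ID": [1, 2, 3, 4], "drv_time": [45, 120, 30, 60]})

def toy_one_bs_mean():
    # One bootstrap sample mean: resample all rows with replacement
    return toy_drivers_df.sample(frac=1, replace=True)["drv_time"].mean()

np.random.seed(0)  # arbitrary seed for the toy example

toy_bootstrap_means = []
for _ in range(1000):
    # Each pass adds one new bootstrap sample mean to the list
    toy_bootstrap_means.append(toy_one_bs_mean())

print(len(toy_bootstrap_means))  # 1000
```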
b) Plot bootstrap_means as a histogram using a color argument of cyan and edgecolor of red. Save the histogram to bootstrap_means_histogram.
# Write your code below
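As a rough illustration of the plotting call, the sketch below draws a histogram of made-up values with plt.hist and keeps the returned object in a variable. The data and the names toy_means and toy_histogram are assumptions; plt.hist returns a tuple of bin counts, bin edges, and the drawn bars.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for a list of bootstrap means
rng = np.random.default_rng(0)
toy_means = rng.normal(loc=65, scale=5, size=500)

# color fills the bars and edgecolor outlines them
toy_histogram = plt.hist(toy_means, color="cyan", edgecolor="red")

counts, bin_edges, bars = toy_histogram
print(len(bin_edges))  # 11 edges for the default 10 bins
```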
Step 6 - Report the 95% confidence interval of the sample mean of how much time drivers spend driving
a) Compute the 2.5th percentile from the distribution bootstrap_means using np.percentile(). Save the percentile to bootstrap_means_2p5_percentile.
# Write your code below
# test your code
bootstrap_means_2p5_percentile
b) Compute the 97.5th percentile from the distribution bootstrap_means using np.percentile(). Save the percentile to bootstrap_means_97p5_percentile.
# Write your code below
# test your code
bootstrap_means_97p5_percentile
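To see what np.percentile returns, the sketch below applies it to a simple made-up array where the answer is easy to check by hand. The names toy_values, toy_lower, and toy_upper are hypothetical.

```python
import numpy as np

# The integers 0, 1, ..., 100, so the q-th percentile is simply q
toy_values = np.arange(0, 101)

# The second argument is the percentile on a 0-100 scale
toy_lower = np.percentile(toy_values, 2.5)
toy_upper = np.percentile(toy_values, 97.5)
print(toy_lower, toy_upper)  # 2.5 97.5
```

The middle 95% of the values lies between these two percentiles, which is the idea behind the percentile bootstrap interval.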
c) Complete the following sentence reporting the 95% bootstrap confidence interval, rounded to two decimal places.
Answer: A 95% bootstrap confidence interval for the sample mean of driving time for drivers is __ to __ minutes.
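Rounding to two decimal places can be done with round or with an f-string format. The numbers below are made up purely to show the formatting and are not results from the survey data.

```python
# Made-up interval endpoints, not real results
toy_lower = 61.23456
toy_upper = 68.98765

# round returns a number; the :.2f format returns text with two decimals
print(round(toy_lower, 2))  # 61.23
toy_sentence = f"{toy_lower:.2f} to {toy_upper:.2f} minutes"
print(toy_sentence)  # 61.23 to 68.99 minutes
```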