GGR274 Homework 4: Time Use Survey Data

Logistics

Due date: The homework is due 23:59 on Monday, February 05.

You will submit your work on MarkUs. To submit your work:

  1. Download this file (Homework_4.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)

  2. Submit this file to MarkUs under the hw4 assignment. (See our MarkUs Guide for detailed instructions.)

All homework is completed in a Jupyter notebook (like this one). When you are done, you will download this notebook and submit it to MarkUs.

Introduction

For this week’s homework, you will use the Statistics Canada GSS Time Use Dataset. This time, we’re going to dig into some of the well-being variables (feeling rushed) and respondent characteristic variables (how people commute to work).

Question

The question you’re answering in this homework:

Among Canadians who live in rural communities, which is less common: feeling rushed and taking transit to work, or feeling rushed and not taking transit to work?

Homework Instructions and Learning Objectives

The goal of this homework is to answer the question above by performing these steps:

  • Read the Time Use Dataset into a pandas DataFrame.

  • Select specific columns and rows of the DataFrame.

  • Compute the proportions of rural respondents that feel rushed and either use or don’t use public transit to commute to work.

  • Interpret the results of the analysis.

Task 1

a) Read the time use data set stored in the csv file gss_tu2016_main_file.csv into a pandas DataFrame and store the DataFrame in a variable named time_use_data_raw.

The file is located in the same folder as the notebook.
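As a minimal sketch of how pandas reads CSV data, the example below parses a small in-memory CSV. The text and column names are made up for illustration. With a real file you would pass the file name to `pd.read_csv` instead of a `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "id,score\n1,10\n2,20\n3,30\n"

# pd.read_csv accepts a file path or a file-like object.
example_df = pd.read_csv(io.StringIO(csv_text))

print(example_df.shape)  # (number of rows, number of columns)
print(example_df.head())  # first few rows
```

Checking `shape` and `head()` right after reading is a quick way to confirm that the data loaded as expected.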

import pandas as pd

# Write your code here

b) The columns we will need for the analysis to answer the question are:

  • CASEID: participant ID

  • luc_rst: Urban/Rural

  • gtu_110: How often does one feel rushed?

  • ctw_140c: Commute to work - Public transit

Create a new DataFrame using time_use_data_raw that only contains the four columns in the order listed above. The first column should be CASEID, the second column should be luc_rst, etc. This new DataFrame should be stored in a variable named time_use_data.
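For illustration, here is one way to select several columns in a chosen order, shown on a toy DataFrame. The DataFrame and its column names are hypothetical and unrelated to the survey data.

```python
import pandas as pd

# Toy data with made-up columns.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "height": [150, 160, 170],
    "weight": [50, 60, 70],
    "age": [20, 30, 40],
})

# Indexing with a list of names returns a new DataFrame whose
# columns appear in the order given in the list.
subset = toy[["id", "age", "height"]]

print(list(subset.columns))
```

The order of names in the list controls the column order of the result, regardless of the order in the original DataFrame.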

# Write your code here

c) Create a Python dictionary, stored in a variable called new_column_names, that maps each old column name to its new column name according to the following table:

| old name | new name |
|---|---|
| CASEID | case_ID |
| luc_rst | urban_rural |
| gtu_110 | feeling_rushed |
| ctw_140c | public_transit |

You’ll use this dictionary to rename the columns in part (d) below.

# Write your code here

d) Use the dictionary new_column_names created in the previous step to rename the columns of the DataFrame stored in time_use_data. Store this new DataFrame in a variable called clean_time_use_data.
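As a sketch of the general pattern, the example below renames the columns of a toy DataFrame using a dictionary. The DataFrame, the dictionary, and all names in it are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({"ID": [1, 2], "ht": [150, 160]})

# Keys are the existing column names, values are the replacements.
toy_names = {"ID": "person_id", "ht": "height_cm"}

# rename returns a new DataFrame; the original is left unchanged.
renamed = toy.rename(columns=toy_names)

print(list(toy.columns))
print(list(renamed.columns))
```

Because `rename` returns a new DataFrame by default, the result must be assigned to a variable to be kept.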

# Write your code here

Task 2

a) Use the code book for the Time Use Survey in the file gss_tu2016_codebook.txt to guide you in creating boolean variables using clean_time_use_data that correspond to the following conditions, and store the results in the variable names specified below.

| Condition | variable name |
|---|---|
| Commutes to work by taking public transit | transit_yes |
| Does not commute to work by taking public transit | transit_no |
| Respondent feels rushed | feeling_rushed |
| Lives in a rural area/small population centre | rural |

Tip: go to File -> Open menu action to find the gss_tu2016_codebook.txt file in the same folder as this notebook.
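The sketch below shows how comparisons on a coded column produce boolean Series. The columns and the numeric codes are invented for illustration; the actual codes for the survey variables must be looked up in the code book.

```python
import pandas as pd

# Hypothetical coding: has_pet uses 1 = yes, 2 = no, 9 = not stated;
# exercise_freq uses 1 (most often) to 5 (never).
toy = pd.DataFrame({
    "has_pet": [1, 2, 9, 1],
    "exercise_freq": [1, 3, 2, 5],
})

# A comparison on a column gives one True/False value per row.
pet_yes = toy["has_pet"] == 1
pet_no = toy["has_pet"] == 2

# isin matches any of several codes at once.
exercises_often = toy["exercise_freq"].isin([1, 2])

print(pet_yes.tolist())
print(pet_no.tolist())
print(exercises_often.tolist())
```

Note that in this toy coding the "no" condition is not simply the opposite of the "yes" condition: the row coded 9 is False in both.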

# Write your code here

b) In this part of the task you will investigate the data types of one of the variables you created in the previous part.

i) Store the data type of transit_yes in a variable called transit_col_type and print the value of transit_col_type.

ii) Store the data type of values in transit_yes in a variable called transit_data_type and print the value of transit_data_type.
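As a rough illustration of the distinction, Python's built-in `type` reports what kind of object a variable holds, while the `dtype` attribute of a pandas Series reports the type of the values stored inside it. The Series below is a toy example.

```python
import pandas as pd

# A toy boolean Series created by a comparison.
flags = pd.Series([1, 2, 1]) == 1

container_type = type(flags)  # the kind of object that flags is
value_type = flags.dtype      # the type of the values inside flags

print(container_type)
print(value_type)
```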

# Write your code here

c) Briefly explain the difference between the values of transit_col_type and transit_data_type.

Answer Task 2 c) here.

Task 3

In this section you will write a program in a series of steps to analyse the data.

Use the DataFrame clean_time_use_data and the variables that you created in Task 2 a).

The data analysis will be implemented by writing a Python program to compute two proportions that you will express as percentages (i.e., multiplying by 100).

\[{\text{Percent}_\text{Transit}} = \frac{\text{Total number of respondents in rural areas that take transit and feel rushed}}{\text{Total number of respondents in rural areas}}\times 100 \]
\[{\text{Percent}_\text{No Transit}} = \frac{\text{Total number of respondents in rural areas that do not take transit and feel rushed}}{\text{Total number of respondents in rural areas}}\times 100 \]

The program will be written in a series of steps.

a) Create a variable called total_rural that stores the total number of respondents that live in a rural area. Print the value of this variable. This is the value of: \(\text{Total number of respondents in rural areas}\) in the proportions above.
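One common way to count how many entries of a boolean Series are True is to sum it, since True counts as 1 and False as 0. This is a sketch on a toy Series, not the survey data.

```python
import pandas as pd

flags = pd.Series([True, False, True, True])

# Summing a boolean Series counts the True values.
count_true = flags.sum()

print(count_true)
```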

# Write your code here

b) Create a variable called rural_rush_transit that is True if a respondent has ever felt rushed AND commutes to work by public transit AND lives in a rural area.

Then, use this variable to select rows in clean_time_use_data and then compute the number of such rows, storing the result in a variable called rural_rush_transit_num. This is the value of: \(\text{Total number of respondents in rural areas that take transit and feel rushed}\).
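The pattern of combining conditions, selecting rows, and counting them can be sketched on toy data as follows. The DataFrame, column names, and conditions are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "city": ["A", "B", "A", "A"],
    "age": [25, 40, 35, 17],
    "owns_bike": [True, True, False, True],
})

in_city_a = toy["city"] == "A"
adult = toy["age"] >= 18
has_bike = toy["owns_bike"]

# & combines boolean Series row by row (logical AND).
all_three = in_city_a & adult & has_bike

# Indexing with a boolean Series keeps only the rows marked True.
matching_rows = toy[all_three]

# shape[0] is the number of rows in a DataFrame.
num_matching = matching_rows.shape[0]

print(num_matching)
```

With pandas Series the `&` operator is used rather than the Python keyword `and`, which does not work element by element.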

# Write your code here

c) Calculate the proportion of respondents in rural areas that feel rushed and use public transit to commute to work. Store the result in a variable called rural_rush_transit_prop.

# Write your code here

d) Print the value of rural_rush_transit_prop multiplied by 100 and rounded to two decimal places with the percent character (i.e., “%”) added to the end of the proportion. This is the value of: \({\text{Percent}_\text{Transit}}\).
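As a small sketch of turning a proportion into a rounded percentage string, the example below uses a made-up proportion. The variable names are hypothetical.

```python
# A made-up proportion for illustration.
example_prop = 3 / 7

# Multiply by 100, then round to two decimal places.
example_percent = round(example_prop * 100, 2)

# Convert the number to text so "%" can be appended.
print(str(example_percent) + "%")
```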

# Write your code here

e) Use the print function to print the following sentence:

The number of people that use transit and feel rushed is {XX}.

Fill in the value of {XX}.
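One way to place a computed value inside a sentence is an f-string, where the expression in braces is replaced by its value. The sentence and the number below are made up for illustration.

```python
example_count = 12

# The f prefix lets {example_count} be replaced by its value.
sentence = f"The number of apples in the basket is {example_count}."

print(sentence)
```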

# Write your code here

f) Create a variable called rural_rush_notransit that is True if a respondent has ever felt rushed AND does not commute to work by public transit AND lives in a rural area.

Then, use this variable to select rows in clean_time_use_data and then compute the number of such rows, storing the result in a variable called rural_rush_notransit_num.

This is the value of: \(\text{Total number of respondents in rural areas that do not take transit and feel rushed}\).

# Write your code here

g) Use rural_rush_notransit to select rows in clean_time_use_data and compute the proportion of rural respondents (i.e., rows) that feel rushed and do not take public transit. Store the proportion in a variable called rural_rush_monthly_prop.

# Write your code here

h) Print the value of rural_rush_monthly_prop multiplied by 100 and rounded to two decimal places with the percent character (i.e., “%”) added to the end of the proportion. This is the value of: \({\text{Percent}_\text{No Transit}}\).

# Write your code here

i) Use the print function to print the following sentence:

The number of people that do not use transit and feel rushed is {XX}.

Fill in the value of {XX}.

# Write your code here

Task 4

Answer the following questions.

a) Is the data analysis above sufficient to answer the original question? If yes then explain why it’s sufficient, otherwise explain what type of analysis would have provided appropriate information to help answer the question. Briefly explain your reasoning.

Answer Task 4 a) here.

b) Does the data analysis you performed above provide evidence that Canadians who live in rural areas and use public transit have poorer mental health than those who don’t use public transit to commute to work? If the analysis doesn’t support this claim, then describe an analysis that would give you evidence to evaluate this claim. Briefly explain your reasoning.

Answer Task 4 b) here.

Marking Rubric

| Section | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Computational questions (for each part) | auto test fails | auto test passes | NA | NA |
| Qualitative questions (for each part) | No answer | The question is answered but no explanation is given | The question is answered but the explanation is not supported or weakly supported | The question is answered and the explanation is supported |