GGR274 midterm test#

Read this section before starting the test

Test Instructions#

  • Complete all six questions below.

  • The answers to the questions will be submitted on MarkUs using a similar workflow to the lab and homework assignments, except that MarkUs will not give you feedback on passing or failing the autotests.

  • Answers where you are asked to write Python code will be autograded, and written answers will be graded manually by the teaching team.

  • NB: the student-facing autotesting on MarkUs is not as thorough as for a lab or homework. When you submit and then run the tests, it only checks that:

    • you have all the right variable names, and

    • the values the names refer to have the expected types.

  • After the tests are submitted, we will run another autotester to check that the values are correct.

Marking Rubric#

| Section | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Python computation steps | Autotest fails | Autotest passes | NA | NA |
| Describe what you did to the data (for each part) | No answer | A partial description is given that explains what the Python code did to the data | A full description that uses data science terminology is given that explains what the Python code did to the data and why this step is important | NA |
| Conclusion (for each part) | No answer | The question is answered but no explanation is given | The question is answered but the explanation is not supported or weakly supported by the data | The question is answered and the explanation is supported by the data |

The total number of marks for this test is 34.

Aids Allowed and Academic Integrity#

  • You are allowed to use any materials from the course or any other written sources (e.g., books, websites).

  • You are not allowed to directly receive or give help during the test period. In other words, all work must be your own, and you must not discuss or post any information about this test with anyone during the test period.

  • As a student, you alone are responsible for ensuring the integrity of your work and for understanding what constitutes an academic offense.

Time Allowed#

  • The test will be available at 09:00 AM on Tuesday, February 13, and must be submitted on MarkUs (see Submission Instructions) by 11:05 AM on Tuesday, February 13.

  • Late tests will receive a grade of zero unless you have an approved accommodation from your instructor.

How do I ask a question during the test?#

  • The teaching team will be available on the class Zoom link during the test in case you have any questions.

  • The class discussion forum on Piazza will be disabled during the test period.

Submission Instructions#

(Not available for the practice test)

  1. Download this notebook using menu item File —> Download As —> Notebook (.ipynb). Save it as GGR274_Midterm.ipynb.

  2. Log in here: https://markus-ds.teach.cs.toronto.edu (Tip: Control/Command-click to open it in a new tab so you can still see these instructions.)

  3. Choose your course.

  4. Click the mt: Midterm assessment.

  5. Click the Submissions tab. The new page is mt: Submissions.

  6. Click button Upload File on the bottom right.

  7. Click button Choose Files.

  8. Select the GGR274_Midterm.ipynb file that you downloaded, then click Save.

Introduction#

In this midterm, you will use data from the Statistics Canada Time Use dataset to explore how time spent on childcare for children 14 years old or younger, and how often respondents feel rushed, differ between two age groups: 35-44 year olds and 55-64 year olds.

The code blocks and questions below will guide you through this analysis.

Question 1#

Complete the steps below. (6 marks)

Step 1a#

We will use your student number as data for this midterm. Complete the assignment statement below by typing your student number as an int. (1 mark)

# Type your student number as an int after the equals sign
stnum = 
# check that stnum is type int

assert type(stnum) == int

Step 1b#

Run the Code cell below to obtain your random seed for Step 3.

NB: Be careful not to modify the Code cell below.

(1 mark)

#
# CAUTION: Don't modify this code cell
#

n = len(str(stnum))
seed = int(str(stnum)[n-3:n])
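
To see what the cell above computes, here is a minimal sketch using a made-up student number (`1234567890` is hypothetical and chosen only for illustration). The seed is the integer formed by the last three digits of the number.

```python
# Hypothetical student number, used only for illustration
example_stnum = 1234567890

digits = str(example_stnum)          # the number as text: '1234567890'
n = len(digits)                      # number of digits: 10
example_seed = int(digits[n-3:n])    # last three characters, converted to int

print(example_seed)  # 890
```

Because the slice is converted back to an int, leading zeros are dropped: a number ending in 007 would give a seed of 7.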

Step 2#

Read CSV file gss_tu2016_main_file.csv into a DataFrame named time_use_alldata. (1 mark)

# put your answer in this cell
# Check that the data makes sense
time_use_alldata.head()
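
As a reminder of how `pandas.read_csv` behaves, the sketch below reads a tiny made-up CSV from an in-memory string rather than from a file. The names `csv_text` and `toy_df` and the column names are assumptions for illustration only and are not part of the test data.

```python
import io
import pandas as pd

# A tiny made-up CSV standing in for a real file on disk
csv_text = "id,minutes\n1,30\n2,45\n3,0\n"

# pd.read_csv accepts a file path or a file-like object
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.shape)   # (number of rows, number of columns)
print(toy_df.head())  # first rows, to check the data look sensible
```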

Step 3#

Run the code cell below. This code uses pandas.DataFrame.sample to take a random sample of 75% of the rows from time_use_alldata, and names the resulting DataFrame time_use_data. (1 mark)

#
# CAUTION: Don't modify this code cell
#

time_use_data = time_use_alldata.sample(frac = 0.75, random_state = seed)
# Check that you have a dataframe with the correct shape

expected_shape = (13042, 350)

error_msg = 'Go back and run the previous cells.'

assert expected_shape == time_use_data.shape, error_msg
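
The effect of `frac` and `random_state` can be seen on a small made-up DataFrame. This is only a sketch: `toy_df` and the seed value 123 are illustrative assumptions, not values from the test.

```python
import pandas as pd

# Eight made-up rows
toy_df = pd.DataFrame({"value": range(8)})

# frac=0.75 keeps 75% of the rows (6 of 8 here);
# random_state fixes which rows are drawn
sample_a = toy_df.sample(frac=0.75, random_state=123)
sample_b = toy_df.sample(frac=0.75, random_state=123)

print(sample_a.shape)             # (6, 1)
print(sample_a.equals(sample_b))  # True: same seed, same rows
```

Using a fixed seed is what makes a random sample reproducible: rerunning the cell gives the same rows each time.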

Step 4#

Briefly describe what you did to the data when you followed the steps for Question 1. Put your answer in a markdown cell below. (2 marks)

Step 4: Edit this Markdown cell to provide your answer.

Question 2#

Complete the steps below. (6 marks)

Step 1#

Create a list of these columns from time_use_data and name it important_columns:

  • 'agegr10': Age group of respondent in groups of 10

  • 'sex': Sex of respondent

  • 'gtu_110': Feel rushed

  • 'chh0014c': Children in household 0-14 years old

  • 'dur27': Duration Care of household children (<15 years old)

(1 mark)

# put your answer in this cell

Step 2#

Use important_columns to create a new DataFrame from time_use_data and name this new DataFrame sub_time_use_data. (1 mark)

# put your answer in this cell
# Check your work
sub_time_use_data.head()
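
Selecting columns with a list of names can be sketched on a small made-up DataFrame. The names `toy_df`, `keep`, and `toy_subset` are hypothetical and used only for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

keep = ["a", "c"]          # list of column names to keep
toy_subset = toy_df[keep]  # DataFrame with only those columns, in list order

print(toy_subset.columns.tolist())  # ['a', 'c']
```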

Step 3: Create new names for columns#

In this step you will create a dictionary of column names that will be used in the next step.

Create a dictionary named columnnames that can be used to rename the columns in sub_time_use_data to make them easier to interpret. To save you time typing or cutting and pasting, the dictionary has been partially completed in the cell below.

Here are the old name -> new name mappings and a description of the data in each column. For example, in the first mapping, the old name is 'agegr10' and the new name is 'age_group'.

'agegr10' -> 'age_group': Age group of respondent (groups of 10)

           VALUE  LABEL
               1  15 to 24 years
               2  25 to 34 years
               3  35 to 44 years
               4  45 to 54 years
               5  55 to 64 years
               6  65 to 74 years
               7  75 years and over
              96  Valid skip
              97  Don't know
              98  Refusal
              99  Not stated

           Data type: numeric
           Missing-data codes: 96-99
           
'sex' -> 'sex': Sex of respondent

           VALUE  LABEL
               1  Male
               2  Female
               6  Valid skip
               7  Don't know
               8  Refusal
               9  Not stated

           Data type: numeric
           Missing-data codes: 6-9

'gtu_110' -> 'feel_rushed': General time use - Feel rushed

           VALUE  LABEL
               1  Every day
               2  A few times a week
               3  About once a week
               4  About once a month
               5  Less than once a month
               6  Never
              96  Valid skip
              97  Don't know
              98  Refusal
              99  Not stated

           Data type: numeric
           Missing-data codes: 96-99

'chh0014c' -> 'number_kids': Child(ren) in household - 0 to 14 years

           VALUE  LABEL
               0  None
               1  One
               2  Two
               3  Three or more
               6  Valid skip
               7  Don't know
               8  Refusal
               9  Not stated

           Data type: numeric
           Missing-data codes: 6-9

'dur27' -> 'childcare_minutes': Duration - Care of household child (<15) - Personal Care

           VALUE  LABEL
               0  No time spent doing this activity
            9996  Valid skip
            9997  Don't know
            9998  Refusal
            9999  Not stated

           Data type: numeric
           Missing-data codes: 9996-9999

Below is a dictionary where we have provided all the keys. Edit the dictionary to add the appropriate values, and don’t change the order of the dictionary keys. (1 mark)

columnnames = {"agegr10" : , 
               "sex": ,
               "gtu_110": ,
               "chh0014c": ,
               "dur27": }
# Check your work
columnnames

Step 4#

Use columnnames from Step 3 to rename the columns of sub_time_use_data, and name the new DataFrame sub_time_use_rename. (1 mark)

# put your answer in this cell
# Check your work
sub_time_use_rename.head()
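
The way a dictionary drives `DataFrame.rename` can be sketched on made-up data. The column names and the dictionary `toy_names` below are hypothetical and are not the mappings asked for in this question.

```python
import pandas as pd

toy_df = pd.DataFrame({"q1": [1, 2], "q2": [3, 4]})

# Keys are the old column names, values are the new ones
toy_names = {"q1": "age_code", "q2": "minutes"}

# rename returns a new DataFrame; toy_df itself is left unchanged
toy_renamed = toy_df.rename(columns=toy_names)

print(toy_renamed.columns.tolist())  # ['age_code', 'minutes']
print(toy_df.columns.tolist())       # ['q1', 'q2']
```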

Step 5#

Briefly describe what you did to the data when you followed the steps for Question 2. Put your answer in a markdown cell below. (2 marks)

Step 5: Edit this Markdown cell to provide your answer.

Question 3#

Complete the steps below. (3 marks)

Step 1#

Convert the column 'childcare_minutes' (minutes spent on childcare) in sub_time_use_rename to childcare hours by dividing the values in 'childcare_minutes' by 60. The transformed values should be assigned to a new column in sub_time_use_rename named 'childcare_hours'. (1 mark)

# put your answer in this cell
# Check your work
sub_time_use_rename.head()
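
Arithmetic on a column applies to every row at once. The sketch below shows this on a made-up DataFrame; `toy_df` and its column names are assumptions for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({"minutes": [0, 30, 90]})

# Dividing a column by a number divides each value;
# assigning to a new column name adds that column to the DataFrame
toy_df["hours"] = toy_df["minutes"] / 60

print(toy_df["hours"].tolist())  # [0.0, 0.5, 1.5]
```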

Step 2#

Briefly describe what you did to the data when you followed the steps for Question 3. Why does it make sense to convert minutes to hours when measuring time spent on childcare? Put your answer in a markdown cell below. (2 marks)

Step 2: Edit this Markdown cell to provide your answer.

Question 4#

Complete the steps below. (4 marks)

Step 1#

Use sub_time_use_rename to:

  • Create a Boolean Series for younger adults that is True if a respondent is 35-44 and does NOT have missing values (96, 97, 98, 99) in column 'feel_rushed'. Name this series younger_adults. (1 mark)

  • Create a Boolean Series for older adults that is True if a respondent is 55-64 and does NOT have missing values (96, 97, 98, 99) in column 'feel_rushed'. Name this series older_adults. (1 mark)

# put your answer here
# Check that the value counts make sense.
print(younger_adults.value_counts())
print(older_adults.value_counts())
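
Combining two conditions into one Boolean Series can be sketched on made-up data. The column names, codes, and variable names below are hypothetical; the pattern of `&` together with `~` and `isin` is one possible approach, not the only one.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "group": [3, 3, 5, 3],
    "answer": [1, 97, 2, 6],
})
missing_codes = [96, 97, 98, 99]

in_group = toy_df["group"] == 3                        # True where group is 3
valid_answer = ~toy_df["answer"].isin(missing_codes)   # True where answer is not a missing code

# & is True only where both conditions are True
mask = in_group & valid_answer

print(mask.tolist())  # [True, False, False, True]
```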

Step 2#

Briefly describe what you did to the data when you followed Step 1 for Question 4. Put your answer in a markdown cell below. (2 marks)

Step 2: Edit this Markdown cell to provide your answer.

Question 5#

Complete the steps below. (6 marks)

Step 1#

Use the pandas function describe to describe the distribution of:

  • column 'childcare_hours' in sub_time_use_rename for younger_adults. Name the distribution younger_adults_summary_stats. (1 mark)

  • column 'childcare_hours' in sub_time_use_rename for older_adults. Name the distribution older_adults_summary_stats. (1 mark)

# put your answer here
# Check your work
younger_adults_summary_stats
# Check your work
older_adults_summary_stats
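
The sketch below shows `describe` applied to one column after filtering rows with a Boolean Series, using made-up data. The names `toy_df`, `keep`, and `toy_stats` are assumptions for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "hours": [0.0, 1.0, 2.0, 3.0],
    "keep": [True, True, False, True],
})

# Keep only the rows where the Boolean Series is True,
# then summarize the 'hours' column
toy_stats = toy_df.loc[toy_df["keep"], "hours"].describe()

print(toy_stats)           # count, mean, std, min, quartiles, max
print(toy_stats["count"])  # 3.0
```

The result is a Series, so individual statistics can be looked up by label, for example `toy_stats["mean"]`.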

Step 2#

Use the pandas functions describe and groupby to describe the distributions of the 'childcare_hours' column in sub_time_use_rename for these two subsets of respondents:

(a) younger_adults by 'feel_rushed'. Name this distribution younger_adults_feelrush_dist. (1 mark)

# put your answer in this cell
# Check your work
younger_adults_feelrush_dist

(b) older_adults by 'feel_rushed'. Name this distribution older_adults__feelrush_dist. (1 mark)

# put your answer in this cell
# Check your work

older_adults__feelrush_dist
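
Chaining `groupby` and `describe` gives one row of summary statistics per group. The sketch below uses made-up data; `toy_df`, `freq`, and `toy_dist` are hypothetical names.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "freq": [1, 1, 2, 2, 2],
    "hours": [0.0, 2.0, 1.0, 1.0, 4.0],
})

# Group rows by 'freq', then describe 'hours' within each group
toy_dist = toy_df.groupby("freq")["hours"].describe()

print(toy_dist)                  # one row per value of 'freq'
print(toy_dist.loc[2, "mean"])   # 2.0, the mean of 1.0, 1.0 and 4.0
```

The result is a DataFrame indexed by the grouping values, which makes it easy to compare a statistic such as the mean across groups.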

Step 3#

Briefly describe what you did to the data when you followed the Steps for Question 5. Put your answer in a markdown cell below. (2 marks)

Step 3: Edit this Markdown cell to provide your answer.

Question 6: Draw conclusions#

Answer the following questions (9 marks):

(a) Do younger adults or older adults spend more time caring for children at home? Briefly explain and provide evidence, based on the analysis you just completed. (3 marks)

Edit this cell and insert your answer for (a)

(b) A recent article in the Canadian media reported that "younger and older adults who spend less time caring for children feel rushed less often". Based on your analysis above, do you agree or disagree? Use the results of your data analysis to support your answer. (3 marks)

Edit this cell and insert your answer for (b)

(c) Based on what you understand about when people typically have younger children living with them, does your answer to (b) make sense? Explain why in 3 to 4 sentences. (3 marks)

Edit this cell and insert your answer for (c)