GGR274 midterm test#
Read this section before starting the test
Test Instructions#
Complete all six questions below.
The answers to the questions will be submitted on MarkUs using a similar workflow to the lab and homework assignments, except that MarkUs will not give you feedback on passing or failing the autotests.
Answers that ask you to write Python code will be autograded; written answers will be graded manually by the teaching team.
NB: The student-facing autotesting on MarkUs is not as thorough as for a lab or homework. When you submit and then run the tests, it only checks that:
- you have all the right variable names, and
- the values the names refer to have the expected types.
After the tests are submitted, we will run another autotester to check that the values are correct.
Marking Rubric#
| Section | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Python computation steps | Autotest fails | Autotest passes | NA | NA |
| Describe what you did to the data (for each part) | No answer | A partial description is given that explains what the Python code did to the data | A full description that uses data science terminology is given that explains what the Python code did to the data and why this step is important | NA |
| Conclusion (for each part) | No answer | The question is answered but no explanation is given | The question is answered but the explanation is not supported or weakly supported by the data | The question is answered and the explanation is supported by the data |
The total number of marks for this test is 34.
Aids Allowed and Academic Integrity#
You are allowed to use any materials from the course or any other written sources (e.g., books, websites).
You are not allowed to directly receive or give help during the test period. In other words, all work must be your own, and you must not discuss or post any information about this test with anyone during the test period.
As a student, you alone are responsible for ensuring the integrity of your work and for understanding what constitutes an academic offense.
Time Allowed#
The test will be available at 09:00 AM on Tuesday, February 13, and must be submitted on MarkUs (see Submission Instructions) by 11:05 AM on Tuesday, February 13.
Late tests will receive a grade of zero unless you have an approved accommodation from your instructor.
How do I ask a question during the test?#
The teaching team will be available on the class Zoom link during the test in case you have any questions.
The class discussion forum on Piazza will be disabled during the test period.
Submission Instructions#
(Not available for the practice test)
1. Download this notebook using the menu item File -> Download As -> Notebook (.ipynb). Save it as GGR274_Midterm.ipynb.
2. Log in here: https://markus-ds.teach.cs.toronto.edu (Tip: Control/Command-click to open it in a new tab so you can still see these instructions.)
3. Choose your course.
4. Click the mt: Midterm assessment.
5. Click the Submissions tab. The new page is mt: Submissions.
6. Click the Upload File button on the bottom right.
7. Click the Choose Files button.
8. Select the GGR274_Midterm.ipynb file that you downloaded, then click Save.
Introduction#
In this midterm, you will use data from the Statistics Canada Time Use dataset to explore how time spent on childcare for children 14 years old or younger and feeling rushed differs for two age groups: 35-44 year olds and 55-64 year olds.
The code blocks and questions below will guide you through this analysis.
Question 1#
Complete the steps below. (6 marks)
Step 1a#
We will use your student number as data for this midterm. Complete the assignment statement below by typing your student number as an int. (1 mark)
# Answer: delete student number to create handout
stnum =
# check that stnum is type int
assert type(stnum) == int
Step 1b#
Run the Code cell below to obtain your random seed for Step 3.
NB: Be careful not to modify the Code cell below.
(1 mark)
#
# CAUTION: Don't modify this code cell
#
n = len(str(stnum))
seed = int(str(stnum)[n-3:n])
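To see what the cell above computes, here is a small sketch that wraps the same slicing logic in a function and applies it to a made-up student number. The function name `seed_from_student_number` and the value `1001234567` are hypothetical and used only for illustration; your own seed comes from the cell above.

```python
def seed_from_student_number(stnum: int) -> int:
    """Return the last three digits of stnum as an int (same logic as the cell above)."""
    digits = str(stnum)
    n = len(digits)
    return int(digits[n - 3:n])


# Hypothetical student number, for illustration only
example_stnum = 1001234567
example_seed = seed_from_student_number(example_stnum)
print(example_seed)  # 567
```

Note that leading zeros are dropped by the conversion to int, so a number ending in 045 would give the seed 45.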
Step 2#
Read CSV file gss_tu2016_main_file.csv into a DataFrame named time_use_alldata. (1 mark)
# put your answer in this cell
# Check that the data makes sense
time_use_alldata.head()
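As a reminder of how reading CSV data into a DataFrame generally looks, here is a minimal sketch that uses a tiny in-memory CSV instead of a file on disk. The text in `csv_text` and the name `toy_alldata` are made up for illustration and are not the test data.

```python
import io

import pandas as pd

# A tiny made-up CSV; the real file is much larger
csv_text = "agegr10,sex,dur27\n3,1,60\n5,2,0\n"

# pd.read_csv accepts a file path or a file-like object such as StringIO
toy_alldata = pd.read_csv(io.StringIO(csv_text))
print(toy_alldata.shape)  # (2, 3)
```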
Step 3#
Run the code cell below. This code uses pandas.DataFrame.sample to take a random sample of 75% of the rows from time_use_alldata, and names the resulting DataFrame time_use_data. (1 mark)
#
# CAUTION: Don't modify this code cell
#
time_use_data = time_use_alldata.sample(frac = 0.75, random_state = seed)
# Check that you have a dataframe with the correct shape
expected_shape = (13042, 350)
error_msg = 'Go back and run the previous cells.'
assert expected_shape == time_use_data.shape, error_msg
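The role of random_state can be seen in a small sketch on a toy DataFrame: sampling twice with the same seed returns the same rows. The DataFrame `toy` and the seed value 567 are assumptions made for illustration only.

```python
import pandas as pd

# Toy DataFrame with 8 rows (made up for illustration)
toy = pd.DataFrame({"value": range(8)})

# Sampling 75% of 8 rows gives 6 rows; the same seed gives the same rows
sample_a = toy.sample(frac=0.75, random_state=567)
sample_b = toy.sample(frac=0.75, random_state=567)

print(sample_a.shape)              # (6, 1)
print(sample_a.equals(sample_b))   # True
```

This is why the test fixes the seed from your student number: rerunning the cell reproduces the same sample.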
Step 4#
Briefly describe what you did to the data when you followed the steps for Question 1. Put your answer in a markdown cell below. (2 marks)
Step 4: Edit this Markdown cell to provide your answer.
Question 2#
Complete the steps below. (6 marks)
Step 1#
Create a list of these columns from time_use_data and name it important_columns:
- 'agegr10': Age group of respondent in groups of 10
- 'sex': Sex of respondent
- 'gtu_110': Feel rushed
- 'chh0014c': Children in household 0-14 years old
- 'dur27': Duration Care of household children (<15 years old)
(1 mark)
# put your answer in this cell
Step 2#
Use important_columns to create a new DataFrame from time_use_data and name this new DataFrame sub_time_use_data. (1 mark)
# put your answer in this cell
# Check your work
sub_time_use_data.head()
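Selecting columns with a list of names can be sketched on a toy DataFrame as follows. The names `toy`, `wanted`, and `subset`, and the columns 'a', 'b', 'c', are hypothetical and unrelated to the test data.

```python
import pandas as pd

# Toy DataFrame with three columns (made up for illustration)
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Indexing with a list of column names returns a new DataFrame
# containing only those columns, in the order listed
wanted = ["a", "c"]
subset = toy[wanted]

print(list(subset.columns))  # ['a', 'c']
```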
Step 3: Create new names for columns#
In this step you will create a dictionary of column names that will be used in the next step.
Create a dictionary named columnnames that can be used to rename the columns in sub_time_use_data to make them easier to interpret. To save you time typing or cutting and pasting, the dictionary has been partially completed in the cell below.
Here are the old name -> new name mappings and a description of the data in each column. For example, in the first mapping, the old name is 'agegr10' and the new name is 'age_group'.
'agegr10' -> 'age_group': Age group of respondent (groups of 10)
VALUE LABEL
1 15 to 24 years
2 25 to 34 years
3 35 to 44 years
4 45 to 54 years
5 55 to 64 years
6 65 to 74 years
7 75 years and over
96 Valid skip
97 Don't know
98 Refusal
99 Not stated
Data type: numeric
Missing-data codes: 96-99
'sex' -> 'sex': Sex of respondent
VALUE LABEL
1 Male
2 Female
6 Valid skip
7 Don't know
8 Refusal
9 Not stated
Data type: numeric
Missing-data codes: 6-9
'gtu_110' -> 'feel_rushed': General time use - Feel rushed
VALUE LABEL
1 Every day
2 A few times a week
3 About once a week
4 About once a month
5 Less than once a month
6 Never
96 Valid skip
97 Don't know
98 Refusal
99 Not stated
Data type: numeric
Missing-data codes: 96-99
'chh0014c' -> 'number_kids': Child(ren) in household - 0 to 14 years
VALUE LABEL
0 None
1 One
2 Two
3 Three or more
6 Valid skip
7 Don't know
8 Refusal
9 Not stated
Data type: numeric
Missing-data codes: 6-9
'dur27' -> 'childcare_minutes': Duration - Care of household child (<15) - Personal Care
VALUE LABEL
0 No time spent doing this activity
9996 Valid skip
9997 Don't know
9998 Refusal
9999 Not stated
Data type: numeric
Missing-data codes: 9996-9999
Below is a dictionary where we have provided all the keys. Edit the dictionary to add the appropriate values, and don’t change the order of the dictionary keys. (1 mark)
columnnames = {"agegr10" : ,
"sex": ,
"gtu_110": ,
"chh0014c": ,
"dur27": }
# Check your work
columnnames
Step 4#
Use columnnames from Step 3 to rename the columns of sub_time_use_data, and name the new DataFrame sub_time_use_rename. (1 mark)
# put your answer in this cell
# Check your work
sub_time_use_rename.head()
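Renaming columns with a dictionary of old name -> new name pairs can be sketched on a toy DataFrame. The column names 'q1' and 'q2' and the variable names below are made up for illustration.

```python
import pandas as pd

# Toy DataFrame with cryptic column names (made up for illustration)
toy = pd.DataFrame({"q1": [3, 5], "q2": [60, 0]})

# Keys are the old names, values are the new names
new_names = {"q1": "first_answer", "q2": "second_answer"}

# rename returns a new DataFrame; the original is left unchanged
renamed = toy.rename(columns=new_names)

print(list(renamed.columns))  # ['first_answer', 'second_answer']
print(list(toy.columns))      # ['q1', 'q2']
```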
Step 5#
Briefly describe what you did to the data when you followed the steps for Question 2. Put your answer in a markdown cell below. (2 marks)
Step 5: Edit this Markdown cell to provide your answer.
Question 3#
Complete the steps below. (3 marks)
Step 1#
Convert the column 'childcare_minutes' (minutes spent on childcare) in sub_time_use_rename to childcare hours by dividing the values in 'childcare_minutes' by 60. The transformed values should be assigned to a new column in sub_time_use_rename named 'childcare_hours'. (1 mark)
# put your answer in this cell
# Check your work
sub_time_use_rename.head()
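Creating a new column from an arithmetic transformation of an existing one can be sketched on a toy DataFrame. The column names 'minutes' and 'hours' are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Toy DataFrame of durations in minutes (made up for illustration)
toy = pd.DataFrame({"minutes": [0, 30, 90]})

# Dividing a column by a number applies the division to every row;
# assigning to a new column name adds that column to the DataFrame
toy["hours"] = toy["minutes"] / 60

print(toy["hours"].tolist())  # [0.0, 0.5, 1.5]
```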
Step 2#
Briefly describe what you did to the data when you followed the steps for Question 3. Why does it make sense to convert minutes to hours when measuring time spent on childcare? Put your answer in a markdown cell below. (2 marks)
Step 2: Edit this Markdown cell to provide your answer.
Question 4#
Complete the steps below. (4 marks)
Step 1#
Use sub_time_use_rename to:
- Create a Boolean Series for younger adults that is True if a respondent is 35-44 and does NOT have missing values (96, 97, 98, 99) in column 'feel_rushed'. Name this series younger_adults. (1 mark)
- Create a Boolean Series for older adults that is True if a respondent is 55-64 and does NOT have missing values (96, 97, 98, 99) in column 'feel_rushed'. Name this series older_adults. (1 mark)
# put your answer here
# Check that the value counts make sense.
print(younger_adults.value_counts())
print(older_adults.value_counts())
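Combining two conditions into one Boolean Series can be sketched on a toy DataFrame. This is one possible approach, using & for "and" and ~ with isin for "not one of these codes"; the column names 'group' and 'answer' and the variable names are made up for illustration.

```python
import pandas as pd

# Toy DataFrame (made up for illustration)
toy = pd.DataFrame({"group": [3, 3, 5, 5], "answer": [1, 97, 2, 6]})

# Codes that stand for missing data in this toy example
missing_codes = [96, 97, 98, 99]

# True only where group is 3 AND answer is not a missing-data code
in_group_3 = (toy["group"] == 3) & ~toy["answer"].isin(missing_codes)

print(in_group_3.tolist())  # [True, False, False, False]
```

The parentheses around the comparison matter, because & binds more tightly than == in Python.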
Step 2#
Briefly describe what you did to the data when you followed Step 1 for Question 4. Put your answer in a markdown cell below. (2 marks)
Step 2: Edit this Markdown cell to provide your answer.
Question 5#
Complete the steps below. (6 marks)
Step 1#
Use the pandas function describe to describe the distribution of:
- column 'childcare_hours' in sub_time_use_rename for younger_adults. Name the distribution younger_adults_summary_stats. (1 mark)
- column 'childcare_hours' in sub_time_use_rename for older_adults. Name the distribution older_adults_summary_stats. (1 mark)
# put your answer here
# Check your work
younger_adults_summary_stats
# Check your work
older_adults_summary_stats
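Summarizing one column for the rows picked out by a Boolean Series can be sketched on a toy DataFrame. This is one possible approach using loc; the names `toy`, `mask`, and `summary` and the columns 'group' and 'hours' are hypothetical.

```python
import pandas as pd

# Toy DataFrame (made up for illustration)
toy = pd.DataFrame({"group": [3, 3, 5, 5], "hours": [1.0, 3.0, 0.0, 0.5]})

# Boolean Series that is True for rows in group 3
mask = toy["group"] == 3

# Keep only the rows where mask is True, select one column, then summarize
summary = toy.loc[mask, "hours"].describe()

print(summary["count"])  # 2.0
print(summary["mean"])   # 2.0
```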
Step 2#
Use the pandas functions describe and groupby to describe the distributions of the 'childcare_hours' column in sub_time_use_rename for these two subsets of respondents:
(a) younger_adults by 'feel_rushed'. Name this distribution younger_adults_feelrush_dist. (1 mark)
# put your answer in this cell
# Check your work
younger_adults_feelrush_dist
(b) older_adults by 'feel_rushed'. Name this distribution older_adults__feelrush_dist. (1 mark)
# put your answer in this cell
# Check your work
older_adults__feelrush_dist
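Combining a Boolean filter with groupby and describe can be sketched on a toy DataFrame. This shows one possible ordering (filter first, then group); the names `toy`, `mask`, and `dist` and the columns 'group', 'answer', and 'hours' are made up for illustration.

```python
import pandas as pd

# Toy DataFrame (made up for illustration)
toy = pd.DataFrame({
    "group": [3, 3, 3, 5],
    "answer": [1, 1, 2, 1],
    "hours": [1.0, 3.0, 2.0, 0.0],
})

# Keep only rows in group 3
mask = toy["group"] == 3

# Group the filtered rows by 'answer' and summarize 'hours' within each group;
# the result has one row per value of 'answer'
dist = toy[mask].groupby("answer")["hours"].describe()

print(dist.loc[1, "count"])  # 2.0
print(dist.loc[1, "mean"])   # 2.0
```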
Step 3#
Briefly describe what you did to the data when you followed the Steps for Question 5. Put your answer in a markdown cell below. (2 marks)
Step 3: Edit this Markdown cell to provide your answer.
Question 6: Draw conclusions#
Answer the following questions (9 marks):
(a) Do younger adults or older adults spend more time caring for children at home? Briefly explain and provide evidence, based on the analysis you just completed. (3 marks)
Edit this cell and insert your answer for (a)
(b) A recent article in the Canadian media reported that “younger and older adults who spend less time caring for children feel rushed less often”. Based on your analysis above, do you agree or disagree? Use the results of your data analysis to support your answer. (3 marks)
Edit this cell and insert your answer for (b)
(c) Based on what you understand about when people typically have younger children living with them, does your answer to (b) make sense? Explain why in 3 to 4 sentences. (3 marks)
Edit this cell and insert your answer for (c)