GGR274 Lab 7: Functions#
Logistics#
As in previous weeks, your lab grade will be based on attendance and submission of a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).
Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):
Download this file (Lab_7.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
Submit this file to MarkUs under the lab7 assignment. (See our MarkUs Guide for detailed instructions.)
Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.
Piloting MarkUs JupyterHub Extension (optional)#
Starting with this week’s lab and homework, we’re piloting a new way to submit files to MarkUs directly from JupyterHub (without needing to download them). This is optional, so you can still submit your work the usual way, but if you have some time, please try it out by following the instructions in the MarkUs Guide.
Lab Instructions and Learning Objectives#
Learn how to use Python dictionaries to store data.
Learn how to use Python to create simulated means.
Accumulating information in a dictionary#
Remember in lecture we used a for loop to add up a series of numbers? And then we used a for loop to accumulate a list of means? As it turns out, you can use the same technique to make a dictionary.
Here’s how you add a key/value pair to a dictionary (this is also called “inserting”):
d = {} # empty dictionary
d['key1'] = 'value1'
d
d['key2'] = 'value2'
d
d['key1'] = 'new_value'
d
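To see the two behaviours side by side, here is a minimal sketch. The dictionary name `grades` and its contents are made up for illustration and are not part of the lab. Assigning to a new key inserts a pair, while assigning to an existing key replaces its value.

```python
grades = {}                  # empty dictionary (hypothetical example data)
grades['lab6'] = 'complete'  # new key: inserts a key/value pair
grades['lab7'] = 'missing'   # another new key: the dictionary now has 2 pairs
grades['lab7'] = 'complete'  # existing key: replaces the old value, no new pair

print(grades)       # {'lab6': 'complete', 'lab7': 'complete'}
print(len(grades))  # 2
```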
You can accumulate a new dictionary using a for loop:
ta_to_course = {}
for name in ['Amber', 'Martin', 'Davia', 'KP', 'Ilan']:
    ta_to_course[name] = 'GGR274'
ta_to_course
for name in ['Matt', 'Fiona']:
    ta_to_course[name] = 'EEB125'
print(ta_to_course)
print(ta_to_course['Matt'])
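The value stored on each pass through the loop does not have to be a constant; it can be computed from the loop variable. A small sketch, using a hypothetical dictionary `name_to_length` that is not part of the lab:

```python
name_to_length = {}  # hypothetical dictionary, for illustration only
for name in ['Amber', 'Martin', 'KP']:
    # the value is computed from the loop variable on every iteration
    name_to_length[name] = len(name)

print(name_to_length)  # {'Amber': 5, 'Martin': 6, 'KP': 2}
```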
import pandas as pd
time_use_data_raw = pd.read_csv('gss_tu2016_main_file.csv')
time_use_dur = time_use_data_raw[["CASEID", "dur44"]]
time_use_dur.shape
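The double square brackets above select a list of columns and return a DataFrame. The sketch below shows the same selection on a small made-up DataFrame so that it runs without the CSV file. Only the column names `CASEID` and `dur44` come from the lab; the values, the extra column `other`, and the names `toy_raw` and `toy_dur` are invented.

```python
import pandas as pd

# Made-up stand-in for the raw time-use data
toy_raw = pd.DataFrame({
    'CASEID': [1, 2, 3],
    'dur44': [0, 30, 120],
    'other': ['a', 'b', 'c'],
})

# A list of column names inside the brackets keeps just those columns
toy_dur = toy_raw[['CASEID', 'dur44']]
print(toy_dur.shape)  # (3, 2): 3 rows, 2 columns
```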
Task 1#
Take 5 random samples from the column dur44 with replacement, and calculate the mean of each random sample. Set the parameter n in .sample to the first element of sample_size and store the means in a list called simulated_means10.
Steps to create simulated means:#
Create a list called sample_size with the values 10, 20.
Create an empty list called simulated_means10.
Create a for loop.
Within the for loop, take a random sample with replacement from the dur44 column of time_use_dur of size n = sample_size[0] with replace = True and calculate the mean of this sample. Store this value in rsample.
Append rsample to simulated_means10.
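The steps above can be sketched on a small made-up dataset. This is one possible shape for the loop, not the only way to write it. `toy_dur` and `toy_means10` are hypothetical names, and the `dur44` values are invented rather than taken from the real file.

```python
import pandas as pd

# Made-up stand-in for time_use_dur; the real values come from the CSV file
toy_dur = pd.DataFrame({'CASEID': range(1, 9),
                        'dur44': [0, 0, 15, 30, 45, 60, 90, 120]})

sample_size = [10, 20]
toy_means10 = []  # hypothetical name; will hold one mean per random sample

for _ in range(5):  # 5 random samples
    # replace=True lets the same row be drawn more than once, so a sample
    # of size 10 can be taken even from this 8-row toy dataset
    rsample = toy_dur['dur44'].sample(n=sample_size[0], replace=True).mean()
    toy_means10.append(rsample)

print(len(toy_means10))  # 5
```

Because the samples are random, the means themselves will differ from run to run.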
# add your code below
Task 2 - Store the simulated means in a dictionary#
Create an empty dictionary called sim_dict. Add the key sample_size[0] with the value simulated_means10.
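A dictionary key can be a number, and a value can be a whole list. A minimal sketch with made-up numbers, where `toy_sizes`, `toy_means`, and `toy_dict` are hypothetical names:

```python
toy_sizes = [10, 20]
toy_means = [31.5, 48.0, 40.5]  # made-up means, for illustration only

toy_dict = {}
toy_dict[toy_sizes[0]] = toy_means  # key is the integer 10, value is the list

print(toy_dict)      # {10: [31.5, 48.0, 40.5]}
print(toy_dict[10])  # [31.5, 48.0, 40.5]
```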
# add your code below
Task 3 - Repeat the steps to create simulated means for sample_size[1].#
Store the results in simulated_means20.
# add your code below
Task 4 - Store the simulated means in a dictionary#
Add sample_size[1] as a key with the value simulated_means20 to sim_dict.
# add your code below
Task 5 - Store the data in sim_dict in a pandas DataFrame called sim_df.#
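When every value in a dictionary is a list of the same length, passing the dictionary to `pd.DataFrame` turns each key into a column. A sketch with made-up numbers, where `toy_dict` and `toy_df` are hypothetical names:

```python
import pandas as pd

# Made-up dictionary: each key maps to a list of the same length
toy_dict = {10: [31.5, 48.0, 40.5], 20: [39.0, 42.5, 41.0]}

toy_df = pd.DataFrame(toy_dict)  # keys become column labels
print(toy_df.shape)              # (3, 2)
print(list(toy_df.columns))      # [10, 20]
```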
# add your code below