GGR274 Lab 7: Functions
Logistics
As in previous weeks, your lab grade will be based on attendance and on submitting a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).
Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):
Download this file (Lab_7.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
Submit this file to MarkUs under the lab7 assignment. (See our MarkUs Guide for detailed instructions.)
MarkUs JupyterHub Extension (optional)
If you haven’t already, try to submit files to MarkUs directly from JupyterHub (without needing to download them). This is optional so you can still submit your work the usual way, but if you have some time, please try it out by following the instructions on the MarkUs Guide.
Lab Instructions and Learning Objectives
Learn how to use Python dictionaries to store data.
Learn how to use Python to create simulated means.
Accumulating information in a dictionary
Remember in lecture we used a for loop to add up a series of numbers? And then we used a for loop to accumulate a list of means? As it turns out, you can use the same technique to make a dictionary.
Here’s how you add a key/value pair to a dictionary (this is also called “inserting”):
d = {} # this creates an empty dictionary named "d"
d["key1"] = "value1"
d
{'key1': 'value1'}
d["key2"] = "value2"
d
{'key1': 'value1', 'key2': 'value2'}
d["key1"] = "new_value"
d
{'key1': 'new_value', 'key2': 'value2'}
You can accumulate a new dictionary using a for loop:
ta_to_course = {}
for name in ["Ibrahim", "Adrienne", "Asana", "Yifeng", "Yongxin"]:
    ta_to_course[name] = "GGR274"
ta_to_course
{'Ibrahim': 'GGR274',
'Adrienne': 'GGR274',
'Asana': 'GGR274',
'Yifeng': 'GGR274',
'Yongxin': 'GGR274'}
for name in ["Meng", "Alan"]:
    ta_to_course[name] = "EEB125"
ta_to_course
{'Ibrahim': 'GGR274',
'Adrienne': 'GGR274',
'Asana': 'GGR274',
'Yifeng': 'GGR274',
'Yongxin': 'GGR274',
'Meng': 'EEB125',
'Alan': 'EEB125'}
# use "key" to access the associated value
ta_to_course["Meng"]
'EEB125'
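The same accumulation pattern also works when each value is computed inside the loop rather than fixed. The sketch below is only an illustration: the list `words` and the dictionary `word_to_length` are made-up names and are not part of the lab data.

```python
# Hypothetical example data: these words are not from the lab.
words = ["map", "census", "survey"]

word_to_length = {}  # start with an empty dictionary
for word in words:
    # insert a key/value pair whose value is computed from the key
    word_to_length[word] = len(word)

print(word_to_length)
# {'map': 3, 'census': 6, 'survey': 6}
```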
import pandas as pd
time_use_data_raw = pd.read_csv("gss_tu2016_week7.csv")
time_use_dur = time_use_data_raw.loc[
    time_use_data_raw["dur44"] > 0, ["CASEID", "dur44"]]
time_use_dur.shape
(368, 2)
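The cell above uses .loc with two selectors: a boolean condition that picks rows and a list of names that picks columns. As a rough sketch of how that works, here is the same pattern on a tiny made-up table. The DataFrame `toy` and its values are assumptions for illustration only, standing in for the lab's CSV file.

```python
import pandas as pd

# Made-up stand-in for the lab's CSV file; the values are illustrative only.
toy = pd.DataFrame({
    "CASEID": [1, 2, 3, 4],
    "dur44": [0, 30, 0, 120],
})

# Row selector: a boolean Series that is True where dur44 > 0.
# Column selector: the list of column names to keep.
toy_dur = toy.loc[toy["dur44"] > 0, ["CASEID", "dur44"]]

print(toy_dur.shape)
# (2, 2)
```

Only the two rows with a positive dur44 survive the filter, which is why the shape has 2 rows.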
Task 1
Take 5 random samples from the column dur44 with replacement, and calculate the mean of each random sample. Set the parameter n in .sample to the first element of sample_size and store the means in a list called simulated_means_10.
Steps to create simulated means:
Create a list called sample_size with the values 10, 20.
Create an empty list called simulated_means_10.
Create a for loop that runs 5 times.
Within the for loop, take a random sample with replacement from the dur44 column of time_use_dur of size n=sample_size[0] with replace = True and calculate the mean of this sample. Store this value in rsample.
Append rsample to simulated_means_10.
sample_size = [10, 20]
simulated_means_10 = []
for _ in range(5):
    rsample = time_use_dur['dur44'].sample(
        n=sample_size[0], replace=True).mean()
    simulated_means_10.append(rsample)
simulated_means_10
[101.5, 165.8, 98.0, 126.9, 154.5]
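Because the sampling is random, the means above will differ from run to run. One possible way to package the Task 1 loop is as a small function with an optional seed for repeatable results. This is only a sketch: the function `simulate_means`, its `seed` parameter, and the Series `toy_durations` are hypothetical names introduced here, not part of the lab.

```python
import pandas as pd


def simulate_means(values, n, reps=5, seed=None):
    """Return a list of `reps` means, each from a sample of size `n`
    drawn with replacement from the Series `values`."""
    means = []
    for i in range(reps):
        # Use a different random_state for each repetition so the samples
        # differ from one another but are repeatable when a seed is given.
        state = None if seed is None else seed + i
        sample = values.sample(n=n, replace=True, random_state=state)
        means.append(sample.mean())
    return means


# Made-up durations, used only to show the call.
toy_durations = pd.Series([15, 30, 45, 60, 90, 120])
toy_means = simulate_means(toy_durations, n=10, seed=274)
print(toy_means)
```

Sampling with replacement is what allows n=10 here even though the toy Series has only 6 values.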
Task 2
Create an empty dictionary called sim_dict. Add the key sample_size[0] with values simulated_means_10.
sim_dict = {}
sim_dict[sample_size[0]] = simulated_means_10
Task 3
Repeat the steps in Task 1 to create simulated means for sample_size[1] and store the results in simulated_means_20.
simulated_means_20 = []
for _ in range(5):
    rsample = time_use_dur["dur44"].sample(
        n=sample_size[1], replace=True).mean()
    simulated_means_20.append(rsample)
simulated_means_20
[142.5, 180.25, 165.5, 191.5, 215.75]
Task 4
Add sample_size[1] as a key with values simulated_means_20 to sim_dict.
sim_dict[sample_size[1]] = simulated_means_20
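Tasks 1 to 4 repeat the same steps once per sample size, so they could also be written as one loop nested inside another, accumulating the dictionary as it goes. The following is a sketch of that idea rather than the required solution. It uses made-up data, and `toy_durations` and `toy_sim_dict` are hypothetical names chosen to avoid clashing with the lab's variables.

```python
import pandas as pd

# Made-up durations standing in for the dur44 column.
toy_durations = pd.Series([15, 30, 45, 60, 90, 120])
sample_size = [10, 20]

toy_sim_dict = {}
for n in sample_size:            # outer loop: one dictionary key per sample size
    means = []
    for _ in range(5):           # inner loop: 5 simulated means for this size
        sample = toy_durations.sample(n=n, replace=True)
        means.append(sample.mean())
    toy_sim_dict[n] = means      # insert the finished list under its key

print(toy_sim_dict)
```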
Task 5
Store the data in sim_dict in a pandas DataFrame called sim_df.
sim_df = pd.DataFrame(sim_dict)
sim_df
| | 10 | 20 |
|---|---|---|
| 0 | 101.5 | 142.50 |
| 1 | 165.8 | 180.25 |
| 2 | 98.0 | 165.50 |
| 3 | 126.9 | 191.50 |
| 4 | 154.5 | 215.75 |
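When pd.DataFrame is given a dictionary of lists, each key becomes a column name and each list becomes that column's values, so the lists need to have the same length. The sketch below rebuilds the table from the values shown in this lab's output (your own random samples will give different numbers) and then inspects the result.

```python
import pandas as pd

# Values copied from the output shown above; a fresh run would differ.
sim_dict = {
    10: [101.5, 165.8, 98.0, 126.9, 154.5],
    20: [142.5, 180.25, 165.5, 191.5, 215.75],
}

sim_df = pd.DataFrame(sim_dict)

print(sim_df.columns.tolist())  # the dictionary keys, now column names
print(sim_df.shape)             # (rows, columns)
print(sim_df.mean())            # average simulated mean for each sample size
```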