GGR274 Lab 7: Functions

Logistics

As in previous weeks, your lab grade will be based on attendance and on submitting a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).

Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):

  1. Download this file (Lab_7.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)

  2. Submit this file to MarkUs under the lab7 assignment. (See our MarkUs Guide for detailed instructions.)

MarkUs JupyterHub Extension (optional)

If you haven’t already, try submitting files to MarkUs directly from JupyterHub (without needing to download them). This is optional, so you can still submit your work the usual way, but if you have some time, please try it out by following the instructions in the MarkUs Guide.

Lab Instructions and Learning Objectives

  • Learn how to use Python dictionaries to store data.

  • Learn how to use Python to create simulated means.

Accumulating information in a dictionary

Remember in lecture we used a for loop to add up a series of numbers? And then we used a for loop to accumulate a list of means? As it turns out, you can use the same technique to make a dictionary.

Here’s how you add a key/value pair to a dictionary (this is also called “inserting”):

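d = {}  # this creates an empty dictionary named "d"
d["key1"] = "value1"  # insert a new key/value pair
d  # {'key1': 'value1'}
d["key2"] = "value2"
d  # {'key1': 'value1', 'key2': 'value2'}
d["key1"] = "new_value"  # assigning to an existing key replaces its value
d  # {'key1': 'new_value', 'key2': 'value2'}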

You can accumulate a new dictionary using a for loop:

ta_to_course = {}
for name in ["Ibrahim", "Adrienne", "Asana", "Yifeng", "Yongxin"]:
    ta_to_course[name] = "GGR274"

ta_to_course

for name in ["Meng", "Alan"]:
    ta_to_course[name] = "EEB125"

ta_to_course

# use a key to access the associated value
ta_to_course["Meng"]

import pandas as pd

# load the time use data
time_use_data_raw = pd.read_csv("gss_tu2016_week7.csv")

# keep only the rows where dur44 is greater than 0, and only the columns CASEID and dur44
time_use_dur = time_use_data_raw.loc[
    time_use_data_raw["dur44"] > 0, ["CASEID", "dur44"]]

time_use_dur.shape
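
The `.loc` call above combines a row filter with a column selection. Here is a minimal sketch of the same pattern on a made-up DataFrame. The values are invented for illustration and do not come from gss_tu2016_week7.csv, and the names `toy`, `toy_dur`, and the column `extra` are hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration only (not the lab data)
toy = pd.DataFrame({
    "CASEID": [1, 2, 3, 4],
    "dur44": [0, 30, 0, 120],
    "extra": ["a", "b", "c", "d"],  # hypothetical column that gets dropped
})

# Before the comma: a True/False condition that picks the rows to keep
# After the comma: the list of columns to keep
toy_dur = toy.loc[toy["dur44"] > 0, ["CASEID", "dur44"]]

print(toy_dur.shape)  # (number of rows, number of columns)
```

Only the two rows with a positive dur44 remain, and only the two named columns are kept, so the shape here is (2, 2).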

Task 1

Take 5 random samples from the column dur44 with replacement, and calculate the mean of each random sample. Set the parameter n in .sample to the first element of sample_size and store the means in a list called simulated_means_10.

Steps to create simulated means:

  1. Create a list called sample_size with the values 10 and 20.

  2. Create an empty list called simulated_means_10.

  3. Create a for loop that repeats 5 times, once per random sample.

  4. Within the for loop, take a random sample of size n=sample_size[0] from the dur44 column of time_use_dur with replace=True (sampling with replacement), calculate the mean of this sample, and store the value in rsample.

  5. Append rsample to simulated_means_10.
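
As a rough illustration of the sample-then-average pattern in the steps above, here is a sketch on a small made-up Series rather than the lab data. The names `toy_values`, `toy_means`, and `one_mean` and the sample size of 3 are assumptions for this example. `random_state` is used here only to make the draws repeatable and is not required by the task.

```python
import pandas as pd

# Toy values, invented for illustration only (not the dur44 column)
toy_values = pd.Series([10, 20, 30, 40, 50])

toy_means = []  # start with an empty list to accumulate into
for i in range(5):  # repeat 5 times
    # draw 3 values with replacement, then take their mean
    one_mean = toy_values.sample(n=3, replace=True, random_state=i).mean()
    toy_means.append(one_mean)

print(len(toy_means))  # one mean per loop iteration
```

Because the sampling is with replacement, the same value can be drawn more than once within a single sample.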

# add your code below

Task 2

Create an empty dictionary called sim_dict. Add the key sample_size[0] with the list simulated_means_10 as its value.
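
A dictionary key does not have to be a string, and a value can be a whole list. Here is a small sketch of that idea with made-up names and numbers (`group_sizes`, `scores_for_3`, and `results` are hypothetical and are not part of the lab):

```python
group_sizes = [3, 4]            # hypothetical list of integers
scores_for_3 = [1.5, 2.0, 2.5]  # hypothetical list of values

results = {}                              # empty dictionary
results[group_sizes[0]] = scores_for_3    # the key is the integer 3, the value is the list

print(results)  # {3: [1.5, 2.0, 2.5]}
```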

# add your code below

Task 3

Repeat the steps in Task 1 to create simulated means for sample_size[1] and store the results in simulated_means_20.

# add your code below

Task 4

Add sample_size[1] to sim_dict as a key, with the list simulated_means_20 as its value.

# add your code below

Task 5

Store the data in sim_dict in a pandas DataFrame called sim_df.
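
One common way to turn a dictionary of lists into a DataFrame is to pass it to `pd.DataFrame`. Here is a sketch with made-up values (`toy_dict` and `toy_df` are hypothetical names). It assumes every list in the dictionary has the same length.

```python
import pandas as pd

# Toy dictionary, invented for illustration: each key maps to a list of equal length
toy_dict = {10: [1.0, 2.0, 3.0], 20: [4.0, 5.0, 6.0]}

# Each key becomes a column name, and each list becomes that column's values
toy_df = pd.DataFrame(toy_dict)

print(toy_df.shape)          # (3, 2)
print(list(toy_df.columns))  # [10, 20]
```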

# add your code below