GGR274 Lab 7: Functions

Logistics

As in previous weeks, the lab grade is based on attendance and on submitting a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).

Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):

Download this file (Lab_7.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
Submit this file to MarkUs under the lab7 assignment. (See our MarkUs Guide for detailed instructions.)

MarkUs JupyterHub Extension (optional)

If you haven’t already, try submitting files to MarkUs directly from JupyterHub (without needing to download them). This is optional, so you can still submit your work the usual way, but if you have some time, please try it out by following the instructions in the MarkUs Guide.

Lab Instructions and Learning Objectives

Learn how to use Python dictionaries to store data.
Learn how to use Python to create simulated means.

Accumulating information in a dictionary

Remember how, in lecture, we used a for loop to add up a series of numbers, and then used a for loop to accumulate a list of means? As it turns out, you can use the same technique to build a dictionary.

Here’s how you add a key/value pair to a dictionary (this is also called “inserting”):

d = {} # this creates an empty dictionary named "d"
d["key1"] = "value1"
d

{'key1': 'value1'}

d["key2"] = "value2"
d

{'key1': 'value1', 'key2': 'value2'}

d["key1"] = "new_value"
d

{'key1': 'new_value', 'key2': 'value2'}

You can accumulate a new dictionary using a for loop:

ta_to_course = {}
for name in ["Ibrahim", "Adrienne", "Asana", "Yifeng", "Yongxin"]:
    ta_to_course[name] = "GGR274"

ta_to_course

{'Ibrahim': 'GGR274',
 'Adrienne': 'GGR274',
 'Asana': 'GGR274',
 'Yifeng': 'GGR274',
 'Yongxin': 'GGR274'}

for name in ["Meng", "Alan"]:
    ta_to_course[name] = "EEB125"

ta_to_course

{'Ibrahim': 'GGR274',
 'Adrienne': 'GGR274',
 'Asana': 'GGR274',
 'Yifeng': 'GGR274',
 'Yongxin': 'GGR274',
 'Meng': 'EEB125',
 'Alan': 'EEB125'}

# use "key" to access the associated value
ta_to_course["Meng"]

'EEB125'
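
Looking up a key that is not in the dictionary raises a KeyError. The sketch below shows two common ways to guard against that. It rebuilds a shortened version of ta_to_course so that it runs on its own, and "Karen" is a made-up name used only to demonstrate a missing key.

```python
# Shortened stand-in for the dictionary built above.
ta_to_course = {"Ibrahim": "GGR274", "Meng": "EEB125"}

# `in` checks whether a key exists before you look it up.
print("Meng" in ta_to_course)   # True
print("Karen" in ta_to_course)  # False ("Karen" is a made-up name)

# .get returns a default value instead of raising KeyError.
course = ta_to_course.get("Karen", "unknown")
print(course)  # unknown

# .items() lets a for loop visit each key/value pair.
for name, course_code in ta_to_course.items():
    print(name, "->", course_code)
```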

import pandas as pd

time_use_data_raw = pd.read_csv("gss_tu2016_week7.csv")

time_use_dur = time_use_data_raw.loc[
    time_use_data_raw["dur44"] > 0, ["CASEID", "dur44"]]

time_use_dur.shape

(368, 2)
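
The gss_tu2016_week7.csv file is not reproduced here, so the sketch below uses a small made-up DataFrame to show what the .loc call does: the first argument keeps only the rows where dur44 is greater than 0, and the second keeps only the listed columns. All values, and the extra column other_col, are invented for illustration.

```python
import pandas as pd

# Made-up stand-in for the survey data; the values are not real.
toy = pd.DataFrame({
    "CASEID": [1, 2, 3, 4],
    "dur44": [0, 90, 0, 120],
    "other_col": ["a", "b", "c", "d"],  # hypothetical extra column
})

# Rows: dur44 > 0. Columns: CASEID and dur44 only.
toy_dur = toy.loc[toy["dur44"] > 0, ["CASEID", "dur44"]]

print(toy_dur.shape)  # (2, 2)
```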

Task 1

Take 5 random samples from the column dur44 with replacement, and calculate the mean of each random sample. Set the parameter n in .sample to the first element of sample_size and store the means in a list called simulated_means_10.

Steps to create simulated means:

Create a list called sample_size with the values 10 and 20.
Create an empty list called simulated_means_10.
Create a for loop that runs 5 times.
Within the for loop, take a random sample of size n=sample_size[0] from the dur44 column of time_use_dur, with replace=True, and calculate the mean of this sample. Store this value in rsample.
Append rsample to simulated_means_10.

sample_size = [10, 20]

simulated_means_10 = []

for _ in range(5):
    rsample = time_use_dur['dur44'].sample(
        n=sample_size[0], replace=True).mean()
    simulated_means_10.append(rsample)

simulated_means_10

[101.5, 165.8, 98.0, 126.9, 154.5]
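
To see what replace=True changes, the sketch below samples from a tiny made-up Series. With replacement, the same value can be drawn more than once, so n may even exceed the number of values available. The random_state argument is not used in the lab code; it is added here only so the draw is repeatable.

```python
import pandas as pd

# Made-up durations; not the lab data.
durations = pd.Series([30, 60, 90])

# Sampling with replacement: n can be larger than len(durations),
# and individual values may repeat.
with_repl = durations.sample(n=5, replace=True, random_state=1)
print(len(with_repl))  # 5

# Sampling without replacement cannot draw more values than exist.
try:
    durations.sample(n=5, replace=False)
except ValueError as err:
    print("ValueError:", err)
```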

Task 2

Create an empty dictionary called sim_dict. Add the key sample_size[0] with the value simulated_means_10.

sim_dict = {}

sim_dict[sample_size[0]] = simulated_means_10

Task 3

Repeat the steps in Task 1 to create simulated means for sample_size[1] and store the results in simulated_means_20.

simulated_means_20 = []

for _ in range(5):
    rsample = time_use_dur["dur44"].sample(
        n=sample_size[1], replace=True).mean()
    simulated_means_20.append(rsample)

simulated_means_20

[142.5, 180.25, 165.5, 191.5, 215.75]
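
Because Task 3 repeats Task 1 with only the sample size changed, the repeated steps could be wrapped in a function. The sketch below is one possible way to do that and is not part of the required solution. The function name simulate_means, its parameters, and the stand-in data are all hypothetical.

```python
import pandas as pd

def simulate_means(values, n, reps=5):
    """Return a list of `reps` sample means, each from `n` draws with replacement."""
    means = []
    for _ in range(reps):
        means.append(values.sample(n=n, replace=True).mean())
    return means

# Made-up stand-in for time_use_dur["dur44"]; the values are not real.
fake_dur = pd.Series([15, 30, 45, 60, 120, 240])

sample_size = [10, 20]
simulated_means_10 = simulate_means(fake_dur, sample_size[0])
simulated_means_20 = simulate_means(fake_dur, sample_size[1])

print(len(simulated_means_10), len(simulated_means_20))  # 5 5
```

Calling the function once per sample size keeps the sampling logic in one place, so a change to the number of repetitions only has to be made once.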

Task 4

Add sample_size[1] to sim_dict as a key with the value simulated_means_20.

sim_dict[sample_size[1]] = simulated_means_20

Task 5

Store the data in sim_dict in a pandas DataFrame called sim_df.

sim_df = pd.DataFrame(sim_dict)

sim_df

      10      20
0  101.5  142.50
1  165.8  180.25
2   98.0  165.50
3  126.9  191.50
4  154.5  215.75
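
Tasks 1 to 5 can also be combined by accumulating the dictionary inside a loop over sample_size, in the same way ta_to_course was built earlier. This is a sketch under stated assumptions: the Series below is made-up stand-in data, so the numbers will not match the table above.

```python
import pandas as pd

# Made-up stand-in for time_use_dur["dur44"]; the values are not real.
fake_dur = pd.Series([15, 30, 45, 60, 120, 240])

sample_size = [10, 20]
sim_dict = {}

for n in sample_size:
    # Accumulate 5 sample means for this sample size.
    means = []
    for _ in range(5):
        means.append(fake_dur.sample(n=n, replace=True).mean())
    # Each key is a sample size; each value is a list of means.
    sim_dict[n] = means

# Each key becomes a column; each list becomes that column's values.
sim_df = pd.DataFrame(sim_dict)
print(sim_df.shape)  # (5, 2)
```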