GGR274 Lab 7: Functions

Logistics

Like previous weeks, your lab grade will be based on attendance and submission of a few small tasks to MarkUs during the lab session (or by 23:59 on Thursday).

Complete the tasks in this Jupyter notebook and submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):

  1. Download this file (Lab_7.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)

  2. Submit this file to MarkUs under the lab7 assignment. (See our MarkUs Guide for detailed instructions.)

Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.

Piloting MarkUs JupyterHub Extension (optional)

Starting with this week’s lab and homework, we’re piloting a new way to submit files to MarkUs directly from JupyterHub (without needing to download them). This is optional, so you can still submit your work the usual way, but if you have some time, please try it out by following the instructions in the MarkUs Guide.

Lab Instructions and Learning Objectives

  • Learn how to use Python dictionaries to store data.

  • Learn how to use Python to create simulated means.

Accumulating information in a dictionary

Remember how, in lecture, we used a for loop to add up a series of numbers, and then used a for loop to accumulate a list of means? You can use the same technique to build a dictionary.

Here’s how you add a key/value pair to a dictionary (this is also called “inserting”):

d = {} # empty dictionary
d['key1'] = 'value1'
d
{'key1': 'value1'}
d['key2'] = 'value2'
d
{'key1': 'value1', 'key2': 'value2'}
d['key1'] = 'new_value'
d
{'key1': 'new_value', 'key2': 'value2'}

You can accumulate a new dictionary using a for loop:

ta_to_course = {}
for name in ['Amber', 'Martin', 'Davia', 'KP', 'Ilan']:
    ta_to_course[name] = 'GGR274'

ta_to_course
{'Amber': 'GGR274',
 'Martin': 'GGR274',
 'Davia': 'GGR274',
 'KP': 'GGR274',
 'Ilan': 'GGR274'}
for name in ['Matt', 'Fiona']:
    ta_to_course[name] = 'EEB125'

print(ta_to_course)
print(ta_to_course['Matt'])
{'Amber': 'GGR274', 'Martin': 'GGR274', 'Davia': 'GGR274', 'KP': 'GGR274', 'Ilan': 'GGR274', 'Matt': 'EEB125', 'Fiona': 'EEB125'}
EEB125
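
The loops above give every key in a batch the same value. When each key needs its own value, one option is to walk two lists in parallel with zip and insert one pair per iteration. This is a minimal sketch: the name ta_lookup and the two short lists are illustrative only, with pairings taken from the output above.

```python
# Illustrative lists; the pairings mirror the ta_to_course output shown above.
names = ['Amber', 'Martin', 'Matt', 'Fiona']
courses = ['GGR274', 'GGR274', 'EEB125', 'EEB125']

ta_lookup = {}  # hypothetical name, chosen for this sketch
for name, course in zip(names, courses):
    # Each pass through the loop inserts one key/value pair.
    ta_lookup[name] = course

print(ta_lookup['Matt'])  # EEB125
```

The accumulation pattern is unchanged: start with an empty dictionary, then insert inside the loop.
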
import pandas as pd

time_use_data_raw = pd.read_csv('gss_tu2016_main_file.csv')

time_use_dur = time_use_data_raw[["CASEID", "dur44"]]

time_use_dur.shape
(17390, 2)

Task 1

Take 5 random samples from the column dur44 with replacement, and calculate the mean of each random sample. Set the parameter n in .sample to the first element of sample_size and store the means in a list called simulated_means10.

Steps to create simulated means:

  1. Create a list called sample_size with the values 10 and 20.

  2. Create an empty list called simulated_means10.

  3. Create a for loop that runs 5 times (once per sample).

  4. Within the for loop, take a random sample from the dur44 column of time_use_dur, with n = sample_size[0] and replace = True, and calculate the mean of this sample. Store this value in rsample.

  5. Append rsample to simulated_means10.

sample_size = [10, 20]

simulated_means10 = []

for _ in range(5):
    rsample = time_use_dur['dur44'].sample(n = sample_size[0], replace = True).mean()
    simulated_means10.append(rsample)

simulated_means10
[0.0, 0.0, 0.0, 0.0, 0.0]
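
Because .sample draws at random, rerunning the cell above will generally produce different means. If you want repeatable results, one option is to pass random_state to .sample. The sketch below assumes a small made-up Series in place of the dur44 column; the names durations and sample_means and the seed value are hypothetical, not part of the lab.

```python
import pandas as pd

# Hypothetical stand-in for time_use_dur['dur44']; the real data comes from the CSV.
durations = pd.Series([0, 0, 0, 15, 30, 0, 45, 0, 60, 0])

def sample_means(values, n, reps, seed):
    """Return `reps` means of size-n samples drawn with replacement."""
    means = []
    for i in range(reps):
        # Using seed + i gives each repetition its own fixed random_state,
        # so the samples differ from one another but repeat across reruns.
        sample = values.sample(n=n, replace=True, random_state=seed + i)
        means.append(sample.mean())
    return means

first_run = sample_means(durations, n=10, reps=5, seed=274)
second_run = sample_means(durations, n=10, reps=5, seed=274)

print(first_run == second_run)  # True
```

Leaving random_state out, as the lab solution does, is also fine; the means will simply change from run to run.
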

Task 2 - Store the simulated means in a dictionary

Create an empty dictionary called sim_dict. Add the key sample_size[0] with the value simulated_means10.

sim_dict = {}

sim_dict[sample_size[0]] = simulated_means10

Task 3 - Repeat the steps to create simulated means for sample_size[1].

Store the results in simulated_means20.

simulated_means20 = []

for _ in range(5):
    rsample = time_use_dur['dur44'].sample(n = sample_size[1], replace = True).mean()
    simulated_means20.append(rsample)

simulated_means20
[11.0, 23.25, 13.5, 6.75, 4.5]

Task 4 - Store the simulated means in a dictionary

Add sample_size[1] as a key with the value simulated_means20 to sim_dict.

sim_dict[sample_size[1]] = simulated_means20

Task 5 - Store the data in sim_dict in a pandas DataFrame called sim_df.

sim_df = pd.DataFrame(sim_dict)

sim_df
     10     20
0   0.0  11.00
1   0.0  23.25
2   0.0  13.50
3   0.0   6.75
4   0.0   4.50
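
Tasks 1 to 5 repeat the same steps once per sample size, so they can also be written as a single loop that accumulates the dictionary directly, following the pattern from the start of the lab. This is a sketch under stated assumptions: durations is a made-up Series standing in for time_use_dur['dur44'], and the variable means is a name chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in for time_use_dur['dur44']; the real data comes from the CSV.
durations = pd.Series([0, 0, 0, 15, 30, 0, 45, 0, 60, 0])

sample_size = [10, 20]
sim_dict = {}

for n in sample_size:
    means = []  # fresh list for this sample size
    for _ in range(5):
        # Draw n values with replacement and record the sample mean.
        means.append(durations.sample(n=n, replace=True).mean())
    sim_dict[n] = means  # key: sample size, value: list of 5 means

sim_df = pd.DataFrame(sim_dict)
print(sim_df.shape)  # (5, 2)
```

Each key becomes a column of sim_df. Building a DataFrame from a dictionary of lists requires every list to have the same length, which holds here because each sample size gets exactly 5 means.
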