EEB125 Homework 4: Working with Booleans and Functions

Logistics

Due date: The homework is due 11:59pm on Tuesday, February 4th.

You will submit your work on MarkUs. To submit your work:

  1. Download this file (Homework_4.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)

  2. Submit this file to MarkUs under the hw4 assignment. (See our MarkUs Guide for detailed instructions.)

All homeworks will take place in a Jupyter notebook (like this one). When you are done, you will download this notebook and submit it to MarkUs. We’ve included submission instructions at the end of this notebook.

Overview

This week, you will be practicing a couple of programming techniques we examined in lecture to answer data science questions about sex-based differences in COVID infection and mortality across the United States.

We will be using data from https://www.genderscilab.org/gender-and-sex-in-covid19 for the week of 11/1/2021, which is available in covid_sex.csv. To inform your exploration, please read the article “What’s Really Behind the Gender Gap in Covid-19 Deaths?”, printed in the New York Times, available in nytimes_covid_sex.pdf (go to File -> Open… to open this PDF). We will be exploring this dataset to determine 1) whether we can observe sex-based differences in the risk of death among those infected with COVID and 2) whether we can identify any sociopolitical, as opposed to biological, explanations for any observed differences.

Problem 1: Read in the data file

Problem 1a. Prep our data

Open the file covid_sex.csv in Python and read in the lines. Assign the header (the first line) to the variable header and the rest of the data to the variable data.
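As a rough illustration of the header/data split described above, here is a minimal sketch on a toy file. The file contents, column names, and the variable names toy_header and toy_data are all made up for this example; they are not the assignment's data.

```python
import os
import tempfile

# Toy stand-in for a CSV file; the column names and values are invented.
toy_text = "State,Count\nAlpha,10\nBeta,20\n"

# Write the toy text to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(toy_text)
    toy_path = tmp.name

# Read every line into a list, then separate the first line from the rest.
with open(toy_path) as f:
    lines = f.readlines()

toy_header = lines[0]   # the first line (a string, still ending in "\n")
toy_data = lines[1:]    # every remaining line (a list of strings)

# Clean up the temporary file.
os.remove(toy_path)

print(toy_header.strip())
print(len(toy_data))
```

Note that each line keeps its trailing newline character, which is why .strip() is often applied before further processing.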

# Write your code here

Problem 1b. Interpret the data file

Examine the header by printing it to the screen. We will be interested in the following data columns: State, Male_cases, Female_cases, Male_deaths, and Female_deaths. Please indicate which index of the header each of these data columns corresponds to. (1pt) For example, State is at index zero. Please start your indexing at zero.

HINT: You might find it easier to read and count the columns indicated in the header if you split it up according to commas and interpret the resulting list.
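The hint can be sketched on a toy header as follows. The column names here are hypothetical, and enumerate is just one convenient way to pair each column name with its zero-based index.

```python
# Toy header line; these column names are invented for illustration.
toy_header = "State,Count_a,Count_b\n"

# Remove the trailing newline, then split on commas to get a list of names.
columns = toy_header.strip().split(",")

# enumerate yields (index, item) pairs, with indices starting at zero.
for index, name in enumerate(columns):
    print(index, name)
```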

# Write your code here

WRITE YOUR RESPONSE HERE.

Problem 1c. Examine the data

Loop over data and print each line to the terminal. You may notice that some of the lines include multiple commas beside one another, with no text in between. This is one way of representing missing data in a .csv file. We will need to find some way to deal with this during the subsequent problems.
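To see what consecutive commas turn into once a line is split, consider this small sketch. The line itself is a made-up example, not a row from covid_sex.csv.

```python
# Toy line with a missing middle value (two commas side by side).
toy_line = "Alpha,,5\n"

# Splitting on commas leaves an empty string where the value is missing.
fields = toy_line.strip().split(",")

print(fields)            # ['Alpha', '', '5']
print(fields[1] == "")   # True
```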

# Write your code here

Problem 2: Examining sex-based differences in COVID death risk

Problem 2a.

Create two empty lists and assign them, respectively, to the variables risk_m and risk_f.

# Write your code here

Problem 2b. Calculate risk of death given COVID infection for each sex

Loop through the lines of our data file. Create a metric for risk of death from COVID by dividing the number of deaths for each sex in each state by the number of cases for that sex in that state. In other words, use the following formulae:

covid_risk_m = deaths_m / infections_m

for males, and

covid_risk_f = deaths_f / infections_f

for females.

Append the result for each sex to the corresponding list created in the previous step.

This step will require that we convert the values from strings to floating point numbers. However, some of the columns have missing values, which will be interpreted by Python as an empty string (“”). Use exception handling to skip over lines where the type conversion fails.
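The skip-on-failure pattern can be sketched on toy rows like this. The rows, the column positions, and the names toy_rows and toy_ratios are assumptions made for the example; the real file has different columns. The sketch relies on the fact that float("") raises a ValueError.

```python
# Toy rows: name, cases, deaths. The middle row has a missing value.
toy_rows = ["Alpha,100,5\n", "Beta,,3\n", "Gamma,200,4\n"]

toy_ratios = []

for row in toy_rows:
    fields = row.strip().split(",")
    try:
        # float("") raises ValueError, so rows with gaps fail here.
        cases = float(fields[1])
        deaths = float(fields[2])
    except ValueError:
        # Skip this row and move on to the next one.
        continue
    toy_ratios.append(deaths / cases)

print(toy_ratios)
```

Only the first and third rows contribute a ratio; the row with the missing value is skipped without stopping the loop.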

# Write your code here

Problem 3. Examining sex-specific differences in COVID risk

Problem 3a: Create a function to calculate the statistical mean from a list of floating point numbers

In this assignment, we will want to calculate the average number of COVID deaths and infections across states for both males and females. Remember from lecture that the statistical mean of a list is the sum of all of its elements divided by its length (refer to this week’s lecture if you have forgotten). Please finish the function below so that it takes a list of values and outputs their mean.

def calc_mean(values):
    # Replace the ... with your code.
    # Your last line of code should be of the form return <value>,
    # where <value> is the computed mean of your data.
    ...
# This cell is provided to you to help check your work
calc_mean(risk_m)
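One possible way to sanity-check a hand-written mean function is to compare it against the standard library on a small list whose mean is easy to work out by hand. The list toy_values below is invented for this purpose; statistics.mean is part of Python's standard library.

```python
import statistics

# Toy list whose mean is easy to verify by hand: (2 + 4 + 9) / 3 = 5.
toy_values = [2.0, 4.0, 9.0]

# Reference value from the standard library.
expected = statistics.mean(toy_values)
print(expected)  # 5.0

# A correct calc_mean(toy_values) should give the same number.
```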

Problem 3b. Calculate mean differences across the sexes in risk of death given COVID infection

Estimate the mean risk of death from COVID for each sex using the function created in the previous step. Assign the results for each sex to the variables mean_risk_m and mean_risk_f.

# Replace the ... with the appropriate code
mean_risk_m = ...
mean_risk_f = ...
# This cell is provided to you to help check your work
print(mean_risk_m,mean_risk_f)

Problem 3c. Interpret your results

Which sex appears to be at greater risk of death, on average, if infected with COVID (2pt)? Do you think any difference might stem from biological or behavioural differences, and why (2pt)? Feel free to speculate.

WRITE YOUR ANSWER HERE.

Problem 4: Politics and epidemiological risk

In this section, we will combine data on the political affiliation of each state’s governor with the COVID data to ask the question: do Democrat-run or Republican-run states tend to have greater sex-based disparity in risk of COVID death?

Problem 4a. Read in the second dataset

Open the file state_governors.csv. This contains information on the political party affiliation of each state’s governor. Read in the lines and assign the header to the variable gov_header and the rest of the dataset to the variable gov_data.

# Write your code here

Problem 4b. Examine the dataset

Print gov_header. Examine the columns and the first few lines of the data. Explain what information you think each column contains. (1pt)

# Write your code here

WRITE YOUR RESPONSE HERE

Problem 4c. Read in the political party data

Create an empty dictionary and assign it to the variable state_govs. Loop over the lines of gov_data and store the name of each state as a key in the dictionary, with the political party of its governor as the value.
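Building a lookup dictionary from lines of text can be sketched as follows. The two-column layout, the values, and the names toy_lines and toy_lookup are assumptions for illustration only; check the real header of state_governors.csv to find which columns hold the state and the party.

```python
# Toy lines with two columns: a key and a value (both invented).
toy_lines = ["Alpha,Red\n", "Beta,Blue\n"]

toy_lookup = {}

for line in toy_lines:
    fields = line.strip().split(",")
    # Use the first field as the key and the second field as the value.
    toy_lookup[fields[0]] = fields[1]

print(toy_lookup["Beta"])  # Blue
```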

# Write your code here

Problem 4d. Sex-based disparity in risk of COVID death by political party

Create two empty lists and assign them, respectively, to the variables democrat_disp and repub_disp.

Then loop over the original COVID data, as in Problem 2b, but this time compute the disparity between male and female risk of death and add the value to either the Democrat or the Republican list, based on the party of the state’s governor. Using the risk formulae given in Problem 2b, please estimate the sex-based disparity in risk of death given COVID infection as follows:

risk_disp = covid_risk_m - covid_risk_f

For each row of COVID data, you will want to determine the state and look up the political affiliation of its governor in the state_govs dictionary. Then append the risk disparity value to either democrat_disp or repub_disp, depending on whether the state in the current line has a Democrat or Republican governor. You will want to use an if statement to accomplish this.

HINT: when comparing strings, use .lower() or .upper() to make sure the cases match. You may also wish to check for and remove any extraneous whitespace.
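The hint can be illustrated with a small sketch that sorts values into two lists based on a cleaned-up label. The labels, values, and list names here are made up; the point is only the combination of .strip(), .lower(), and an if/elif.

```python
# Toy (label, value) pairs; the labels are deliberately messy.
toy_pairs = [("  Red ", 1.0), ("BLUE\n", 2.0), ("red", 3.0)]

red_values = []
blue_values = []

for label, value in toy_pairs:
    # Strip surrounding whitespace and lower-case before comparing.
    cleaned = label.strip().lower()
    if cleaned == "red":
        red_values.append(value)
    elif cleaned == "blue":
        blue_values.append(value)

print(red_values)   # [1.0, 3.0]
print(blue_values)  # [2.0]
```

Without the cleaning step, "  Red " and "red" would be treated as different strings and the first pair would not land in either list.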

# Write your code here

Problem 4e. Average sex-based disparity for states run by each political party

Please use the calc_mean function you defined above to calculate the average disparity in COVID death risk across states controlled by each major political party in the US. Assign the means from repub_disp and democrat_disp to the variables repub_mean_risk and democrat_mean_risk, respectively.

# Replace the ... with the appropriate code
repub_mean_risk = ...
democrat_mean_risk = ...
# This cell is provided to you to help check your work
print(repub_mean_risk,democrat_mean_risk)

Problem 4f. Interpret your results

  1. Please interpret the disparity metric from problem 4d. What does a value above zero indicate? What would a value below zero indicate? (4 pts)

  2. Do Democrat-run or Republican-run states display, on average, greater disparity between risk of COVID death for males vs for females (2pts)?

  3. Do you think the observed difference is significant or meaningful? Explain why or why not (2pts). In future weeks, we will learn how to evaluate this latter question statistically, but for now, see if you can create your own argument. Speculate on any possible reasons for the observed difference. Does our analysis of COVID death rates by dominant political party provide any possible sociopolitical causes for disparity between the sexes? (4pts)

WRITE YOUR RESPONSES HERE.