EEB125 Homework 3: Reading and manipulating some data#
Logistics#
Due date: The homework is due 11:59pm on Tuesday, January 28th.
You will submit your work on MarkUs. To submit your work:
Download this file (Homework_3.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
Submit this file to MarkUs under the hw3 assignment. (See our MarkUs Guide for detailed instructions.)
Overview#
This week, you will build on and combine some of the tools from this week's lectures to answer a data science question: Which genus of mammals has the longest forearms relative to body size? We will study this using a dataset of forearm length and total body length (expressed as head-body length), both in millimeters, recorded at the level of individual species.
Problem 1: Read in the data file#
Problem 1a.#
Open the data file mammal_measurements.csv. Assign the result to a variable called file. Next, read in the lines and assign the output to a variable called lines. Examine the header (the first line of the data).
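As a rough illustration of the open-then-read pattern described above, the sketch below writes a tiny made-up CSV to a temporary folder and reads it back. The file name toy_measurements.csv, the species column name, the species names, and the values are all assumptions for illustration only; the real data file may be laid out differently.

```python
import os
import tempfile

# Build a tiny stand-in file (hypothetical contents, NOT the real data).
toy_path = os.path.join(tempfile.mkdtemp(), "toy_measurements.csv")
with open(toy_path, "w") as toy_out:
    toy_out.write("species,adultForearmLen_mm,adultHeadBodyLen_mm\n")
    toy_out.write("Fakeus alpha,50.0,100.0\n")
    toy_out.write("Fakeus beta,30.0,120.0\n")

# Open the file for reading and pull every line into a list of strings.
toy_file = open(toy_path)
toy_lines = toy_file.readlines()
toy_file.close()

# The first element is the header; strip() removes the trailing newline.
toy_header = toy_lines[0].strip()
print(toy_header)
```

Each element of toy_lines is one line of the file as a string, including its trailing newline character.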
# Write your code here
Problem 1b. Interpret the data file#
Please explain what data is contained in this file by examining the contents of the first line (the “header”). (1pt)
WRITE YOUR RESPONSE HERE.
Problem 2: Iterate through the lines and calculate a metric for relative forelimb length#
The measurements in this data file are forelimb length and body length. How long an animal’s forelimbs are relative to its body can often tell us something about how it moves around.
Create an empty dictionary and assign it to a variable named sp_forearms. Loop over the lines variable from Problem 1a using a for loop and populate the dictionary with one entry for each species. The keys should be the species names in each line. The values should be a metric for relative forearm length, which we will need to calculate ourselves from the two measurements for each species.
Use the following formula:
relative_forearm_len = adultForearmLen_mm / adultHeadBodyLen_mm
If we had a dictionary containing only an entry for humans, Homo sapiens, it might look something like this:
{"Homo sapiens" : 0.6}
Hint: currently the lines of data are strings. You’ll need to split up the lines (by commas) and then convert the numerical entries into floats by calling float() on them.
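To make the hint concrete, here is a minimal sketch of splitting a single line and converting its numbers. The line itself is made up, and the column order (name, forearm length, head-body length) is an assumption; check the real header before relying on it.

```python
# One made-up data line (hypothetical species and values, column order assumed).
example_line = "Fakeus alpha,50.0,100.0\n"

# strip() drops the newline; split(",") breaks the line into its fields.
fields = example_line.strip().split(",")

species_name = fields[0]
forearm_mm = float(fields[1])    # "50.0" (str) -> 50.0 (float)
head_body_mm = float(fields[2])  # "100.0" (str) -> 100.0 (float)

# Apply the formula given above and store the result under the species name.
example_dict = {}
example_dict[species_name] = forearm_mm / head_body_mm
print(example_dict)  # {'Fakeus alpha': 0.5}
```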
# Write your code here
Problem 3: Create a dictionary containing all of the measurements for each Genus#
The species names in this data file are specified using Linnaean binomial nomenclature. This means that the first word gives the genus name, while the second gives the species name. For example, the scientific name for humans is “Homo sapiens”, so our genus is “Homo” and our species is “sapiens”. Often, species within the same genus are quite similar. To make our numbers a bit easier to parse through, let’s find the maximum relative forearm length found in each genus. You will want to pick the first word from each species name, and then associate it with a list containing the measurements for that genus.
Hint: You will want to loop through the dictionary we have created to make genus records. You can make use of the max() function contained within Python to identify the largest measurement within each genus.
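The two building blocks mentioned here, taking the first word of a name and calling max() on a list, might look like the sketch below. The species name and the numbers are invented for illustration.

```python
# A made-up binomial name (hypothetical, not from the data file).
example_species = "Fakeus alpha"

# split(" ") breaks the name at the space; index 0 is the genus.
example_genus = example_species.split(" ")[0]
print(example_genus)  # Fakeus

# max() returns the largest element of a list of numbers.
example_values = [0.5, 0.25, 0.75]
print(max(example_values))  # 0.75
```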
Problem 3a: Initialize a dictionary for our genera#
Create an empty dictionary and assign it to the variable genus_measurements. Loop through the lines of our data file and add the genus from each line to the dictionary as a key, with an empty list [] as the associated value.
Question for you to consider (but no need to answer here): The whole reason we are moving up to the level of genera is that our data file often contains multiple species per genus. If we add the genus from each line to the dictionary as a key, will we have duplicate entries in our dictionary? Why or why not?
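One way to explore this question is to try it on a small list of made-up genus names in which one name repeats. This is only a sketch with hypothetical names; run it and compare what you see with what you expected.

```python
# Made-up genus names, with "Fakeus" appearing twice on purpose.
toy_genus_names = ["Fakeus", "Fakeus", "Otherus"]

toy_dict = {}
for genus in toy_genus_names:
    # Assigning to a key that already exists overwrites its value
    # rather than creating a second entry.
    toy_dict[genus] = []

print(toy_dict)       # {'Fakeus': [], 'Otherus': []}
print(len(toy_dict))  # 2
```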
# Write your code here
Problem 3b: Populate the dictionary with the relative forearm lengths for each genus#
Here, loop over the dictionary of relative forearm lengths that you assigned to the variable sp_forearms. For each species, extract the genus from its name and append its relative forearm length to the list stored under that genus in genus_measurements, the dictionary from step 3a. Remember that genus_measurements has the genera as keys and empty lists as values.
Warning: because your code for this problem modifies genus_measurements, if you want to re-run this cell, you should first re-run your Problem 3a cell to re-initialize genus_measurements. Otherwise you’ll see “strange” behaviour where the code in this cell keeps adding new elements to the lists in genus_measurements every time you run the cell.
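The append-to-a-list-inside-a-dictionary pattern can be sketched on toy data as follows. The names toy_species and toy_genera and all values are hypothetical stand-ins, not the assignment's variables or data.

```python
# Made-up relative forearm lengths keyed by species (hypothetical values).
toy_species = {"Fakeus alpha": 0.5, "Fakeus beta": 0.25, "Otherus gamma": 0.75}

# Genus keys that each start with an empty list, as in step 3a.
toy_genera = {"Fakeus": [], "Otherus": []}

for name in toy_species:
    genus = name.split(" ")[0]
    # Look up the list for this genus and add the species' value to it.
    toy_genera[genus].append(toy_species[name])

print(toy_genera)  # {'Fakeus': [0.5, 0.25], 'Otherus': [0.75]}
```

Running the loop a second time without resetting toy_genera would append every value again, which is the behaviour the warning describes.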
# Write your code here
Problem 3c: Find the largest relative forearm length for each genus in the dictionary#
Create an empty dictionary and assign it to the variable biggest_genus_forearms. Finally, loop over genus_measurements and find the largest measurement for each genus from the list of measurements that we assigned to each genus in the previous step. Keep in mind that Python has the built-in function max(), which will identify the largest value in a list. Add each genus to biggest_genus_forearms as a key, with the largest relative forearm length for that genus as the value.
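A minimal sketch of taking the maximum of each list in a dictionary, again on invented data. The names toy_lists and toy_biggest are hypothetical.

```python
# Made-up lists of relative forearm lengths per genus (hypothetical values).
toy_lists = {"Fakeus": [0.5, 0.25], "Otherus": [0.75]}

toy_biggest = {}
for genus in toy_lists:
    # max() picks the largest number in this genus's list.
    toy_biggest[genus] = max(toy_lists[genus])

print(toy_biggest)  # {'Fakeus': 0.5, 'Otherus': 0.75}
```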
# Write your code here
Problem 4: Interpret your results#
Look through our results. Find the genera with the largest relative forearm lengths. These should be values close to 1.
Answer the following questions, in your own words:
Describe what it means to have a relative forearm length close to 1, given how we calculated the metric. (1pt)
Do you notice anything in common about the genera with the longest forearms? (2pts) Speculate about why these organisms might have such long forearms. Do they share anything in common with regard to their locomotion (how they move around)? (2pts)
WRITE YOUR RESPONSE HERE.