GGR274 Lab 10: Project data description and analysis
Logistics
Lab grade will be based on submission of this notebook to MarkUs during the lab session (or by 23:59 on Thursday).
You do not have to answer every question, but your notebook should be submitted as usual for attendance. Submit your completed file to MarkUs. Here are the instructions for submitting to MarkUs (same as last week):
Download this file (Lab_10.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
Submit this file to MarkUs under the lab10 assignment. (See our MarkUs Guide for detailed instructions.)
Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.
Lab 10 Introduction
This lab is intended to give you some structured work time for your project. Last week, you outlined how you will prepare your data for analysis. This week focuses on getting you to think about how to start describing your data and performing some preliminary analysis.
What you write in this and the following labs is for your own reference. You may answer the questions in either words or code, but whenever possible, you should name the function you intend to use.
Let’s get started:
You can change the following code to import your data:
import pandas as pd

# Change the variable name, file name, sheet name, and header row before running this cell
Your_DataFrame_Name_1 = pd.read_excel(
    "../../../project/presentation/data/Your_Intentional_Data_File_Name",
    sheet_name="Sheet_Name",
    header=999,  # placeholder: replace with the 0-indexed row that holds the column names
)

# Import another DataFrame you have chosen
Your_DataFrame_Name_2 = pd.read_excel(
    "../../../project/presentation/data/Your_Intentional_Data_File_Name",
    sheet_name="Sheet_Name",
    header=999,  # placeholder: replace with the 0-indexed row that holds the column names
)

# Think about how to merge the DataFrames
# Or paste your code from last week here if you have already prepared it!
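If you are unsure how to merge two DataFrames, here is a minimal sketch using `pd.merge`. The tables and the column names (`region`, `population`, `median_income`) are made up for illustration; your own data will need its own shared key column.

```python
import pandas as pd

# Hypothetical toy tables standing in for your two project DataFrames
left = pd.DataFrame({"region": ["A", "B", "C"], "population": [100, 200, 300]})
right = pd.DataFrame({"region": ["A", "B", "D"], "median_income": [50, 60, 70]})

# An inner merge keeps only the rows whose key appears in both tables
merged = pd.merge(left, right, on="region", how="inner")
print(merged)

# An outer merge with indicator=True adds a "_merge" column showing
# which table each row came from, which helps spot unmatched keys
check = pd.merge(left, right, on="region", how="outer", indicator=True)
print(check)
```

Checking the `_merge` column before settling on `how="inner"` shows how many rows the merge would drop.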
What function can you use to produce a statistical summary of your data?
# Write your notes here using comments
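One function that could answer this question is pandas' `DataFrame.describe()`. The sketch below uses a small made-up DataFrame (the `income` and `age` columns are hypothetical), so replace it with your own data.

```python
import pandas as pd

# Hypothetical toy data standing in for your project DataFrame
df = pd.DataFrame({
    "income": [30, 35, 40, 45, 50, 55, 60, 200],
    "age": [22, 25, 31, 38, 44, 52, 60, 67],
})

# For numeric columns, describe() reports the count, mean, standard deviation,
# minimum, quartiles (25%, 50%, 75%), and maximum
summary = df.describe()
print(summary)
```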
In your statistical summary, you will see values for min, max, standard deviation, etc. What analysis would you need to perform using this information to find out if there are outliers in your dataset (hint: think about quartiles)?
# Write your notes here using comments
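Following the hint about quartiles, one common approach is the 1.5 × IQR rule: flag any value that lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. This is a sketch on made-up values, and the 1.5 multiplier is a convention rather than a requirement of this lab.

```python
import pandas as pd

# Hypothetical values standing in for one numeric column of your data
values = pd.Series([30, 35, 40, 45, 50, 55, 60, 200], name="income")

q1 = values.quantile(0.25)  # first quartile
q3 = values.quantile(0.75)  # third quartile
iqr = q3 - q1               # interquartile range

# Fences: values outside [lower, upper] are flagged as potential outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(f"Fences: [{lower}, {upper}]")
print(outliers)
```

A box plot of the same column draws its whiskers with this rule by default in matplotlib, so it gives a quick visual check of the same result.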
Will outliers impact your analysis? If so, how will you treat outliers?
# Write your notes here using comments
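One way to judge whether outliers matter is to compare summary statistics with and without them. The sketch below assumes the 1.5 × IQR rule and uses a hypothetical helper, `drop_iqr_outliers`, on made-up data. Dropping rows is only one possible treatment, and whether it is appropriate depends on why the extreme values are there.

```python
import pandas as pd

# Hypothetical toy data with one extreme value
df = pd.DataFrame({"income": [30, 35, 40, 45, 50, 55, 60, 200]})


def drop_iqr_outliers(data, column, k=1.5):
    """Return the rows whose value in `column` lies within the IQR fences."""
    q1 = data[column].quantile(0.25)
    q3 = data[column].quantile(0.75)
    iqr = q3 - q1
    within = data[column].between(q1 - k * iqr, q3 + k * iqr)
    return data[within]


cleaned = drop_iqr_outliers(df, "income")

# Compare the mean and median before and after removing flagged rows
comparison = pd.DataFrame({
    "with_outliers": df["income"].agg(["mean", "median"]),
    "without_outliers": cleaned["income"].agg(["mean", "median"]),
})
print(comparison)
```

In this toy example the mean shifts far more than the median, which suggests that a median-based summary would be less sensitive to the extreme value.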
What is your hypothesis about your research question?
# Write your hypotheses here!
# Include null hypothesis and alternative hypothesis
What functions will you need to use to test your hypotheses?
# Write your notes here using comments
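The right test depends on your research question and data. As one possible illustration, assuming your hypothesis compares the means of two groups, the sketch below runs a two-sided permutation test with NumPy. The function name `permutation_test` and the sample values are hypothetical.

```python
import numpy as np


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of a minus mean of b) and the
    proportion of shuffles with a difference at least as extreme.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them into two new groups
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical measurements for two groups
group_a = [12.1, 13.4, 11.8, 14.0, 12.9, 13.7]
group_b = [9.8, 10.5, 9.1, 10.9, 10.2, 9.6]

observed_diff, p_value = permutation_test(group_a, group_b)
print(f"Observed difference in means: {observed_diff:.3f}")
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value means that a difference this large would rarely arise if the null hypothesis were true. Compare it against the significance level you chose before running the test.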