GGR274 Lab 9: Project proposal feedback, data pre-processing
Logistics
As in previous weeks, your lab grade will be based on attendance and on submitting this notebook to MarkUs during the lab session (or by 23:59 on Thursday).
You do not have to answer every question, but you should still submit your notebook as usual for attendance. Here are the instructions for submitting your completed file to MarkUs (same as last week):
1. Download this file (Lab_9.ipynb) from JupyterHub. (See our JupyterHub Guide for detailed instructions.)
2. Submit this file to MarkUs under the lab9 assignment. (See our MarkUs Guide for detailed instructions.)
Note: there’s no autograding set up for this week’s lab, but your TA will be checking that your submitted lab file is complete as part of your “lab attendance” grade.
Lab 9 Introduction
This lab gives you structured work time for your project. Because every project is different, there are no prescribed tasks; instead, the questions in this lab are meant to get you thinking about which functions you will use to start preprocessing your data. Your TA will also give you feedback on your project proposal, which you can note down here to refer back to later.
What you write in this and the following labs is for your own reference. You may answer the questions in words or code, but whenever possible, name the specific function you intend to use; this will help you plan your project.
Let’s start!
What are the strengths and weaknesses of your project proposal? Did your TA raise any potential issues? Feel free to discuss with your fellow classmates.
# Write your notes here using comments
What functions will you use to import your data?
# Write your notes here using comments
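If your data is a CSV file and you are using pandas, `pd.read_csv` is the usual entry point (other formats have siblings such as `pd.read_excel`). The sketch below assumes that setup; the column names and values are made up, and `io.StringIO` only stands in for a real file path so the cell runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for your data file. In your project you would
# pass a real path instead, e.g. pd.read_csv("your_data.csv").
csv_text = """id,age,income,province
1,34,52000,ON
2,,61000,BC
3,45,,ON
4,29,48000,QC
"""

df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks right after importing
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types pandas inferred (empty fields become NaN)
print(df.head())  # first few rows
```

Checking `shape`, `dtypes`, and `head()` immediately after import is a cheap way to catch a wrong delimiter, a missing header row, or a numeric column read in as text.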
Do you have any missing values? If so, how do you plan to handle rows/columns with missing values?
# Write your notes here using comments
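A minimal pandas sketch of the usual first steps with missing values, assuming your data is in a DataFrame: count the gaps with `isna().sum()`, then either drop incomplete rows with `dropna` or fill gaps with `fillna`. The toy data and the choice of the median as a fill value are illustrative assumptions, not a recommendation for your project.

```python
import io

import pandas as pd

# Hypothetical toy data with gaps (empty fields are read as NaN)
df = pd.read_csv(io.StringIO(
    "id,age,income\n1,34,52000\n2,,61000\n3,45,\n4,29,48000\n"
))

# Count missing values in each column
print(df.isna().sum())

# Option 1: drop rows missing a value in the columns you actually need
complete_rows = df.dropna(subset=["age", "income"])

# Option 2: fill gaps in one column with a summary value. This changes
# your data, so justify the choice in your report.
filled = df.fillna({"income": df["income"].median()})

print(complete_rows)
print(filled)
```

Dropping rows is simpler but shrinks your sample; filling keeps every row but can distort the distribution of the filled column.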
Which variables/columns will you use for your analysis? What function will you use to select those variables/columns?
# Write your notes here using comments
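Assuming your data is already in a pandas DataFrame, columns are usually selected by passing a list of names in square brackets; `df.loc[:, [...]]` is an equivalent, more explicit form. A short sketch with made-up column names:

```python
import io

import pandas as pd

# Hypothetical toy data
df = pd.read_csv(io.StringIO(
    "id,age,income,province\n1,34,52000,ON\n2,41,61000,BC\n3,45,39000,ON\n"
))

# Several columns: pass a list of names (note the double brackets).
# The result is a new DataFrame.
subset = df[["age", "income"]]

# The same selection with .loc: all rows (:), the listed columns
subset_loc = df.loc[:, ["age", "income"]]

# A single column with single brackets gives a Series, not a DataFrame
ages = df["age"]

print(subset)
print(type(ages))
```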
Do you need to filter your data for specific conditions? If so, what are those conditions?
# Write your notes here using comments
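In pandas, rows are typically filtered with a boolean condition inside square brackets, or with `df.query(...)`, which expresses the same filter as a string. The sketch below assumes a DataFrame with hypothetical `age` and `province` columns; substitute your own columns and conditions.

```python
import io

import pandas as pd

# Hypothetical toy data
df = pd.read_csv(io.StringIO(
    "id,age,income,province\n"
    "1,34,52000,ON\n2,41,61000,BC\n3,45,39000,ON\n4,19,15000,ON\n"
))

# Boolean mask: combine conditions with & (and) or | (or),
# wrapping each condition in parentheses
adults_on = df[(df["province"] == "ON") & (df["age"] >= 25)]

# The same filter written with .query()
adults_on_q = df.query("province == 'ON' and age >= 25")

# .isin() keeps rows whose value matches any item in a list
west_or_qc = df[df["province"].isin(["BC", "QC"])]

print(adults_on)
print(west_or_qc)
```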
Do you plan to create new variable(s) based on existing data? How will you create the new variable(s)?
# Write your notes here using comments
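Three common ways to derive a new column in pandas, sketched under the assumption that your data is in a DataFrame: arithmetic on existing columns, a condition with `np.where`, and binning a numeric column with `pd.cut`. The column names, cut-offs, and labels below are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.read_csv(io.StringIO(
    "id,age,income\n1,34,52000\n2,41,61000\n3,17,8000\n"
))

# Arithmetic on an existing column is applied element-wise
df["income_thousands"] = df["income"] / 1000

# Condition-based variable: np.where(condition, value_if_true, value_if_false)
df["is_adult"] = np.where(df["age"] >= 18, "adult", "minor")

# Bin a numeric column into labelled categories.
# Bins are right-inclusive by default: (0, 17], (17, 40], (40, 120]
df["age_group"] = pd.cut(
    df["age"], bins=[0, 17, 40, 120], labels=["<18", "18-40", "41+"]
)

print(df)
```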
Create a bar plot (for a categorical column) or a histogram (for a numeric column) of one column of your data to explore its shape (refer to the documentation). Does your distribution look like a normal distribution? If not, how might this information impact your analysis?
# Write your notes here using comments
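A sketch assuming pandas and matplotlib: a bar plot of `value_counts()` suits a categorical column, while a histogram shows the shape of a numeric one. The data here is simulated (a right-skewed "income" column and a made-up "province" column, seed chosen arbitrarily), so replace it with your own DataFrame. A plot alone cannot prove normality; comparing the mean with the median and checking `skew()` only gives a rough hint.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical simulated data: replace with your own DataFrame and columns
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=0.5, size=500),
    "province": rng.choice(["ON", "BC", "QC", "AB"], size=500),
})

# Two side-by-side axes so the plots do not draw over each other
fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Categorical column: bar plot of how often each category occurs
df["province"].value_counts().plot.bar(ax=ax_bar)
ax_bar.set_title("Count by province")

# Numeric column: histogram of values grouped into 30 equal-width bins
df["income"].plot.hist(bins=30, ax=ax_hist)
ax_hist.set_xlabel("income")
ax_hist.set_title("Distribution of income")

# Rough numeric checks: for a roughly normal shape, the mean is close to
# the median and the skew is close to 0
print("mean:", df["income"].mean())
print("median:", df["income"].median())
print("skew:", df["income"].skew())
```

A long right tail like this one is common for incomes or counts; methods that assume normality may then call for a transformation (for example, taking logs) or a method that does not rely on that assumption.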