Exercise: Run a cluster analysis

Goal: You will use the skills you’ve learned throughout the course to conduct two analyses in which you compute the Moran’s I statistic. You will also explore how to measure the spatial correlation between two variables.

Step 1 Load the Toronto_Neighbourhoods.geojson and toronto_health_data_2017.csv files into Python.

Step 2 Merge the health data with the spatial data
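As a hedged illustration of the merge step, the sketch below joins two small made-up tables and checks that every neighbourhood found a match. The column names (`area_name`, `neighbourhood`, `diabetes_rate`) and all values are hypothetical stand-ins, not the columns of the actual files; inspect the real column names before merging. With a real GeoDataFrame on the left-hand side, the same `merge` call keeps the geometry column.

```python
import pandas as pd

# Hypothetical stand-in for the spatial table (geometry shown as plain strings).
neighbourhoods = pd.DataFrame({
    "area_name": ["Neighbourhood A", "Neighbourhood B", "Neighbourhood C"],
    "geometry": ["polygon_a", "polygon_b", "polygon_c"],
})

# Hypothetical stand-in for the health table; values are invented.
health = pd.DataFrame({
    "neighbourhood": ["Neighbourhood A", "Neighbourhood B", "Neighbourhood X"],
    "diabetes_rate": [9.5, 12.1, 7.8],
})

# A left merge keeps every spatial unit; indicator=True adds a "_merge" column
# that records whether each row found a partner in the health table.
merged = neighbourhoods.merge(
    health,
    left_on="area_name",
    right_on="neighbourhood",
    how="left",
    indicator=True,
    validate="one_to_one",
)

# Rows that did not match usually point to spelling differences in the key.
unmatched = merged.loc[merged["_merge"] != "both", "area_name"].tolist()
print(unmatched)
```

Checking the unmatched rows before mapping helps avoid blank polygons caused by inconsistent neighbourhood names.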

Step 3 Generate maps of two of the three health variables included in the dataset (e.g., pick diabetes and mental health, or mental health and asthma). Make sure to classify the data and make it look nice.
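To make the idea of "classifying the data" concrete, here is a minimal NumPy sketch of quantile classification on invented values. It is only meant to show what a quantile scheme does; in the exercise itself you would more likely pass `scheme="quantiles"` and `k=5` to the GeoDataFrame `plot` method or use `mapclassify.Quantiles`. The variable names and the simulated rates are assumptions for illustration.

```python
import numpy as np

# Invented, right-skewed values standing in for one health variable.
rng = np.random.default_rng(42)
rates = rng.gamma(shape=2.0, scale=3.0, size=100)

k = 5  # number of classes

# Upper bounds of each class: the 20th, 40th, ..., 100th percentiles.
upper_bounds = np.quantile(rates, np.linspace(1 / k, 1, k))

# Assign each value to the first class whose upper bound it does not exceed.
classes = np.digitize(rates, upper_bounds, right=True)

# Quantile classes hold (roughly) the same number of observations.
print(np.bincount(classes))
```

Quantiles give every colour on the map a similar number of neighbourhoods, which is often easier to read for skewed variables than equal-width intervals.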

For each of the two columns, do the following to practice spatial autocorrelation:

Step 4 Calculate a weights matrix using libpysal (referred to as lps below). Hint: the function is lps.weights.Queen.from_dataframe.
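To show what a Queen contiguity matrix contains, the sketch below builds one by hand for a toy 3 x 3 grid using only NumPy. This is an illustration under simplifying assumptions (a regular grid instead of neighbourhood polygons, and a hypothetical helper named `queen_contiguity`); in the exercise, lps.weights.Queen.from_dataframe derives the neighbours from the geometries, and setting `w.transform = "r"` on the resulting object row-standardizes it.

```python
import numpy as np

def queen_contiguity(n_rows, n_cols):
    """Binary Queen contiguity matrix for a regular n_rows x n_cols grid.

    Two cells are neighbours if they share an edge or a corner.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[i, rr * n_cols + cc] = 1.0
    return W

W_binary = queen_contiguity(3, 3)

# Number of neighbours per cell: corners have 3, edges 5, the centre 8.
neighbour_counts = W_binary.sum(axis=1)
print(neighbour_counts.reshape(3, 3))

# Row-standardization: each row sums to 1, so W @ y is the neighbour average.
W = W_binary / neighbour_counts[:, None]
```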

Step 5 Use the esda.Moran() function to calculate the Moran’s I statistic and plot it.
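The following sketch computes the global Moran’s I by hand on simulated grid data, together with a simple one-sided permutation pseudo p-value. It is meant to show the idea behind the statistic, not to reproduce esda's exact implementation; in the exercise, `esda.Moran(y, w)` exposes the statistic as `.I` and the pseudo p-value as `.p_sim`. The grid, the simulated variable, and the helper names are assumptions for illustration.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Row-standardized Queen contiguity matrix for a regular grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[r * n_cols + c, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(x, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
n_rows, n_cols = 6, 6
W = queen_weights(n_rows, n_cols)

# Simulated variable with a smooth north-west to south-east gradient plus noise.
rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)
x = rows + cols + rng.normal(scale=0.5, size=n_rows * n_cols)

observed = morans_i(x, W)

# Permutation test: shuffle the values across cells and recompute I each time.
n_perm = 999
simulated = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
p_sim = (np.sum(simulated >= observed) + 1) / (n_perm + 1)

print(f"Moran's I = {observed:.3f}, pseudo p-value = {p_sim:.3f}")
```

A clearly positive I with a small pseudo p-value indicates that similar values cluster together more than random placement would produce.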

Step 6 Use the esda.Moran_Local() function to calculate the LISA statistic and plot it.
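As a rough sketch of what a LISA analysis computes, the code below derives a local Moran’s I value and a quadrant label (HH, LH, LL, HL) for each cell of a simulated grid. It uses the textbook scaling by the second moment of the data; library implementations may scale slightly differently, and this sketch omits the permutation-based significance test that `esda.Moran_Local` reports through `p_sim`. The grid data and helper names are assumptions for illustration.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Row-standardized Queen contiguity matrix for a regular grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[r * n_cols + c, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

n_rows, n_cols = 6, 6
n = n_rows * n_cols
W = queen_weights(n_rows, n_cols)

# Simulated variable: low values in the top-left, high values in the bottom-right.
rng = np.random.default_rng(1)
rows, cols = np.divmod(np.arange(n), n_cols)
x = rows + cols + rng.normal(scale=0.5, size=n)

z = x - x.mean()      # deviation of each cell from the mean
lag = W @ z           # average deviation of each cell's neighbours
m2 = (z @ z) / n      # second moment, used for scaling

# Local Moran's I: positive where a cell resembles its neighbours.
local_i = z * lag / m2

# Quadrant of the Moran scatterplot: own value first, neighbours second.
quadrant = np.where(
    z > 0,
    np.where(lag > 0, "HH", "HL"),
    np.where(lag > 0, "LH", "LL"),
)
print(quadrant.reshape(n_rows, n_cols))
```

With row-standardized weights and this scaling, the mean of the local values equals the global Moran’s I, which is a handy sanity check when interpreting both statistics together.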

Step 7 Interpret the global and local Moran’s statistics in a few sentences. What do they tell us about the spatial distribution of the two health outcomes you selected?

Do the following steps to explore spatial correlation:

Step 8 There are two options for measuring spatial correlation:

  1. Run a linear regression between the spatial lag of one health outcome and the spatial lag of the other.

  2. Calculate Lee’s L statistic (similar to Moran’s I, but it applies to two or more variables). See wiki and esda.Spatial_Pearson for how to do it.
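Both options can be sketched by hand on simulated data, as below. The first part regresses the spatial lag of one variable on the spatial lag of the other with `np.polyfit`; the second part evaluates the standard bivariate formula for Lee’s L directly in NumPy rather than calling esda. The grid, the two simulated variables, and the helper names are assumptions for illustration, and no significance test is included.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Row-standardized Queen contiguity matrix for a regular grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[r * n_cols + c, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

n_rows, n_cols = 6, 6
n = n_rows * n_cols
W = queen_weights(n_rows, n_cols)

# Two simulated variables that share a spatial gradient.
rng = np.random.default_rng(2)
rows, cols = np.divmod(np.arange(n), n_cols)
x = rows + cols + rng.normal(scale=0.5, size=n)
y = 0.8 * x + rng.normal(scale=1.0, size=n)

# Option 1: regress the spatial lag of y on the spatial lag of x.
lag_x, lag_y = W @ x, W @ y
slope, intercept = np.polyfit(lag_x, lag_y, deg=1)
r_lags = np.corrcoef(lag_x, lag_y)[0, 1]

# Option 2: Lee's L, the cross-product of the lagged deviations scaled by
# the (unlagged) sums of squares of both variables.
zx, zy = x - x.mean(), y - y.mean()
row_sums_sq = (W.sum(axis=1) ** 2).sum()
lee_l = (n / row_sums_sq) * ((W @ zx) @ (W @ zy)) / np.sqrt((zx @ zx) * (zy @ zy))

print(f"slope = {slope:.2f}, r of lags = {r_lags:.2f}, Lee's L = {lee_l:.2f}")
```

A positive slope and a positive L both suggest that neighbourhoods surrounded by high values of one variable also tend to be surrounded by high values of the other.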

Step 9 Interpret the spatial correlation results. Do they tell us whether the spatial distributions of the two health outcomes are similar?

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import mapclassify
import esda
import splot
import libpysal as lps
import contextily as cx
from splot.esda import moran_scatterplot
from splot.esda import lisa_cluster
from statsmodels.formula.api import ols
#start your code here - add cells as needed