-
Notifications
You must be signed in to change notification settings - Fork 24
Using `maup` to import a real‐life plan in GerryChain
To generate an ensemble of districting plans using GerryChain, we need a
starting point for our Markov chain. GerryChain gives you functions like
recursive_tree_part
to generate such plans from scratch, but you may want to
use an actual districting plan from the real world instead. You also may want to
compare a real-life plan to your ensemble.
maup
is MGGG's package for doing common
spatial data operations with redistricting data. We'll use this package to
assign our basic geographic units to the districts in the plan we're interested
in. Our steps are:
- Download shapefiles of the districting plan we're interested in and the basic units that we want to use,
- Use
maup
to assign the basic units to the districts in that plan, and then - Create a GerryChain Partition using that assignment.
>>> import maup
>>> import geopandas
>>> import matplotlib.pyplot as plt
For this guide, we'll assign Massachusetts's 2020 block groups to the 2022 Massachusetts State Senate districting plan. We can use geopandas to download the shapefiles that we need straight from the TIGER/Line files on the U.S. Census Bureau's website and load them into Python as GeoDataFrames:
>>> districts = geopandas.read_file("https://www2.census.gov/geo/tiger/TIGER2022/SLDU/tl_2022_25_sldu.zip")
>>> units = geopandas.read_file("https://www2.census.gov/geo/tiger/TIGER2022/BG/tl_2022_25_bg.zip")
Side note: These shapefiles and the GeoDataFrames that we have created from them contain only basic information
about the geographies of the underlying units. It is often important to incorporate additional information about
population, election data, etc. See the main maup
documentation page for examples of how to aggregate/disaggregate such data between different units to get all the data that you need into a single GeoDataFrame.
It is best to do this before building the GerryChain Graph object in the Creating a Partition
with the real-life assignment section below.
Now we can use maup
to assign our units to districts.
>>> assignment = maup.assign(units, districts)
Note that this produces a warning message something like the following:
---------------------------------------------------------------------------
Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
df = df[df.area > area_cutoff].reset_index(drop=True)
UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
geometries = geometries[geometries.area > area_cutoff]
Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
return assign_to_max(intersections(sources, targets, area_cutoff=0).area)
This is because shapefiles from the Census Bureau all use a "geographic" coordinate projection, using latitude/longitude coordinates to represent the locations of points. These coordinates are too coarse for many geometric operations (e.g., accurate computations of lengths and areas), so it is recommended (but not required!) to convert to a "non-geographic" coordinate projection better suited for such computations. One option is to reproject to a suitable UTM (Universal Transverse Mercator) projection via
>>> districts.to_crs(districts.estimate_utm_crs(), inplace=True)
>>> units.to_crs(units.estimate_utm_crs()), inplace=True)
Now let's repeat the maup.assign
command:
>>> assignment = maup.assign(units, districts)
And now it runs without producing the warning message! 🎉
We'll save the assignment in a column of our units
GeoDataFrame:
>>> units["SENDIST"] = assignment
Let's use the pandas .isna()
method to see if we
have any units that could not be assigned to districts:
>>> assignment.isna().sum()
0
This means that every unit was successfully assigned. If our basic units were too large to get a meaningful assignment, or if the districts did not cover all of our units (e.g. if our units included parts of the Atlantic Ocean but the districts did not), then we would have units with NA assignments that we would need to make decisions about.
See the main maup
documentation page for another example and more information, including how to handle cases where smaller units do not nest cleanly into larger units such as districts.
Now we are ready to use this assignment in GerryChain. We'll start by building the GerryChain Graph object that records adjacency information between our basic units
>>> from gerrychain import Graph, Partition
>>> graph = Graph.from_geodataframe(units)
For larger GeoDataFrames, building this Graph object can take awhile! Once you have built it, you may want to save it as a .json file:
>>> graph.to_json("./BG_graph.json")
Then you can read it in again later using:
>>> graph = Graph.from_json("./BG_graph.json")
Each node of this graph represents one basic unit, and the node contains all the information about that unit that was contained in the GeoDataFrame for the units. For example, here is the first node in the graph:
>>> graph.nodes[0]
{'boundary_node': False,
'area': 1054008.1312115968,
'STATEFP': '25',
'COUNTYFP': '027',
'TRACTCE': '728200',
'BLKGRPCE': '3',
'GEOID': '250277282003',
'NAMELSAD': 'Block Group 3',
'MTFCC': 'G5030',
'FUNCSTAT': 'S',
'ALAND': 1053449,
'AWATER': 0,
'INTPTLAT': '+42.3265093',
'INTPTLON': '-071.8222284',
'SENDIST': 33}
Side note: If we happened to have a preexisting unit graph and we needed to add the district assignments
to the nodes of the graph, we would need to match the nodes of
our graph to the geometries in units
by their GEOIDs. We could use the Graph
object's .join()
method to do this matching automatically:
>>> graph.join(units, columns=["SENDIST"], left_index="GEOID", right_index="GEOID")
The left_index
and right_index
arguments tell the .join()
method to use
the GEOID
node attribute and the GEOID
column of units
to match the
records in units
to the nodes of our graph.
Now we use the Graph node attribute called SENDIST
to create a
Partition:
>>> real_life_plan = Partition(graph, "SENDIST")
To check our work, let's plot the real_life_plan
on the units using the Partition's
.plot()
method:
>>> real_life_plan.plot(units, figsize=(10, 10), cmap="tab20")
>>> plt.axis('off')
>>> plt.show()
That looks like a districting plan! Woohoo! 🎉🎉🎉
If the plan looked like random noise or confetti, then we might suspect that something had gone wrong. The two places we would want to look for problems would be:
- the
graph.join
call, which would go wrong if the GEOIDs did not match up correctly, or - the final
real_life_plan.plot
call, which would go wrong if the GeoDataFrame's index did not match the node IDs of our graph in the right way.
We could inspect both issues by making sure that the records with matching IDs actually referred to the same block groups.
You also might run into problems when you go to run a Markov chain using the partition we made. If the districts are not contiguous with respect to your underlying graph, you would need to add edges (within reason) to make the graph agree with the notion of contiguity that the real-life plan uses. See What to do about islands and connectivity for a guide to handling those types of issues.