11-05, 15:30–16:00 (America/New_York), Regency Ballroom B
The Center for Geospatial Solutions rewrote their tool, which formerly depended on proprietary software, for identifying informal settlements and predicting where future building may occur. Using open-source Python packages, they decreased run time and increased usability across user skill levels.
The future of urban change belongs to everyone. By using free data that is available worldwide and writing a free and open-source tool, the Center for Geospatial Solutions at the Lincoln Institute of Land Policy (CGS) is opening an avenue to fill a gap in global urban planning research.
Our tool presents a novel methodology that leverages readily available open datasets to create an objective, reproducible, and predictive understanding of urban development patterns. Our approach combines building footprints, transportation networks, and terrain data from OpenStreetMap and Overture Maps with innovative analytical techniques adapted from ecological science. By treating buildings as species and urban areas as habitats, we apply presence-only observation modeling techniques (MAXENT) to understand and predict urban development patterns.
The resulting analysis allows cities to better anticipate and prepare for future growth patterns and can be regularly updated as new data becomes available. The free and open-source nature of the data, software used, and the tool’s codebase ensures consistency and accessibility across different urban contexts while maintaining methodological rigor.
Proprietary software is expensive and inaccessible to planners, community members, and government workers in locales that cannot afford the license fee. Planners and others who rely on analyses of public urban data have few free, open-source alternatives that build on the basics that desktop software like ArcGIS and QGIS readily introduces.
CGS is familiar with this problem: after CGS presented the ArcPy notebook at the World Urban Forum in Cairo in 2024, focus groups noted a potential barrier to use in countries where formal planning is not widespread. In cities and countries where formal planning systems are not part of the government, local or otherwise, both data and tools can be hard to find. The entire pipeline, from development to end user, needs to be free and open-source to create new secondary datasets for research-supported policies.
Esri products are generally well documented, but the tools that run complicated processes often hide key parameters, making things easier for the user by estimating the best parameter value from the data. Translating from ArcPy to open-source Python packages required CGS to understand the analyses we were running more completely. The learning curve can be steep, and while many examples and tutorials exist, they are not always suited to the user’s needs. Given that our intended audience for the tool is people who can use desktop GIS software but are not GIS experts, we set defaults for parameters that are challenging to understand, for example the bandwidth for a Kernel Density Estimate in scikit-learn (sklearn). Because our code is open source, our parameters are documented and public, and more advanced users can easily modify the code and suggest changes that better meet their needs.
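To make the idea of a documented, overridable default concrete, here is a minimal sketch of a Kernel Density Estimate in scikit-learn. The function names, the synthetic centroids, and the Scott-style rule used for the default bandwidth are illustrative assumptions and are not taken from the CGS codebase; the tool's real default may be chosen differently.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def scott_bandwidth(points: np.ndarray) -> float:
    """Scott-style rule of thumb (an assumed default, not the CGS value)."""
    n, d = points.shape
    # Average per-axis spread, shrunk as the sample size grows.
    return float(points.std(axis=0, ddof=1).mean() * n ** (-1.0 / (d + 4)))


def fit_building_kde(points, bandwidth=None):
    """Fit a Gaussian KDE; advanced users can pass their own bandwidth."""
    points = np.asarray(points, dtype=float)
    if bandwidth is None:
        bandwidth = scott_bandwidth(points)
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(points)
    return kde, bandwidth


# Synthetic building centroids in projected units (e.g. metres); hypothetical data.
rng = np.random.default_rng(0)
centroids = rng.normal(loc=[500.0, 500.0], scale=50.0, size=(200, 2))

kde, bw = fit_building_kde(centroids)
# score_samples returns the log of the estimated density at each query point.
log_density = kde.score_samples(np.array([[500.0, 500.0], [900.0, 900.0]]))
```

A user who is comfortable with the method could call `fit_building_kde(centroids, bandwidth=10.0)` to override the default, while everyone else gets a value derived from the data and stated in the open.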
A second challenge was that not every Esri tool has a one-to-one match with an open-source alternative. Some processes take multiple packages or need a different solution to a similar problem. For example, to solve a simple routing problem of many points to one destination, the options through Esri are OD Matrix and Closest Facility. Both Esri options require several layers of setup, and while OD Matrix is optimized for speed, it is not quick when the number of origins or destinations is large. However, using OSMnx, momepy, and NetworkX to download and create the network dataset, select the node closest to the city center, and run an optimized shortest-path-length algorithm takes a fraction of the time that Network Analyst does. The function we used from NetworkX is optimized for a 1:M problem, unlike the more generic N:M solver that Esri’s OD Matrix provides. Even though we needed multiple open-source Python packages to replace Esri’s Network Analyst, the resulting code maximizes speed and readability, improving the user experience at multiple skill levels.
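The many-to-one routing idea can be sketched with NetworkX alone. The abstract does not name the function that was used, so `single_source_dijkstra_path_length` is an assumption here, as are the helper name `travel_cost_to_center`, the `length` edge attribute, and the toy street graph. In practice the graph would come from OSMnx and the center node from a nearest-node lookup.

```python
import networkx as nx


def travel_cost_to_center(graph, center_node, weight="length"):
    """Network distance from every node that can reach center_node.

    One search rooted at the destination replaces one search per origin.
    On a directed graph the edges are reversed first, so that expanding
    outward from the center follows streets in their legal direction
    toward it.
    """
    search_graph = graph.reverse(copy=False) if graph.is_directed() else graph
    return nx.single_source_dijkstra_path_length(
        search_graph, center_node, weight=weight
    )


# Toy directed street network with edge lengths in metres (hypothetical data).
streets = nx.DiGraph()
streets.add_weighted_edges_from(
    [
        ("a", "b", 100.0),
        ("b", "center", 50.0),
        ("c", "center", 300.0),
        ("c", "a", 80.0),
        ("center", "d", 40.0),  # one-way street leading away from the center
    ],
    weight="length",
)

costs = travel_cost_to_center(streets, "center")
```

Nodes with no route to the center, such as `d` in this toy graph, are simply absent from the result, so a caller can flag them as unreachable instead of receiving a misleading distance.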
Another benefit of using open-source software is that we can use data storage formats that are optimized for speed and compression. Proprietary products like ArcGIS rely on data types that do not always play well with open-source software. Esri products also handle some newer file formats inefficiently, even though open-source packages can read them natively. For example, Esri products require a multi-file connection to use Parquet, a column-oriented, compressed, and now natively geospatial file type. Parquet is our preferred vector file type due to its query speed, compression, partitioning options, and the ease and speed of I/O with open-source tools like GeoPandas and DuckDB. Our tool uses Parquet by default but can also write outputs in Shapefile and GeoPackage formats, allowing users to choose the output they are most comfortable with and enhancing accessibility.
Margo Atkinson has been with the Center for Geospatial Solutions at the Lincoln Institute of Land Policy since 2023, currently in the role of Manager, Research & Analysis. She specializes in taking spatial processes to more reproducible, comprehensive, and growth-ready forms. As a project incubator, Margo has worked with policy and data specialists to increase the analytical rigor of spatial dataset creation and spatially informed project scoring systems. With a particular focus on the social and physical infrastructure that shape our communities, Margo’s core motivation is to increase social equity in policy and metrics through data tool access and new ways of understanding the world we live in.