FOSS4G 2023

Creating Global Edge-Matched Subnational Boundaries
06-28, 16:00–16:30 (Europe/Tirane), UBT C / N111 - Second Floor

The project behind this talk is a personal initiative originally created to develop offline interactive reference maps for humanitarian actors. It soon shifted to helping develop the common operational datasets that form the foundation for humanitarian response planning. Over the past 2 years, enormous effort has gone into releasing a high-resolution composite dataset that can be updated daily from multiple sources. This talk will cover 3 aspects of the project.


Edge-matching resolves gaps and overlaps between hundreds of separate national data sources, requiring an algorithm that can perform at global scale. The resulting methodology uses something akin to a Euclidean allocation raster applied to vector space, avoiding the compromises made by other approaches such as generalization and snapping. If you've ever been challenged by topology or data cleaning, you might find some insights into solving your own problems with the ideas contained here.
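
To make the idea of Euclidean allocation in vector space more concrete, here is a minimal sketch in plain Python. It is not the project's implementation (the talk describes a PostGIS pipeline); it only illustrates the underlying rule that any location in a gap is assigned to the nearest source polygon, measured against exact vector edges rather than raster cells. The polygon names, coordinates, and function names are all hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: a and b are the same point.
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_ring(p, ring):
    """Shortest distance from point p to any edge of a closed ring."""
    n = len(ring)
    return min(
        point_segment_distance(p, ring[i], ring[(i + 1) % n])
        for i in range(n)
    )


def allocate(p, polygons):
    """Return the name of the polygon whose boundary is nearest to p."""
    return min(polygons, key=lambda name: distance_to_ring(p, polygons[name]))


# Two hypothetical boundaries with a gap between x=4 and x=5.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(5, 0), (9, 0), (9, 4), (5, 4)],
}

for point in [(4.2, 2.0), (4.8, 2.0)]:
    print(point, "->", allocate(point, polygons))
```

A point-by-point lookup like this would not scale to global data. A vector approach would more plausibly build the allocation regions once (for example as Voronoi-style cells around boundary vertices, dissolved by source polygon) and clip them to the gaps, but the nearest-source rule being applied is the same.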


The edge-matching algorithm involves multiple complex and computationally intensive steps. Although GeoPandas and GDAL usually come to mind when building multi-step geoprocessing scripts, PostGIS ended up being the fastest and best-scaling tool for transforming gigabytes of vector data. I'll challenge your assumptions about how it can be used to create pipelines both on desktops and in the cloud, and make a case for why you should include it in your next project.


A composite dataset is only as good as the foundations it builds upon, and great care was taken in selecting the sources used in this project. For international boundaries, I'll go into detail about how I used only public domain sources to create an ISO 3166-compliant dataset. At the subnational level, I'll highlight two projects that each curate updated administrative boundaries: one by the United Nations, another by an academic institution.

Whether you're a remote sensing specialist in search of the best topologically valid boundaries to run zonal statistics with, a Python developer frustrated by pipelines that constantly run into memory limits, or someone who just wants to run this tool on your own boundaries, I hope you come away from this talk with a valuable concept you can apply to your own work.



Max has been working as a geospatial software engineer in the "4Good" space for the past 8 years. After years at IMPACT Initiatives doing field work across Africa and the Middle East in response to famine and armed conflict, he transitioned to consulting for multiple UN agencies, building web platforms that use machine learning and big data to address issues related to food security and sustainable development goals. Now at Space4Good in The Hague, he works with a highly motivated team on nature-based solutions addressing global challenges such as climate change and biodiversity.