Bridging the Gap: Practical Spatial Partitioning of GeoParquet in an Evolving Ecosystem
2026-09-02, Conference Management Room5

GeoParquet 1.1 had been released about 1.5 years earlier, but ecosystem support for its features was still evolving in late 2025. We built a custom spatial indexing pipeline with Dagster and GeoPandas while waiting for native tools to mature.


Ecosystem Gaps
In late 2025, although GeoParquet 1.1 had been available for over a year, practical library support for its spatial partitioning and metadata features was still catching up. As data engineers managing large point datasets on S3, we needed a way to use these features while the community's tooling was still in development.

Architecture: Decoupling Metadata and Data
To avoid inefficient full-table scans and redundant downloads, we implemented a practical indexing strategy:
* Manual Indexing: We built a secondary GeoParquet "Catalog" containing polygon boundaries and partition keys to act as a spatial index.
* Orchestration: Using Dagster, we managed the dependency between this catalog and the primary data processing, keeping the catalog consistent with the partitioned data.
* Efficient Filtering: Using GeoPandas, we queried the catalog first, fetching only the necessary data fragments from S3.
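The catalog-first pattern above can be illustrated with a minimal, library-agnostic sketch in plain Python, so it runs without GeoPandas or S3 access. It assumes a hypothetical fixed grid as the partition scheme and uses bounding boxes in place of the catalog's polygon boundaries. The `s3://bucket/points` prefix, the Hive-style `partition=` paths, and all function names are invented for illustration; this is not the pipeline's actual code.

```python
from dataclasses import dataclass
from math import floor

# Hypothetical grid cell size (in coordinate units); the real partition scheme may differ.
CELL_SIZE = 1.0


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row: a partition key, its bounding box, and its data fragment."""
    partition_key: str
    minx: float
    miny: float
    maxx: float
    maxy: float
    path: str  # hypothetical S3 location of the partition's data fragment


def partition_key(x: float, y: float, cell: float = CELL_SIZE) -> str:
    """Map a point to the key of the grid cell (partition) that contains it."""
    return f"{floor(x / cell)}_{floor(y / cell)}"


def build_catalog(points, prefix="s3://bucket/points"):
    """Group points by partition and record each partition's bounding box."""
    bounds = {}
    for x, y in points:
        key = partition_key(x, y)
        if key in bounds:
            minx, miny, maxx, maxy = bounds[key]
            bounds[key] = (min(minx, x), min(miny, y), max(maxx, x), max(maxy, y))
        else:
            bounds[key] = (x, y, x, y)
    return [
        CatalogEntry(key, *bbox, path=f"{prefix}/partition={key}/data.parquet")
        for key, bbox in sorted(bounds.items())
    ]


def intersects(entry, qminx, qminy, qmaxx, qmaxy) -> bool:
    """Bounding-box overlap test between a catalog entry and the query box."""
    return not (
        entry.maxx < qminx or entry.minx > qmaxx
        or entry.maxy < qminy or entry.miny > qmaxy
    )


def fragments_to_fetch(catalog, query_bbox):
    """Query the small catalog first; only the matching fragments need to be read."""
    return [e.path for e in catalog if intersects(e, *query_bbox)]


if __name__ == "__main__":
    pts = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.5), (5.1, 5.2)]
    cat = build_catalog(pts)
    print(fragments_to_fetch(cat, (0.0, 0.0, 1.0, 1.0)))
```

Because the catalog is tiny compared with the point data, filtering it costs almost nothing, and the data files themselves are read only for partitions that can contain matches. In a GeoPandas pipeline, the catalog would more naturally be a GeoDataFrame loaded with `geopandas.read_parquet`, and the filter would be a selection such as `catalog[catalog.intersects(query_geom)]`.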

Reflections on a Shifting Landscape
Since the project’s inception, the ecosystem has moved quickly. Tools like DuckLake and various query engines are increasingly adding native support for these spatial operations, potentially turning our "manual bridges" into legacy code.

This talk reflects on the engineering mindset required during the gap between a new standard and its adoption. Sometimes we build our own wheels, and sometimes we are happy to see them replaced by the community.

Key Takeaways
1. Designing custom spatial indexing with GeoParquet
2. Practical patterns for Dagster and GeoPandas in geospatial data pipelines


Level of technical complexity: 2 - intermediate

Indicate what is (are) the open source project(s) essential in your talk:

GeoParquet, GeoPandas, Dagster, DuckDB, DuckLake

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation: