2026-09-02, Conference Management Room1
GeoParquet is a file format based on Apache Parquet for efficient storage of large geospatial datasets. A well-sorted Parquet file compresses better and answers queries faster. However, sorting alone is not enough to stream features with acceptable performance. To address this, I added a hierarchical dimension to the sort order and achieved better results.
Abstract
GeoParquet is a file format based on Apache Parquet for efficient storage of large geospatial datasets. Spatial sorting is key to making good use of Parquet: a well-sorted file compresses better and answers queries faster. However, spatial sorting alone is not enough to stream features with acceptable performance. To address this, I added a hierarchical dimension to the sort order and achieved better results. In this session, I propose a spatially and hierarchically organized GeoParquet layout for streaming.
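As a rough illustration of what spatial sorting means here, the sketch below orders point features by a Z-order (Morton) key. The abstract does not say which space-filling curve the talk uses, so the choice of Z-order, the helper names `interleave_bits` and `morton_key`, and the sample cities are all assumptions made for illustration only.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key


def morton_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize lon/lat onto a 2^bits grid and return the Z-order key."""
    cells = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * cells)
    y = int((lat + 90.0) / 180.0 * cells)
    return interleave_bits(x, y, bits)


# Hypothetical sample features: (name, lon, lat).
features = [
    ("tokyo", 139.69, 35.68),
    ("paris", 2.35, 48.86),
    ("osaka", 135.50, 34.69),
    ("berlin", 13.40, 52.52),
]

# Sorting by the key places nearby features next to each other in the file.
spatially_sorted = sorted(features, key=lambda f: morton_key(f[1], f[2]))
print([name for name, _, _ in spatially_sorted])
```

With this ordering the two Japanese cities end up adjacent, as do the two European ones, so rows that are close on the map also sit close together on disk.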
Outline
- GeoParquet
- Spatial sorting
- Predicate pushdown
- Feature streaming is not easy
- Hierarchically sorted Parquet
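The outline topics can be tied together with a toy simulation. This is only one possible reading of the idea, not the layout presented in the talk: it assumes each feature carries a hypothetical `level` column (0 = coarsest overview), that the file is sorted by `(level, Z-order key)`, and that a reader skips row groups using min/max statistics in the way Parquet predicate pushdown does. All names, the 16x16 grid, and the level rule are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    level: int  # hypothetical hierarchy level, 0 = coarsest overview
    x: int      # grid column
    y: int      # grid row


def z_key(x: int, y: int, bits: int = 8) -> int:
    """Z-order key built by interleaving the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def assign_level(x: int, y: int) -> int:
    """Toy rule: sparser cells belong to coarser levels."""
    if x % 4 == 0 and y % 4 == 0:
        return 0
    if x % 2 == 0 and y % 2 == 0:
        return 1
    return 2


def row_groups(rows, size):
    """Split rows into fixed-size groups and keep min/max statistics,
    mimicking the per-row-group column statistics stored in Parquet."""
    groups = []
    for i in range(0, len(rows), size):
        chunk = rows[i:i + size]
        groups.append({
            "level": (min(f.level for f in chunk), max(f.level for f in chunk)),
            "x": (min(f.x for f in chunk), max(f.x for f in chunk)),
            "y": (min(f.y for f in chunk), max(f.y for f in chunk)),
        })
    return groups


def groups_to_read(groups, max_level, xmin, xmax, ymin, ymax):
    """Keep only the groups whose statistics can satisfy the predicate."""
    return [
        g for g in groups
        if g["level"][0] <= max_level
        and g["x"][0] <= xmax and g["x"][1] >= xmin
        and g["y"][0] <= ymax and g["y"][1] >= ymin
    ]


features = [Feature(assign_level(x, y), x, y) for x in range(16) for y in range(16)]

spatial_only = sorted(features, key=lambda f: z_key(f.x, f.y))
hierarchical = sorted(features, key=lambda f: (f.level, z_key(f.x, f.y)))

# Overview request: coarsest level only, over the whole extent.
query = dict(max_level=0, xmin=0, xmax=15, ymin=0, ymax=15)
spatial_read = groups_to_read(row_groups(spatial_only, 16), **query)
hier_read = groups_to_read(row_groups(hierarchical, 16), **query)
print(len(spatial_read), len(hier_read))
```

In this toy setup the spatial-only ordering scatters the coarse features across every row group, so an overview request touches all of them, while the hierarchical ordering packs the coarse features into the first row group. Real files will behave differently depending on row group size and how levels are assigned.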