A Proposal for Hierarchically Organized GeoParquet
2026-09-02, Conference Management Room1

GeoParquet is a file format based on Apache Parquet for efficient storage of large geospatial datasets. Well-sorted Parquet files compress better and answer queries faster. However, sorting alone is not enough to stream features with acceptable performance. To address this, I added a hierarchical dimension to the sort order and achieved better results.


Abstract

GeoParquet is a file format based on Apache Parquet for efficient storage of large geospatial datasets. Spatial sorting is key to making good use of Parquet: well-sorted files compress better and answer queries faster. However, spatial sorting alone is not enough to stream features with acceptable performance. To address this, I added a hierarchical dimension to the sort order and achieved better results. In this session, I propose a spatially and hierarchically organized GeoParquet layout for streaming.
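
As a rough illustration of what a "spatial plus hierarchical" sort could look like, the sketch below orders point features first by a coarse-to-fine level and then by a Z-order (Morton) code within each level. The abstract does not specify the actual sort key, so the `level` field, the 16-bit quantization, and all function names here are assumptions made for illustration, not the layout used by yosegi.

```python
from typing import Iterable, List, NamedTuple


class Feature(NamedTuple):
    name: str
    lon: float
    lat: float
    level: int  # hypothetical: 0 = needed at the coarsest view, larger = finer detail


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


def morton_code(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize lon/lat onto a 2^bits grid and return the Morton code of the cell."""
    max_cell = (1 << bits) - 1
    x = min(max_cell, max(0, int((lon + 180.0) / 360.0 * (1 << bits))))
    y = min(max_cell, max(0, int((lat + 90.0) / 180.0 * (1 << bits))))
    return interleave_bits(x, y, bits)


def hierarchical_sort(features: Iterable[Feature]) -> List[Feature]:
    """Sort by level first, then by Morton code within each level (assumed key)."""
    return sorted(features, key=lambda f: (f.level, morton_code(f.lon, f.lat)))


features = [
    Feature("village", 139.70, 35.60, 2),
    Feature("capital", 139.69, 35.68, 0),
    Feature("town", 135.50, 34.69, 1),
    Feature("metropolis", -74.00, 40.71, 0),
]
ordered = hierarchical_sort(features)
print([f.name for f in ordered])
```

With a key like this, a reader that consumes the file from the start would receive all coarse-level features before any fine-level ones, while features within one level stay spatially clustered.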

Outline

  1. GeoParquet
  2. Spatial sorting
  3. Predicate pushdown
  4. Feature streaming is not easy
  5. Hierarchically sorted Parquet
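
Items 2 and 3 of the outline rest on how Parquet readers skip data: each row group carries min/max statistics per column, and a reader can drop any row group whose range cannot satisfy the filter. The pure-Python sketch below mimics that check on integer sort keys instead of calling a real Parquet library. `RowGroup`, `build_row_groups`, and `prune` are hypothetical names, and the row group size of 4 is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RowGroup:
    rows: List[int]  # values of a single sort-key column
    min_key: int     # stand-in for the Parquet min statistic
    max_key: int     # stand-in for the Parquet max statistic


def build_row_groups(keys: Sequence[int], size: int) -> List[RowGroup]:
    """Split keys into consecutive chunks and record min/max for each chunk."""
    groups = []
    for start in range(0, len(keys), size):
        chunk = list(keys[start:start + size])
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def prune(groups: Sequence[RowGroup], lo: int, hi: int) -> List[RowGroup]:
    """Keep only row groups whose [min, max] range overlaps the query [lo, hi]."""
    return [g for g in groups if g.max_key >= lo and g.min_key <= hi]


unsorted_keys = [7, 0, 12, 3, 9, 1, 14, 5, 10, 2, 15, 4, 8, 6, 13, 11]
query = (4, 7)

unsorted_hits = prune(build_row_groups(unsorted_keys, 4), *query)
sorted_hits = prune(build_row_groups(sorted(unsorted_keys), 4), *query)
print(len(unsorted_hits), len(sorted_hits))
```

In the unsorted case every row group's range overlaps the query, so nothing can be skipped; after sorting, the matching keys sit in one row group and the rest are pruned from statistics alone.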

Level of technical complexity: 2 - intermediate

Indicate what is (are) the open source project(s) essential in your talk:

https://github.com/Kanahiro/yosegi

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation: