Exploring Cloud Native Geospatial Data Formats: Hands-on with Vectors
2026-08-31 , 601

Dig into geospatial vector formats—including GeoJSON, WKT/WKB, and cloud-native GeoParquet—using Python to see in detail how vector features are stored in each format and to understand what cloud-native means for vector data.


Cloud-native geospatial is all the rage these days, and for good reason. As data sizes grow, layer counts increase, and analytical methods become more complex, the traditional download-to-the-desktop approach is increasingly untenable. It's no surprise, then, that users are turning to cloud-based tools to scale their analyses. But as we transition from opening whole files to fetching ranges of bytes from remote servers, it becomes all the more important to understand exactly how cloud-native data formats store data and what tools do to access it.

This workshop aims to dig into how cloud-native geospatial data formats are enabling new operational paradigms, with a particular focus on (Geo)Parquet. Participants do not need existing familiarity with Parquet: we'll work together to develop an understanding of the concepts behind it, starting with GeoJSON, roughly as follows:

  • GeoJSON: what it is, what it represents, and why it is not cloud-native
  • Well-Known Text/Binary (WKT/WKB): how these vector formats work and why they are important in (Geo)Parquet
  • (Geo)Parquet: how Parquet stores data, how geo maps into that paradigm, and what it takes to read some subset of data from a Parquet table
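To make the relationship between the first two formats concrete, here is a minimal sketch (not taken from the workshop materials) that encodes one point by hand using only the Python standard library. The feature and its coordinates are invented for illustration. The WKB layout used is the standard one for a 2D point: a one-byte byte-order flag, a four-byte geometry type, then two 64-bit floats.

```python
import json
import struct

# A hypothetical GeoJSON feature, invented for this example.
geojson_text = """
{
  "type": "Feature",
  "properties": {"name": "example"},
  "geometry": {"type": "Point", "coordinates": [-122.5, 45.5]}
}
"""

feature = json.loads(geojson_text)
x, y = feature["geometry"]["coordinates"]

# WKT: the same geometry as a compact, human-readable string.
wkt = f"POINT ({x} {y})"

# WKB: the same geometry as bytes.
#   B -> byte-order flag (1 = little endian)
#   I -> geometry type as uint32 (1 = Point)
#   d -> x as float64, d -> y as float64
# The leading "<" selects little-endian packing with no padding.
wkb = struct.pack("<BIdd", 1, 1, x, y)

print(wkt)
print(wkb.hex())
print(len(geojson_text.encode()), "bytes as GeoJSON text,", len(wkb), "bytes as WKB")
```

The binary form is fixed-width and needs no text parsing, which is part of why a binary geometry encoding fits naturally into a columnar format such as Parquet.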

The content of this workshop aims to be not only theoretical but practical: a strong goal is to be as hands-on as possible with these formats in Python. We'll eschew common tools, opting to take a more manual approach. An educationally focused Parquet library will provide a view into the process of reading Parquet files, their metadata, and the techniques used to run queries performantly. Throughout, we'll be building up a working understanding of what common higher-level tooling does under the hood and abstracts away from users.
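As a small taste of that manual approach, the sketch below locates the metadata footer of a Parquet file from its last eight bytes. It relies on one fact about the file layout: a Parquet file ends with the serialized file metadata, followed by a four-byte little-endian metadata length, followed by the magic bytes `PAR1`. The function name `locate_footer` and the synthetic byte blob are assumptions made for illustration, not part of the workshop's library, and the fake footer is a placeholder rather than real Thrift-encoded metadata.

```python
import io
import struct

MAGIC = b"PAR1"


def locate_footer(f):
    """Return (offset, length) of the metadata footer in a seekable file.

    Hypothetical helper for illustration; it only inspects the trailer
    and does not decode the metadata itself.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()

    # The trailer is 8 bytes: uint32 footer length, then the magic bytes.
    f.seek(size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic; not a Parquet file")

    (footer_len,) = struct.unpack("<I", tail[:4])
    return size - 8 - footer_len, footer_len


# Build a synthetic stand-in with the same outer layout as a Parquet file.
fake_footer = b"not real thrift metadata"
blob = MAGIC + b"\x00" * 100 + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC

offset, length = locate_footer(io.BytesIO(blob))
print(offset, length, blob[offset:offset + length])
```

Against a remote file, the same two steps would typically become two ranged reads: one for the final eight bytes and one for the footer itself, after which a reader can decide which other byte ranges it needs.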


Level of the workshop: 3 - advanced

Pre-requirements for attendees:

Attending the author's raster formats workshop is an encouraged but optional prerequisite to this workshop.

We'll have a lot to cover in the workshop and time is against us. Please try to come with a working notebook execution environment already set up and ready to go. The workshop repository's README outlines three different options: build and run the Docker container, use a GitHub Codespace, or run from a Python venv managed via uv.

Due to the uncertain quality of conference internet, a local option (Docker or uv) is recommended, but Codespaces can be useful for those who cannot run either of those options.

What skills do participants require to have?:

This workshop expects some familiarity with geospatial programming in Python and a basic understanding of the vector data model and its utility. Most of the notebook code is already provided, so gaps in understanding won't necessarily prevent you from completing the exercises. That said, some knowledge of geospatial vector formats and tooling is quite helpful.

Link to software source code:

https://github.com/jkeifer/cng-vector-formats

Jarrett Keifer is a Senior Geospatial Software Engineer at Element 84, a commercial geospatial consultancy that uses open-source to build effective customer solutions. His interests include education and outreach, geospatial data formats, and high-performance systems/network programming. He enjoys designing systems to operate at scale, particularly to support remote sensing data processing and earth science applications, and has over ten years of experience contributing to open source projects.
