11-18, 09:00–12:00 (Pacific/Auckland), WF511
Dig into geospatial vector formats—including GeoJSON, WKT/WKB, and cloud-native GeoParquet—using Python to see in detail how vector features are stored in each format and to understand what cloud-native means for vector data.
Cloud-native geospatial is all the rage these days, and for good reason. As file sizes grow, layer counts increase, and analytical methods become more complex, the traditional download-to-the-desktop approach is quickly becoming untenable for many applications. It's no surprise then that users are turning to cloud-based tools to scale out their analyses, or that traditional tooling is adopting new ways of finding and accessing data from cloud-based sources. But as we transition away from opening whole files to grabbing ranges of bytes off remote servers, it becomes all the more important to understand exactly how cloud-native data formats store data and what tools are doing to access it.
This workshop aims to dig into how cloud-native geospatial data formats are enabling new operational paradigms, with a particular focus on vector data formats. Unlike its raster workshop counterpart, this workshop will be a bit more experimental. Vector data formats tend towards greater complexity than raster formats, so exactly how deep we get into which topics will be dependent on the audience’s interests and the time available. Broad themes to explore might include:
- GeoJSON: what it is, what it represents, and why it is not cloud-native
- Well-Known Text/Binary (WKT/WKB): how these vector formats work and why they are important in GeoParquet
- GeoParquet: how Parquet stores data, how geo maps into that paradigm, and what it takes to read a subset of data from a Parquet file
- FlatGeobuf: what it is, how it works, why it might be “more” cloud-native than GeoParquet
- Practical considerations when using these formats
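To give a flavor of the WKT/WKB theme, here is a minimal sketch of encoding and decoding a 2D point as WKB using only the standard library. The function names `point_to_wkb` and `wkb_to_point` are hypothetical and chosen for illustration; the sketch handles only the plain Point type, not the full set of geometries covered by the standard.

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    # Layout of a 2D WKB point (21 bytes total):
    #   1 byte   byte-order flag (1 = little-endian)
    #   4 bytes  geometry type as uint32 (1 = Point)
    #   16 bytes x and y as IEEE 754 doubles
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(wkb: bytes) -> tuple[float, float]:
    # The first byte tells us how to interpret everything after it.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return x, y

# The same feature as text (WKT) and as bytes (WKB).
wkt = "POINT (174.76 -36.85)"
wkb = point_to_wkb(174.76, -36.85)
print(wkb.hex())
print(wkb_to_point(wkb))
```

The text form is easy to read, while the binary form is fixed-width and cheap to parse, which is part of why it suits a columnar container like GeoParquet.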
This workshop is not meant to be only theoretical: a strong goal is to be as hands-on with these formats as possible by working with them in Python without any format-specific geospatial libraries. We’ll interact with object storage directly, pulling down files and fragments and inspecting them, to build up a working understanding of what common higher-level tooling does under the hood and abstracts away from users.
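As an illustration of the kind of fragment-level inspection described here, the following sketch locates the footer metadata of a Parquet file from its last 8 bytes: a 4-byte little-endian footer length followed by the `PAR1` magic. The helper names `footer_byte_range` and `range_header` are hypothetical, and an in-memory `blob` with a dummy footer stands in for a remote object so the example runs without network access. It is not the workshop's own code.

```python
import struct

PARQUET_MAGIC = b"PAR1"

def footer_byte_range(tail: bytes, file_size: int) -> tuple[int, int]:
    """Return the (start, end) offsets of the footer metadata, end exclusive."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("tail does not look like the end of a Parquet file")
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", tail[:4])
    end = file_size - 8
    return end - footer_len, end

def range_header(start: int, end: int) -> dict[str, str]:
    # HTTP byte ranges are inclusive at both ends, hence end - 1.
    return {"Range": f"bytes={start}-{end - 1}"}

# Stand-in for an object in cloud storage: magic, data, footer, length, magic.
fake_footer = b"x" * 100  # placeholder bytes, not real Thrift metadata
blob = (
    PARQUET_MAGIC
    + b"\x00" * 1000
    + fake_footer
    + struct.pack("<I", len(fake_footer))
    + PARQUET_MAGIC
)

tail = blob[-8:]  # what a suffix request ("Range: bytes=-8") would return
start, end = footer_byte_range(tail, len(blob))
print(range_header(start, end))
footer = blob[start:end]  # what the second range request would return
```

Two small requests are enough to reach the metadata that describes where every column chunk lives, without downloading the data itself.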
Prerequisites
This workshop expects some familiarity with geospatial programming in Python and a basic understanding of the vector data model and its utility. Most of the notebook code is already provided, so gaps in understanding won't necessarily prevent you from completing the exercises. That said, some knowledge of geospatial vector formats and tooling is quite helpful.
Jarrett Keifer is a Senior Geospatial Software Engineer at Element 84, a commercial geospatial consultancy that uses open-source to build effective customer solutions. His interests include education and outreach, geospatial data formats, and high-performance systems/network programming. He enjoys designing systems to operate at scale, particularly to support remote sensing data processing and earth science applications, and has over ten years of experience contributing to open source projects.