Exploring Cloud Native Geospatial Data Formats: Hands-on with Rasters
2026-08-30, 601

Dig into three cloud-native raster formats—COGs, Zarr, and Kerchunk—and learn how data access works under the hood with hands-on Python exercises, no image libraries required!


Ever wonder what GDAL is doing under the hood when you read a Cloud Optimized GeoTIFF (COG) off a remote server? Have you been wondering what this Zarr thing is all about and how it works? And what about Kerchunk/VirtualiZarr indexing, which gives you cloud-native access to non-cloud-native data formats?

Cloud-native geospatial is all the rage these days, and for good reason. As data sizes grow, layer counts increase, and analytical methods become more complex, the traditional download-to-the-desktop approach is often untenable. It's no surprise, then, that users are turning to cloud-native tools to scale out their analyses. But as we transition away from opening whole files toward grabbing ranges of bytes off remote servers, it becomes all the more important to understand exactly how cloud-native data formats store data and what tools are doing to access it.
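To make "grabbing ranges of bytes" concrete, here is a minimal sketch of an HTTP range request using only the Python standard library. The helper names `range_header` and `read_range` are hypothetical and not taken from the workshop notebooks, and the sketch assumes the remote server honors the `Range` header.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(url: str, offset: int, length: int) -> bytes:
    """Fetch only the requested bytes of a remote file (hypothetical helper)."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        # A server that honors the range replies 206 Partial Content;
        # a 200 means it ignored the header and sent the whole file.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

Calling `read_range(url, 0, 16384)` would ask for only the first 16 KiB of a file, which is the kind of partial read a cloud-native reader relies on instead of downloading the whole thing.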

This workshop aims to dig into how cloud-native geospatial data formats are enabling new operational paradigms, with a focus on raster formats. We'll start by surveying the current cloud-native geospatial landscape to understand the importance of cloud-native and how it is being used, including the core tenets of cloud-native, common formats, and how things like SpatioTemporal Asset Catalogs (STAC) and STAC-based tooling integrate to provide more efficient access paradigms.

Then we'll get hands-on to build up an understanding of how these formats work at a deep level. We'll extract a tile from a COG by hand, then try the same with Zarr data to see how those formats compare. Lastly, we'll look at Kerchunk/VirtualiZarr and see how these allow optimized data access for non-cloud-native formats.
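As a taste of what "by hand" can look like, here is a minimal sketch that locates a tile in a classic TIFF using nothing but `struct`. It is an illustration, not the workshop's own code: the function name `parse_first_ifd` is hypothetical, the buffer is a synthetic header built in place rather than a real COG, and BigTIFF is not handled. It also relies on the TIFF rule that a value fitting in four bytes is stored inline in the directory entry; a real COG with many tiles stores an offset there that points to the full list.

```python
import struct

TILE_OFFSETS = 324      # TIFF tag holding the byte offset of each tile
TILE_BYTE_COUNTS = 325  # TIFF tag holding the stored size of each tile


def parse_first_ifd(buf: bytes) -> dict[int, tuple[int, int, bytes]]:
    """Map tag id -> (field type, value count, raw 4-byte value-or-offset field)."""
    # Bytes 0-1 give the byte order: b"II" little-endian, b"MM" big-endian.
    order = {b"II": "<", b"MM": ">"}[buf[:2]]
    # Bytes 2-3 hold the magic number 42, bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")
    (entry_count,) = struct.unpack(order + "H", buf[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        # Each directory entry is 12 bytes: tag, type, count, value-or-offset.
        start = ifd_offset + 2 + 12 * i
        tag, ftype, count = struct.unpack(order + "HHI", buf[start:start + 8])
        tags[tag] = (ftype, count, buf[start + 8:start + 12])
    return tags


# Synthetic little-endian header plus a two-entry IFD describing one tile.
header = b"II" + struct.pack("<HI", 42, 8)
entries = struct.pack("<HHII", TILE_OFFSETS, 4, 1, 1024)
entries += struct.pack("<HHII", TILE_BYTE_COUNTS, 4, 1, 512)
buf = header + struct.pack("<H", 2) + entries + struct.pack("<I", 0)

tags = parse_first_ifd(buf)
# Both values are single LONGs (type 4), so they sit inline in the entry.
(tile_offset,) = struct.unpack("<I", tags[TILE_OFFSETS][2])
(tile_size,) = struct.unpack("<I", tags[TILE_BYTE_COUNTS][2])
print(tile_offset, tile_size)
```

With the offset and size in hand, a reader only needs to fetch that one span of bytes and, if the tile is compressed, decompress it to recover the pixel values.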


Level of the workshop: 3 - advanced

Pre-requirements for attendees:

We'll have a lot to cover in the workshop and time is against us. Please try to come with a working notebook execution environment already set up and ready to go. The workshop repository's README outlines three different options: build and run the Docker container, use a GitHub Codespace, or run from a Python venv managed via uv.

Due to the uncertain quality of conference internet, a local option (Docker or uv) is recommended, but Codespaces can be useful for those who cannot run either of those options.

What skills do participants require to have?:

This workshop expects some familiarity with geospatial programming in Python. Most of the notebook code is already provided, so gaps in understanding won't necessarily prevent you from completing the exercises. That said, a basic knowledge of STAC and cloud-native geospatial Python tooling, along with experience working with rasters as single- and multidimensional arrays, is quite helpful.

A good primer is the Cloud-Native Geospatial for Earth Observation Workshop by Alex Leith of Auspatious. It is recommended to work through those activities, or have equivalent knowledge, before working through the notebooks in this workshop.

Link to software source code:

https://github.com/jkeifer/cng-raster-formats

Jarrett Keifer is a Senior Geospatial Software Engineer at Element 84, a commercial geospatial consultancy that uses open-source to build effective customer solutions. His interests include education and outreach, geospatial data formats, and high-performance systems/network programming. He enjoys designing systems to operate at scale, particularly to support remote sensing data processing and earth science applications, and has over ten years of experience contributing to open source projects.
