Exploring Cloud-Native Geospatial Formats: Hands-on with Raster Data Workshop FOSS4G 2025

11-17, 13:30–16:30 (Pacific/Auckland), WF603

Dig into three cloud-native raster formats—COGs, Zarr, and Kerchunk—and learn how data access works under the hood with hands-on Python exercises, no image libraries required!

Ever wonder what GDAL is doing under the hood when you read a GeoTIFF file? Doubly so when the file is a Cloud Optimized GeoTIFF (COG) on a remote server somewhere? Have you been wondering what this new Zarr thing is all about and how it actually works? And then there's the whole Kerchunk/VirtualiZarr indexing approach for getting cloud-native access to non-cloud-native data formats: what's that about?

Cloud-native geospatial is all the rage these days, and for good reason. As file sizes grow, layer counts increase, and analytical methods become more complex, the traditional download-to-the-desktop approach is quickly becoming untenable for many applications. It's no surprise, then, that users are turning to cloud-based tools such as Dask to scale out their analyses, or that traditional tooling is adopting new ways of finding and accessing data from cloud-based sources. But as we transition from opening whole files to grabbing ranges of bytes off remote servers, it becomes all the more important to understand exactly how cloud-native data formats store data and what tools are doing to access it.

This workshop aims to dig into how cloud-native geospatial data formats are enabling new operational paradigms, with a particular focus on raster data formats. We'll start on the surface by surveying the current cloud-native geospatial landscape to gain an understanding of why cloud native is important and how it is being used, including:

the core tenets of cloud-native geospatial data formats
cloud-native data formats for both raster and non-raster geospatial data
the intersection with SpatioTemporal Asset Catalogs (STAC) and how higher-level STAC-based tooling can leverage cloud-native formats for efficient raster data access and processing

Then we'll get hands-on and go deep to build up an in-depth understanding of how cloud-native raster formats work. We'll examine the COG format and read a COG from a cloud source by hand using just Python, progressively grabbing data from the image until we can extract a target tile, all without using any image libraries. We'll repeat the same exercise for geospatial data in Zarr format to see how that compares to our experience with COGs. Lastly, we'll turn our attention to Kerchunk/VirtualiZarr to see how these technologies might allow us to better optimize data access with non-cloud-native formats.
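To give a flavor of what reading a COG "by hand" can look like, here is a minimal sketch using only Python's standard library. It is not taken from the workshop notebooks. The function names (`byte_range_header`, `parse_tiff_header`, `parse_ifd_entries`, `read_short`) are hypothetical, and the bytes being parsed are a tiny synthetic classic-TIFF header built in memory rather than a real remote file. With a real COG, the same bytes would typically come from an HTTP range request for the start of the file.

```python
import struct

# Well-known TIFF tag numbers from the TIFF 6.0 specification.
IMAGE_WIDTH, IMAGE_LENGTH = 256, 257
TIFF_SHORT = 3  # field type code for an unsigned 16-bit integer


def byte_range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def parse_tiff_header(buf):
    """Return (endian, first_ifd_offset) from the first 8 bytes of a classic TIFF."""
    order = buf[:2]
    if order == b"II":
        endian = "<"  # little-endian
    elif order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: unrecognized byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", buf[2:8])
    if magic != 42:
        # BigTIFF uses 43 and a different header layout; not handled in this sketch.
        raise ValueError(f"unsupported TIFF magic number: {magic}")
    return endian, ifd_offset


def parse_ifd_entries(buf, endian, ifd_offset):
    """Return {tag: (field_type, count, raw_value_bytes)} for one classic-TIFF IFD."""
    (n_entries,) = struct.unpack_from(endian + "H", buf, ifd_offset)
    entries = {}
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, count = struct.unpack_from(endian + "HHI", buf, start)
        entries[tag] = (field_type, count, buf[start + 8:start + 12])
    return entries


def read_short(entry, endian):
    """Decode a single SHORT stored inline in the entry's 4-byte value field."""
    field_type, count, raw = entry
    if field_type != TIFF_SHORT or count != 1:
        raise ValueError("this sketch only decodes a single inline SHORT")
    return struct.unpack(endian + "H", raw[:2])[0]


# Synthetic stand-in for the first bytes of a file: header + one IFD with two tags.
demo = (
    b"II" + struct.pack("<HI", 42, 8)                    # header, first IFD at byte 8
    + struct.pack("<H", 2)                               # number of IFD entries
    + struct.pack("<HHIHH", IMAGE_WIDTH, TIFF_SHORT, 1, 512, 0)
    + struct.pack("<HHIHH", IMAGE_LENGTH, TIFF_SHORT, 1, 256, 0)
    + struct.pack("<I", 0)                               # offset of next IFD (0 = none)
)

endian, ifd_offset = parse_tiff_header(demo[:8])
entries = parse_ifd_entries(demo, endian, ifd_offset)
width = read_short(entries[IMAGE_WIDTH], endian)
height = read_short(entries[IMAGE_LENGTH], endian)
```

In a real COG, the same IFD also carries tags such as TileOffsets (324) and TileByteCounts (325). Their values give the offset and length to pass to something like `byte_range_header` when fetching a single tile.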

Prerequisites

This workshop expects some familiarity with geospatial programming in Python. Most of the notebook code is already provided, so gaps in understanding won't necessarily prevent you from completing the exercises. That said, basic knowledge of STAC and cloud-native geospatial Python tooling, along with experience working with rasters as single- and multi-dimensional arrays, is quite helpful.

A good primer is the Cloud-Native Geospatial for Earth Observation Workshop by Alex Leith of Auspatious. We recommend working through those activities, or having equivalent knowledge, before working through the notebooks in this workshop.

Pre-workshop Prep

We'll have a lot to cover in the workshop and time is against us. Please try to come with a working notebook execution environment already set up and ready to go. The workshop repository README outlines three different options: build and run the Docker container, use a GitHub Codespace, or run from a Python venv managed via uv.

Due to the uncertain quality of conference internet, a local option (Docker or uv) is recommended, but Codespaces can be useful for those who cannot run either of those options.
