FOSS4G NA 2024

Vector Data Cubes in Python with Xarray and Zarr
09-11, 11:00–11:30 (America/Chicago), Grand C

Vector data cubes are powerful multidimensional data structures for the analysis of data whose geospatial coordinates are described by vector geometries. This talk explains recent developments that enable vector data cubes in Xarray and Zarr.


Data cubes, traditionally associated with raster data, have transformed how the Earth observation (EO) community thinks about data analysis, supporting use cases such as statistical analysis, change and trend assessment, and the training of predictive models. Raster data cubes are fundamentally multidimensional: they typically have two spatial dimensions as well as a time dimension, which makes them an ideal fit for the Xarray Python package. Indeed, most data cube analysis in Python today uses Xarray as the container for the data.
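
For readers new to Xarray, here is a minimal sketch of such a raster cube with named time, y, and x dimensions. The variable name `temperature`, the 4-by-5 grid, and all values are synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic raster cube: 3 daily time steps on a hypothetical 4-by-5 lat/lon grid.
times = pd.date_range("2024-01-01", periods=3, freq="D")
ys = np.linspace(40.0, 43.0, 4)    # latitudes (hypothetical)
xs = np.linspace(-90.0, -86.0, 5)  # longitudes (hypothetical)
rng = np.random.default_rng(0)

raster_cube = xr.DataArray(
    rng.normal(loc=10.0, scale=5.0, size=(3, 4, 5)),
    dims=("time", "y", "x"),
    coords={"time": times, "y": ys, "x": xs},
    name="temperature",
)

# Reduce along named dimensions instead of positional axes.
mean_map = raster_cube.mean("time")             # one value per pixel
regional_series = raster_cube.mean(["y", "x"])  # one value per time step
```

Because dimensions are addressed by name, the same reductions read the same way regardless of the underlying axis order.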

Vector data cubes extend the concept of data cubes to vector data: instead of describing spatial locations as pixels within a raster grid, each item along the spatial dimension can be a vector geometry, such as a point or a polygon. This data structure is ideal for analyzing time series of multiple variables across a harmonized set of geometries, such as country- or county-level statistics.
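
A minimal sketch of this idea using only Xarray and Shapely 2.x: the spatial dimension, here called `geometry`, is labeled by polygons rather than by x/y pixel coordinates. The three unit-square "counties", the variable names, and the values are hypothetical. Without Xvec, the geometry coordinate is backed by an ordinary pandas index, so it supports label-based operations but not spatial queries.

```python
import numpy as np
import pandas as pd
import shapely
import xarray as xr

# Three hypothetical "counties": adjacent unit squares (shapely.box is vectorized in Shapely 2.x).
counties = shapely.box(np.arange(3.0), 0.0, np.arange(3.0) + 1.0, 1.0)
times = pd.date_range("2024-01-01", periods=4, freq="D")

# Each variable is a (geometry, time) array; the geometry coordinate holds polygons.
cube = xr.Dataset(
    {
        "population": (("geometry", "time"), np.arange(12.0).reshape(3, 4)),
        "rainfall": (("geometry", "time"), np.ones((3, 4))),
    },
    coords={"geometry": counties, "time": times},
)

# Reductions work along the geometry dimension just as along x/y in a raster cube.
total_rain = cube["rainfall"].sum("time")              # one value per county
mean_population = cube["population"].mean("geometry")  # one value per day

# Geometric properties come from Shapely and broadcast along the geometry dimension.
area = xr.DataArray(shapely.area(cube["geometry"].values), dims="geometry")
density = cube["population"] / area
```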

Vector data cubes were originally developed in R, in the “stars” package, and until recently Python users could not use them. Several recent developments have now brought vector data cubes to Python, including:
- The continued evolution of geopandas and shapely for greater interoperability
- Support for pluggable, user-defined indexes in Xarray, which makes it possible to create a “geometry index”
- The Xvec package, which ties these concepts together
- The CF-Xarray package, which implements the Climate and Forecast (CF) conventions for encoding geometries, allowing vector data cubes to be stored in either netCDF or Zarr
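
Xarray's pluggable index API is what lets Xvec attach a spatial index to the geometry coordinate. To illustrate what such an index does, the sketch below hand-rolls a spatial selection with `shapely.STRtree` (a packed R-tree) and then applies ordinary positional indexing. The helper `query_geometries` is hypothetical and is not Xvec's API; Xvec exposes this kind of query through its own geometry index and `.xvec` accessor.

```python
import numpy as np
import pandas as pd
import shapely
import xarray as xr

# Hypothetical cube: three adjacent unit-square "counties" observed on two days.
counties = shapely.box(np.arange(3.0), 0.0, np.arange(3.0) + 1.0, 1.0)
cube = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    dims=("geometry", "time"),
    coords={"geometry": counties, "time": pd.date_range("2024-01-01", periods=2)},
    name="rainfall",
)


def query_geometries(da, region, predicate="intersects"):
    """Hypothetical helper: keep items whose geometry satisfies `predicate` against `region`.

    A real geometry index would build the tree once and reuse it; this sketch
    rebuilds it on every call for simplicity.
    """
    tree = shapely.STRtree(da["geometry"].values)
    # STRtree.query returns the integer positions of matching geometries.
    hits = np.sort(tree.query(region, predicate=predicate))
    return da.isel(geometry=hits)


# A query box that overlaps the first two squares but not the third.
subset = query_geometries(cube, shapely.box(0.2, 0.2, 1.5, 0.8))
```

Keeping the tree inside an index object, rather than rebuilding it per call as here, is the main advantage of a dedicated geometry index.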

This talk will explain how these developments come together to provide a powerful vector data cube experience for Python users. We will demonstrate how to build, query, and save vector data cubes and illustrate their potential to greatly simplify common workflows around climate data analysis. Finally, we will conclude with a survey of some of the remaining challenges around integrating spatiotemporal raster and vector data in Python.
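
Saving needs an encoding step because netCDF and Zarr are designed to store arrays of numbers and strings, not Python objects such as Shapely geometries. CF-Xarray handles this for real data; as a hedged illustration of the idea only, the sketch below encodes a coordinate of Shapely points into numeric node-coordinate variables plus a geometry container variable, loosely following the CF geometry conventions for points, and then decodes it back. The helpers `encode_points_cf` and `decode_points_cf`, the variable names, and the station locations are hypothetical and are not CF-Xarray's API. For lines and polygons, the CF conventions add `node_count` and `part_node_count` variables, which this points-only sketch omits.

```python
import numpy as np
import shapely
import xarray as xr

CONTAINER = "geometry_container"  # hypothetical name for the CF geometry container


def encode_points_cf(ds, geom_dim="geometry"):
    """Hypothetical helper: swap a coordinate of shapely Points for numeric x/y node
    variables and a CF-style geometry container (single-node points only)."""
    xy = shapely.get_coordinates(ds[geom_dim].values)  # (n, 2) array of x, y
    out = ds.drop_vars(geom_dim)
    # Data variables point at the container through their "geometry" attribute.
    for name in list(out.data_vars):
        out[name] = out[name].assign_attrs(geometry=CONTAINER)
    out["x"] = xr.DataArray(xy[:, 0], dims="node", attrs={"axis": "X"})
    out["y"] = xr.DataArray(xy[:, 1], dims="node", attrs={"axis": "Y"})
    # With exactly one node per point, CF allows node_count to be omitted.
    out[CONTAINER] = xr.DataArray(
        0, attrs={"geometry_type": "point", "node_coordinates": "x y"}
    )
    return out


def decode_points_cf(ds, geom_dim="geometry"):
    """Inverse of encode_points_cf: rebuild the shapely Point coordinate."""
    x_name, y_name = ds[CONTAINER].attrs["node_coordinates"].split()
    points = shapely.points(ds[x_name].values, ds[y_name].values)
    out = ds.drop_vars([x_name, y_name, CONTAINER])
    # Remove the container reference from each data variable.
    for name in list(out.data_vars):
        da = out[name].copy()
        da.attrs.pop("geometry", None)
        out[name] = da
    return out.assign_coords({geom_dim: points})


# Hypothetical station cube: three points, two time steps.
stations = shapely.points([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
ds = xr.Dataset(
    {"tas": (("geometry", "time"), np.arange(6.0).reshape(3, 2))},
    coords={"geometry": stations, "time": [0, 1]},
)
encoded = encode_points_cf(ds)  # contains only numeric variables
restored = decode_points_cf(encoded)
```

With every variable numeric, `encoded` could then be written with `Dataset.to_netcdf` or `Dataset.to_zarr`, provided a netCDF or Zarr backend is installed.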
