David Montero Loaiza FOSS4G Europe 2026

David Montero Loaiza

David Montero Loaiza is a PhD candidate in Physics and Earth System Science at Leipzig University, Germany, and a Google Developer Expert for Google Earth Engine (GEE). He is the main developer of Awesome Spectral Indices and its associated Python and GEE Code Editor APIs, spyndex and spectral. He has also developed several other open-source projects, including eemont, cubo, and sen2nbar.

Sessions

06-29

15:45

5min

A unified framework for building AI-focused Earth System Data Cubes across STAC and Google Earth Engine

David Montero Loaiza

Earth system science is increasingly driven by an unprecedented influx of heterogeneous Earth observation and model data, but these data typically arrive as disparate products, tiles, and collections rather than as uniform analysis-ready cubes. In response, a growing set of data cube frameworks aims to integrate heterogeneous datasets into common, interoperable spatio-temporal structures. Earth System Data Cubes (ESDCs) are one such framework (Mahecha et al., 2020), and can be understood as labelled, multi-dimensional arrays of Earth system data that organize variables consistently across space and time (or any other dimension), enabling uniform operations across common grids. Concretely, ESDCs comprise (1) labelled dimensions defining the data cube axes, (2) one or more grids with coordinate values distributed along these dimensions, (3) univariate values associated with each grid cell, and (4) a suite of attributes that characterise the data variables, the dimensions, and the cube entity as a whole. In practice, however, building such data cubes still requires significant engineering to discover datasets, harmonize metadata, and create consistent arrays that Artificial Intelligence (AI) models can consume (Montero et al., 2024a).

In recent years, the SpatioTemporal Asset Catalog (STAC) specification has become a widely adopted way to describe and access cloud-hosted geospatial assets, enabling programmatic discovery and standardized links to imagery and derived products. Building on this ecosystem, we developed cubo (Montero et al., 2024b), an open-source Python tool for creating AI-focused ESDCs from STAC catalogues. It produces data cubes (as xarray objects) on regular spatial grids with consistent array shapes (e.g. matching pixel counts along x and y, or along longitude and latitude). Yet a large portion of routinely used Earth observation data is accessed through Google Earth Engine (GEE), a cloud-based platform that hosts a large, curated catalogue of geospatial datasets and provides planetary-scale analysis via both JavaScript and Python APIs (Gorelick et al., 2017). The catalogue spans long optical and radar satellite archives (e.g. Landsat, Sentinel-1, and Sentinel-2), widely used global products (e.g. MODIS, ERA5 reanalysis, SRTM), and thematic layers and derived datasets such as land cover and vegetation indices.

As a result, users face a fragmentation problem: cubo can readily create ESDCs from STAC catalogues, but datasets that are primarily accessed via GEE remain out of reach for the same data cube specification and output conventions.

Here we present a GEE backend for cubo that generates on-demand, AI-focused ESDCs directly from GEE, using the same data cube specification originally developed for STAC catalogues and returning consistent xarray outputs (Hoyer and Hamman, 2017).

The optional GEE backend mirrors the STAC workflow in cubo: users specify cube centre coordinates (longitude and latitude), a temporal window, bands, cube edge size (in pixels), and a target spatial resolution, and cubo derives the corresponding bounding box in the local Universal Transverse Mercator (UTM) Coordinate Reference System (CRS). This keeps the data cube definition explicit and comparable across studies, and it makes data preparation a parameterized step of the workflow. The key difference is the data access layer: instead of retrieving assets via STAC (using stackstac: https://github.com/gjoseph92/stackstac), cubo queries GEE collections through xee (https://github.com/google/Xee), an xarray interface to Earth Engine that returns results directly as xarray objects. From the user's perspective, the same cube specification is reused, with the collection identifier now pointing to a GEE collection. The only additional argument to the main cubo function is a boolean flag that selects the GEE backend. This keeps data cube construction consistent across backends while leveraging GEE as a scalable data access and processing environment.
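The cube specification described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not cubo's actual code: `CubeSpec`, `utm_epsg`, and `bbox_from_centre` are hypothetical names, the flag name `gee` is assumed, the collection identifiers are only examples, and the centre point is taken to be already projected to UTM metres (the projection itself is left to a library such as pyproj).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CubeSpec:
    """Hypothetical container for the cube parameters named in the text."""

    lon: float
    lat: float
    collection: str
    start_date: str
    end_date: str
    bands: tuple
    edge_size: int      # pixels along each spatial axis
    resolution: float   # metres per pixel
    gee: bool = False   # assumed name for the backend flag


def utm_epsg(lon, lat):
    # Standard 6-degree UTM zones on WGS 84; the Norway and Svalbard
    # zone exceptions are deliberately ignored in this sketch.
    zone = int((lon + 180) // 6) % 60 + 1
    return (32600 if lat >= 0 else 32700) + zone


def bbox_from_centre(easting, northing, edge_size, resolution):
    # Square box of edge_size * resolution metres centred on the point,
    # returned as (min_x, min_y, max_x, max_y).
    half = edge_size * resolution / 2
    return (easting - half, northing - half, easting + half, northing + half)


# Example specification for a STAC catalogue (values are illustrative).
stac_spec = CubeSpec(
    lon=12.37,
    lat=51.34,
    collection="sentinel-2-l2a",
    start_date="2024-06-01",
    end_date="2024-06-30",
    bands=("B04", "B08"),
    edge_size=128,
    resolution=10,
)

# Same cube, different backend: only the collection id and the flag change.
gee_spec = replace(stac_spec, collection="COPERNICUS/S2_SR_HARMONIZED", gee=True)

epsg = utm_epsg(stac_spec.lon, stac_spec.lat)
bbox = bbox_from_centre(500_000, 5_000_000, stac_spec.edge_size, stac_spec.resolution)
```

Since the spatial and temporal fields are identical in both specifications, the two cubes would cover the same 1280 m by 1280 m footprint on the same grid, which is what makes cubes from different backends directly comparable.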

By aligning GEE-based cube creation with an existing STAC-based cube workflow, the GEE backend lowers the practical barrier to switching between catalogues and platforms without rewriting entire pipelines. It also opens up access to datasets that are primarily available through GEE (e.g. CloudScore+, Dynamic World, or the novel AlphaEarth Embeddings) while still adhering to the same cube specification and output conventions. Retrieving data cubes from GEE and from STAC catalogues using the same cube specification also enables users to merge data cubes across backends with minimal effort, since they share consistent dimensions and coordinates. This is particularly relevant for open geospatial ecosystems, where interoperability and transparent data preparation are prerequisites for comparable results across studies.

We release the Earth Engine support as an optional backend in cubo (installable via the extra cubo[ee]), which is free and open source, hosted on GitHub (https://github.com/ESDS-Leipzig/cubo), and distributed through common Python channels (PyPI and conda-forge). With this update, users can retrieve data from both STAC catalogues and GEE in the same way, using consistent cube specifications across backends in their scientific workflows.

Looking forward, we plan to extend cubo so that multiple datasets can be retrieved and organised directly into a single data cube without rerunning the full workflow for each collection, regardless of the backend they come from. We also plan to broaden the set of supported backends to additional widely used packages in the open geospatial ecosystem, such as odc-stac.

From NDVI to an Open Ecosystem: Five Years of Awesome Spectral Indices

David Montero Loaiza

Five years ago, Awesome Spectral Indices (ASI) was launched to address a persistent gap in Earth observation workflows: while hundreds of spectral indices existed in the literature, their definitions were fragmented, inconsistently documented, and rarely designed for direct programmatic use. What began as a curated effort to standardize and consolidate these definitions has since evolved into shared open geospatial infrastructure.

The first public release in 2021 included 66 indices structured under a common schema with explicit naming, formulas, application domains, and bibliographic references. A key design decision was the introduction of a cross-sensor band naming standard aligned with widely used satellite platforms such as Landsat, Sentinel, and MODIS. By enabling expressions like “(N - R) / (N + R)” to be both human-readable and machine-executable, ASI moved from being a static catalogue to a lightweight and interoperable specification.
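To illustrate how a formula string can be both readable and executable, here is a minimal sketch in plain Python. The record layout and the helper `compute_index` are assumptions made for this example and do not reproduce the ASI schema or the spyndex API; `N` and `R` stand for the near-infrared and red bands, as in the expression quoted above.

```python
# Illustrative record; the field names are assumptions, not the ASI schema.
NDVI = {
    "short_name": "NDVI",
    "formula": "(N - R) / (N + R)",
    "bands": ["N", "R"],
}


def compute_index(record, **bands):
    # Fail early if a band required by the formula was not supplied.
    missing = [b for b in record["bands"] if b not in bands]
    if missing:
        raise KeyError(f"missing bands: {missing}")
    # Evaluate the formula with builtins disabled, so that only the
    # supplied band names are visible to the expression.
    return eval(record["formula"], {"__builtins__": {}}, dict(bands))


# Reflectance values here are made up for the example.
ndvi = compute_index(NDVI, N=0.6, R=0.1)
```

Because the band names are standardized, the same record works for any sensor once its bands are mapped to `N` and `R`. Evaluating strings with `eval` is only reasonable for a trusted, curated catalogue; a production implementation would more likely parse the expression explicitly.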

Over the past five years, the project has grown to more than 260 indices (v0.9.0) and expanded beyond a single repository into a multi-language ecosystem. Open-source APIs operationalize the specification in Python (spyndex), the Google Earth Engine Code Editor (spectral), and Julia (SpectralIndices.jl), alongside community-driven implementations such as the R package rsi. With more than 1,000 GitHub stars, more than 200,000 downloads across PyPI and conda-forge, and alignment with the electro-optical STAC extension, ASI now functions as reusable infrastructure embedded in reproducible Earth observation workflows.

This talk reflects on five years of technical and community development: the evolution from list to specification, a design that supports scientific completeness and implementation simplicity, and the role of metadata and versioning in ensuring long-term sustainability. It concludes with the next phase of development, including extensions to the band standard, richer metadata, expanded categorization, and API refinements aimed at strengthening interoperability and ensuring that spectral indices remain stable and accessible within the open geospatial ecosystem.

Remote Sensing

A13
