BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.osgeo.org//foss4g-europe-2026//talk//BVBPNG
BEGIN:VTIMEZONE
TZID:EET
BEGIN:STANDARD
DTSTART:20001029T050000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:EET
TZOFFSETFROM:+0300
TZOFFSETTO:+0200
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:EEST
TZOFFSETFROM:+0200
TZOFFSETTO:+0300
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-europe-2026-BVBPNG@talks.osgeo.org
DTSTART;TZID=EET:20260629T154500
DTEND;TZID=EET:20260629T155000
DESCRIPTION:Earth system science is increasingly driven by an unprecedented
  influx of heterogeneous Earth observation and model data\, but these data
  typically arrive as disparate products\, tiles\, and collections rather t
 han as uniform analysis-ready cubes. In response\, a growing set of data c
 ube frameworks aims to integrate heterogeneous datasets into common\, inte
 roperable spatio-temporal structures. Earth System Data Cubes (ESDCs) are 
 one such framework (Mahecha et al.\, 2020)\, and can be understood as labe
 lled\, multi-dimensional arrays of Earth system data that organize variabl
 es consistently across space and time (or any other dimension)\, enabling 
 uniform operations across common grids. Concretely\, ESDCs comprise (1) la
 belled dimensions defining the data cube axes\, (2) one or more grids with
  coordinate values distributed along these dimensions\, (3) univariate val
 ues associated with each grid cell\, and (4) a suite of attributes that ch
 aracterise the data variables\, the dimensions\, and the cube entity as a 
 whole. In practice\, however\, building such data cubes still requires sig
 nificant engineering to discover datasets\, harmonize metadata\, and creat
 e consistent arrays that Artificial Intelligence (AI) models can consume (
 Montero et al.\, 2024a).\n\nIn recent years\, the SpatioTemporal Asset Cat
 alog (STAC) specification has become a widely adopted way to describe and 
 access cloud-hosted geospatial assets\, enabling programmatic discovery an
 d standardized links to imagery and other derived products. Building on th
 is ecosystem\, we developed cubo (Montero et al.\, 2024b)\, an open-source
  Python tool for creating AI-focused ESDCs from STAC catalogues\, producin
 g data cubes (as xarray objects) on regular spatial grids with consistent 
 array shapes (e.g. matching pixel counts along x and y or longitude and la
 titude). Yet a large portion of routinely used Earth observation data is a
 ccessed through Google Earth Engine (GEE)\, a cloud-based platform that ho
 sts a large\, curated catalogue of geospatial datasets and provides scalab
 le\, planetary-scale analysis via both JavaScript and Python APIs (Gorelic
 k et al.\, 2017). The catalogue spans long optical and radar satellite arc
 hives (e.g. Landsat\, Sentinel-1\, and Sentinel-2)\, widely used global pr
 oducts (e.g. MODIS\, ERA5 reanalysis\, SRTM)\, and thematic layers and der
 ived datasets such as land cover and vegetation indices.\n\nAs a result\, 
 users face a fragmentation problem: cubo can readily create ESDCs from STA
 C catalogues\, but datasets that are primarily accessed via GEE remain out
  of reach for the same data cube specification and output conventions.\n\n
 Here we present a GEE backend for cubo that generates on-demand
  AI-focused ESDCs directly from GEE\,
  using the same data cube specification concept developed initially for ST
 AC catalogues and returning consistent xarray outputs (Hoyer and Hamman\, 
 2017).\n\nThe optional GEE backend mirrors the STAC workflow in cubo: user
 s specify cube centre coordinates (longitude and latitude)\, a temporal wi
 ndow\, bands\, cube edge size (pixels)\, and a target spatial resolution\,
  and cubo derives the corresponding bounding box in the local Universal Tr
 ansverse Mercator (UTM) Coordinate Reference System (CRS). This keeps the 
 data cube definition explicit and comparable across studies\, and it makes
  the data preparation step a parameterized part of the workflow. The key d
 ifference is the data access layer: instead of retrieving assets via STAC 
 (using stackstac: https://github.com/gjoseph92/stackstac)\, cubo queries G
 EE collections through xee (https://github.com/google/Xee)\, an xarray int
 erface to Earth Engine that returns the result directly as xarray objects.
  From the user perspective\, the same cube specification is reused\, with 
 the collection identifier now pointing to a GEE collection. The only addit
 ional argument in the main cubo function is a boolean flag that selects
  the GEE backend. This keeps data cube construction consistent across back
 ends while leveraging GEE as a scalable data access and processing environ
 ment.\n\nBy aligning GEE-based cube creation with an existing STAC-based c
 ube workflow\, the GEE backend lowers the practical barrier to switching b
 etween catalogues and platforms without rewriting entire pipelines. It als
 o opens up access to datasets that are primarily available through GEE (e.
 g. CloudScore+\, Dynamic World\, or the novel AlphaEarth Embeddings) while
  still adhering to the same cube specification and output conventions. Ret
 rieving data cubes from GEE and from STAC catalogues using the same cube s
 pecification also enables users to merge data cubes across backends with m
 inimal effort\, since they share consistent dimensions and coordinates. Th
 is is particularly relevant for open geospatial ecosystems\, where interop
 erability and transparent data preparation are prerequisites for comparabl
 e results across studies.\n\nWe release the Earth Engine support as an opt
 ional backend in cubo (installable via the extra cubo[ee])\, which is free
  and open source\, hosted on GitHub (https://github.com/ESDS-Leipzig/cubo)
 \, and distributed through common Python channels (PyPI and conda-forge). 
 We expect users to benefit from this update since they can now retrieve da
 ta from both STAC catalogues and GEE in the same way for their scientific 
 workflows\, using consistent cube specifications across backends.\n\nLooki
 ng forward\, we plan to extend cubo so that multiple datasets can be retri
 eved and organised directly into a single data cube without rerunning the 
 full workflow for each collection\, regardless of the backend they come fr
 om. We also plan to broaden the set of supported backends to additional wi
 dely used packages in the open geospatial ecosystem\, such as odc-stac.
DTSTAMP:20260605T010337Z
LOCATION:A01
SUMMARY:A unified framework for building AI-focused Earth System Data Cubes
  across STAC and Google Earth Engine - David Montero Loaiza
URL:https://talks.osgeo.org/foss4g-europe-2026/talk/BVBPNG/
END:VEVENT
END:VCALENDAR
