2026-09-03 –, Conference Management Room4
The Earth System Grid Federation (ESGF) is a global partnership supporting the distribution, archive, and discovery of climate data. Its new architecture introduces STAC catalogues, Kerchunk‑enabled access, and an event‑driven search system synchronising two discovery nodes, improving consistency, reliability, and interoperability across climate and Earth observation communities.
The Earth System Grid Federation (ESGF) is the international partnership responsible for the distribution, archive, and discovery of both the Coupled Model Intercomparison Project (CMIP) and the Coordinated Regional Climate Downscaling Experiment (CORDEX). In operation since 2009, it was the first decentralised climate data repository of its kind, storing and serving many petabytes of data across tens of global and regional data centre partners.
Over the last five years, the system has been fully re-architected, introducing a cloud-ready deployment architecture and a new system for distributed search, which is fundamental to ESGF’s federated model for data access. This work has drawn on successful experience with the STAC (Spatio-Temporal Asset Catalogue) specification in the Earth observation (EO) world, and has developed a profile for its use with global climate projection data. Providing a STAC interface to ESGF archives has allowed us to explore alternative access methods for cloud-accessible, analysis-ready data formats using tools such as Kerchunk, a lightweight approach that references existing data without converting it and works with open-source Python packages such as fsspec and Xarray. Use of STAC also offers the potential for greater integration between the EO and climate modelling domains, which is essential for the validation of model outputs.
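To illustrate the non-conversion idea behind Kerchunk, the following is a minimal sketch using only the Python standard library rather than Kerchunk itself. A reference set maps each chunk key either to inline content or to a `[path, offset, length]` triple pointing into an existing file. The `read_chunk` helper, the variable name `tas`, and the temporary stand-in file are assumptions made for illustration, and the handling of inline entries is simplified.

```python
import json
import os
import tempfile


def read_chunk(refs, key):
    """Return the bytes for one chunk key from a Kerchunk-style reference set."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # Inline content (typically small metadata) is stored directly.
        return entry.encode()
    # Otherwise the entry points at a byte range inside an existing file.
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical stand-in for an archived file:
# an 8-byte header followed by two 4-byte chunks.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"HEADER__" + b"AAAA" + b"BBBB")
    archive_path = tmp.name

refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "tas/0": [archive_path, 8, 4],
        "tas/1": [archive_path, 12, 4],
    },
}

first = read_chunk(refs, "tas/0")
second = read_chunk(refs, "tas/1")
os.remove(archive_path)
```

The original file is never rewritten: only the small reference set is new. In practice, fsspec and Xarray would take the place of `read_chunk`, fetching the same byte ranges from object storage or disk.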
ESGF has traditionally used a distributed model for search services which, though powerful, has led to challenges around the consistency of search content. Over the last twelve months, in preparation for CMIP7, a further fundamental change has been made to the architecture to address these issues. The new system adopts a centralised model with two search nodes, one in the US and one in Europe, each hosted on public cloud. The two nodes are synchronised using a new event-driven architecture. This approach, built on a messaging framework shared between the nodes, ensures eventual consistency across them. It reduces or eliminates errors caused by individual node downtime and simplifies processes such as the replication and retraction of data from archives distributed at sites across the federation.
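The eventual-consistency behaviour described here can be sketched as two nodes replaying one ordered event log, each from its own offset. This is a toy model under stated assumptions, not the ESGF implementation: the `SearchNode` class, the `publish` and `retract` action names, and the in-memory list standing in for the message stream are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SearchNode:
    name: str
    index: dict = field(default_factory=dict)
    offset: int = 0  # position of the next unread event in the shared log

    def catch_up(self, log):
        """Apply every event this node has not yet seen, in log order."""
        for event in log[self.offset:]:
            if event["action"] == "publish":
                self.index[event["id"]] = event["record"]
            elif event["action"] == "retract":
                self.index.pop(event["id"], None)
        self.offset = len(log)


log = []  # stands in for the shared, ordered message stream
us, eu = SearchNode("us"), SearchNode("eu")

log.append({"action": "publish", "id": "ds1", "record": {"version": "v1"}})
log.append({"action": "publish", "id": "ds2", "record": {"version": "v1"}})
us.catch_up(log)  # the eu node is "down" and has applied nothing yet

log.append({"action": "retract", "id": "ds1"})
us.catch_up(log)
diverged = us.index != eu.index  # True while eu is behind

eu.catch_up(log)  # eu returns and replays from its own offset
converged = us.index == eu.index  # True once both have applied the full log
```

Because both nodes apply the same events in the same order, a node that was offline needs no special repair step: it resumes from its stored offset and reaches the same state.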
The move to a message-based, event-driven architecture has been integrated with STAC records and services. In ESGF-NG, publication and updates of data are shared between nodes as events in a Kafka stream, expressed in the form of STAC API calls, ensuring a consistent, publicly documented archive distributed across many nodes. The ESGF team have contributed several changes to the STAC project to facilitate this. Looking forward, we see potential in this event-driven architecture for search systems as a means to integrate across federations. In the European context, this could include the ESA Climate Change Initiative open data portal, and work with the Copernicus Climate Data Store and DestinE.
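As a rough illustration of an event "in the form of a STAC API call", the sketch below wraps a STAC Item in a message that mirrors a transaction-style request (`POST /collections/{collectionId}/items`) and serialises it to bytes, as a message value would be. The envelope fields `method`, `path`, and `body`, the item identifier, and the asset URL are assumptions for illustration and not the actual ESGF-NG message schema. No Kafka client is used.

```python
import json


def make_publish_event(collection_id, item):
    """Wrap a STAC Item in a message describing the equivalent STAC API call."""
    return {
        "method": "POST",
        "path": f"/collections/{collection_id}/items",
        "body": item,
    }


# A minimal STAC Item; all values are hypothetical placeholders.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-dataset-v1",
    "collection": "CMIP6",
    "geometry": None,
    "properties": {"datetime": "2015-01-01T00:00:00Z"},
    "links": [],
    "assets": {"data": {"href": "https://example.org/data/file.nc"}},
}

event = make_publish_event("CMIP6", item)

# Serialise to bytes as a producer would, then decode as a consumer would.
message = json.dumps(event).encode("utf-8")
decoded = json.loads(message.decode("utf-8"))
```

A consuming node could apply `decoded` by issuing the described request against its own STAC API, so the event stream and the public API share one documented vocabulary.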
ESGF: https://esgf.github.io/
STAC: https://stacspec.org/
Kerchunk: https://fsspec.github.io/kerchunk/
VirtualiZarr: https://virtualizarr.readthedocs.io/
STAC - Spatio-Temporal Asset Catalogs, a standardized way to expose collections of spatio-temporal data.
Kerchunk - provides a unified way to represent a variety of chunked, compressed data formats, allowing efficient access to the data from traditional file systems or cloud object storage.
VirtualiZarr - virtual Zarr stores for cloud-friendly access to archival data, using familiar xarray syntax.
Kafka - distributed event streaming platform.
I'm a developer at the Centre for Environmental Data Analysis (CEDA) in the UK. We provide a data analysis and archive platform for the environmental research community. My expertise is in search: I manage CEDA's Elasticsearch cluster and STAC catalogue, and lead CEDA's development efforts for the ESGF Next Gen project.