07-17, 14:00–14:30 (Europe/Sarajevo), EL11
Destination Earth (DestinE) is a flagship initiative led by the European Commission, implemented by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), the European Space Agency (ESA) and the European Centre for Medium-Range Weather Forecasts (ECMWF). It aims to create highly detailed Digital Twins (DTs) of the Earth, enabling precise simulations for a variety of uses. Currently, the initiative focuses on two primary Digital Twins: the Weather Extremes Digital Twin (ExtremeDT) and the Climate Change Adaptation Digital Twin (ClimateDT). Over the coming years, the scope of Digital Twins is set to expand, necessitating improved access to data and streamlined methods for working with it. This is where the Destination Earth Data Lake (DEDL) plays a pivotal role, offering comprehensive data discovery, access, and processing services tailored to the needs of DestinE users.
The DEDL operates on two key levels: ‘Data Discovery and Access’ and ‘Edge Services’. The Data Discovery and Access services are provided by the Harmonized Data Access (HDA) tool, which offers a single, federated entry point to services and data, including existing datasets and complementary sources such as in-situ and socio-economic data. Notably, it also provides access to the unique datasets generated by DestinE’s Digital Twins. By combining these sources, users can seamlessly explore, integrate, and analyze both existing services and the innovative data produced by the Digital Twins. Moreover, the full data archive is immediately available to users. The services rely on the SpatioTemporal Asset Catalog (STAC) standard, which means that:
• Searches across datasets follow the STAC protocol;
• The Federated Catalog search proxy converts STAC queries into queries adapted to the underlying catalogs and returns the results to the user in STAC format;
• The services themselves are listed in a service catalog.
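Because search follows the STAC API, a client only has to assemble a standard Item Search request. The following is a minimal sketch, assuming the STAC API Item Search parameters (`collections`, `bbox`, `datetime`, `limit`). The API root URL and the collection ID are hypothetical placeholders, not the real HDA endpoint or DEDL collection names, and authentication is omitted.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; look up the real HDA STAC API root in the DEDL documentation.
STAC_API_ROOT = "https://hda.example.invalid/stac"


def build_item_search(collections, bbox, start, end, limit=10):
    """Return a STAC API Item Search body, suitable for POST {root}/search as JSON."""
    west, south, east, north = bbox
    # This sketch does not handle boxes that cross the antimeridian.
    if west > east or south > north:
        raise ValueError("bbox must be ordered (west, south, east, north)")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{start}/{end}",  # RFC 3339 closed interval
        "limit": limit,
    }


def to_get_url(root, body):
    """Encode the same search as GET {root}/search; list values become comma-separated."""
    params = {
        "collections": ",".join(body["collections"]),
        "bbox": ",".join(str(v) for v in body["bbox"]),
        "datetime": body["datetime"],
        "limit": body["limit"],
    }
    return f"{root}/search?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical collection ID; an area around Sarajevo, June 2024.
    body = build_item_search(
        ["hypothetical-sentinel-2-collection"],
        (18.2, 43.7, 18.6, 44.0),
        "2024-06-01T00:00:00Z",
        "2024-06-30T23:59:59Z",
    )
    print(to_get_url(STAC_API_ROOT, body))
```

A STAC Item Search returns a GeoJSON FeatureCollection of STAC Items. Because the federated proxy described above translates the request for the underlying catalogs, the client-side query can in principle stay the same regardless of which provider holds the data.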
The Edge Services offered by DEDL provide:
• Cloud Computing
• STACK Application Development Environment
• Hook Services
The cloud computing service is powered by the ISLET infrastructure, a distributed Infrastructure as a Service (IaaS) built on OpenStack and managed through the Horizon web interface. It allows users to manage virtual machines and S3 storage and to run advanced computations via a graphical user interface (GUI) or a command-line interface (CLI). For more complex tasks, Kubernetes integration is available. A standout feature of ISLET is its proximity to the data: its sites operate near High-Performance Computing (HPC) facilities and are linked to them through data bridges, enabling efficient processing of large datasets, including those from the Digital Twins, in conjunction with HPC systems.
The STACK environment supports application development in Python and R using JupyterHub and DASK. Users can create DASK clusters on selected infrastructure or cloud sites to process data directly where it resides, removing the need for extensive local setup and optimization.
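DASK processes large data by splitting it into chunks, computing a partial result per chunk on workers, and combining the partials. The sketch below illustrates that map-reduce pattern with the standard library's `concurrent.futures` rather than DASK itself, so it runs without a cluster. It is not DEDL or DASK code; on STACK the same computation would typically be written with DASK collections such as `dask.array` and executed on a DASK cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a sequence into consecutive chunks, as DASK partitions a large array."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Map step: each worker computes (sum, count) for its own chunk only."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=4, workers=2):
    """Reduce step: combine the per-chunk partials into one global mean."""
    if not values:
        raise ValueError("values must not be empty")
    # Threads stand in for cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 11)), chunk_size=3))
```

The global mean is built from per-chunk (sum, count) pairs rather than by averaging per-chunk means, because chunks can differ in size and a mean of means would then be biased.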
Hook Services are a set of predefined workflows that users can run as ready-to-use processors, e.g. Sentinel-2: MAJA Atmospheric Correction; Sentinel-2: SNAP-Biophysical; Sentinel-1: Terrain-corrected backscatter. They also provide workflow functions that generate higher-level products on demand, such as temporal composites.
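A temporal composite merges several acquisitions of the same area into a single image. The following is a minimal sketch, assuming co-registered scenes on the same pixel grid stored as nested lists, with `None` marking cloud-masked pixels. It computes a per-pixel median, a common compositing rule; the rule actually used by the DEDL Hook Services is not specified here.

```python
from statistics import median


def median_composite(scenes):
    """Per-pixel median over time, skipping masked (None) observations.

    scenes: list of 2-D grids (lists of rows) with identical shape, one per
    acquisition date. Returns a grid of the same shape; pixels with no valid
    observation in any scene stay None.
    """
    if not scenes or not scenes[0]:
        raise ValueError("need at least one non-empty scene")
    rows, cols = len(scenes[0]), len(scenes[0][0])
    if any(len(s) != rows or any(len(r) != cols for r in s) for s in scenes):
        raise ValueError("all scenes must share the same grid shape")
    composite = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Collect the valid observations of this pixel across all dates.
            valid = [s[i][j] for s in scenes if s[i][j] is not None]
            row.append(median(valid) if valid else None)
        composite.append(row)
    return composite
```

The median is used here because it is less sensitive than the mean to clouds or shadows that escaped masking in individual acquisitions. Other compositing rules, such as maximum NDVI or most recent valid pixel, would fit into the same loop.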
The DestinE Data Lake is a transformative initiative that reshapes how Earth Observation data is managed and used. By integrating innovative infrastructure (ISLET), data services (HDA), reliable processors (Hook Services), and user-friendly development tools (STACK), DEDL achieves a high level of data harmonization, federation, and processing. Moreover, the DEDL empowers DestinE users by giving them seamless access to vast datasets and advanced computational tools. It simplifies data exploration, integration, and analysis, so that researchers, policymakers, and developers can focus on innovation and decision-making rather than on technical barriers. Because its comprehensive suite of services works close to the data, DEDL lets users efficiently exploit the wealth of information generated by the Digital Twins and maximize the impact of their work. The system thereby enhances climate research capabilities and supports sustainable development efforts at a scale that was previously out of reach.
- Data: All data provided by the Destination Earth Data Lake (DEDL) is open to the public. Moreover, data is not stored centrally but is accessed through a data federation model. This means that DEDL supports access to data directly from the source provider.
- Cloud Infrastructure: The cloud infrastructure is based on open-source software, namely OpenStack. Furthermore, the hardware is an integral part of the overall DestinE project and is distributed across five locations.
- Services: The services provided by DEDL are based on Python and R tools. Additionally, DEDL offers access to JupyterLab and DASK, which are distributed across different cloud locations, and users can deploy these tools at any of the available locations.
To sum up, all of this makes DEDL an open-source-based cloud, data and services initiative.
Assign a number between 1 and 3 indicating the level of technical complexity of your contribution: 2 (background knowledge helpful)
Give an indication of resources (video, web pages, papers, etc.) to read in advance that will help get up to speed on advanced topics:
- https://data.destination-earth.eu/ - DEDL Portal
- https://platform.destine.eu/ - Destination Earth Core Service Platform
- https://destination-earth.eu/ - Destination Earth website
- https://github.com/destination-earth/DestinE-DataLake-Lab - examples and tutorials on how to use DEDL
Data access, collection & sharing, Data processing and analysis, Data visualization, FOSS4G and environmental observations, Education
I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation: yes
Data Scientist in the area of Earth Observation.