Patryk Grzybowski

Data Scientist at CloudFerro S.A.


Sessions

07-04
14:00
30min
Destination Earth Data Lake (DEDL) – discovery, access and process data
Patryk Grzybowski

The Destination Earth initiative (DestinE), driven by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), the European Space Agency (ESA) and the European Centre for Medium-Range Weather Forecasts (ECMWF), aims to create a highly accurate replica of the Earth, a Digital Twin. The first two Digital Twins describe weather-induced and geophysical extremes, and climate change adaptation. In the coming years, the number of Digital Twins will grow. Developing new models therefore requires easier access to data and better ways of working with it. This is made possible by one of DestinE's three key elements, the Destination Earth Data Lake (DEDL), which provides open discovery, access, and big data processing services.

DEDL Discovery and Data Access services are provided by the Harmonized Data Access (HDA) tool, which offers a single, federated entry point to the services and data. The DestinE Data Lake federates with existing data holdings as well as with complementary data from diverse sources such as in-situ, socio-economic, or space data. Very importantly, it also provides access to data generated by the DestinE Digital Twins. All this allows for exploration, combination and assimilation of data shared by existing services with innovative Digital Twins data. What is more, all this data is provided as a full archive immediately available to the user. The services rely on the SpatioTemporal Asset Catalogs (STAC) standard, which means:
• Dataset searches follow the STAC protocol;
• The Federated Catalog Search Proxy component converts STAC queries into queries adapted to the underlying catalog and returns the results to the user in STAC format;
• The services are presented in a service catalog.
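
To make the STAC-based search concrete, here is a minimal sketch of how a client might assemble the JSON body of a STAC API item-search request. The field names (`collections`, `bbox`, `datetime`, `limit`) come from the STAC API specification; the function name `build_stac_search` and the collection identifier are hypothetical, and the actual HDA endpoint and collection names are not given here, so no request is sent.

```python
import json
from datetime import datetime, timezone


def build_stac_search(collections, bbox, start, end, limit=10):
    """Build the JSON body for a STAC API item-search (POST /search) request.

    `start` and `end` must be timezone-aware datetimes; they are written
    as an RFC 3339 closed interval, as the STAC API expects.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")

    def iso(ts):
        # Normalise to UTC and format as e.g. 2024-06-01T00:00:00Z
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "collections": list(collections),
        "bbox": list(bbox),
        "datetime": f"{iso(start)}/{iso(end)}",
        "limit": limit,
    }


# Hypothetical collection id; real identifiers come from the DEDL catalog.
body = build_stac_search(
    collections=["EXAMPLE_COLLECTION"],
    bbox=[14.0, 49.0, 24.2, 55.0],
    start=datetime(2024, 6, 1, tzinfo=timezone.utc),
    end=datetime(2024, 6, 30, 23, 59, 59, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

A body like this would be posted to the catalog's search endpoint. Because the Federated Catalog Search Proxy handles translation to the underlying catalogs, the same structure should apply regardless of which data holding serves the collection.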

Thus, exploring the datasets and working with data provided by DEDL is user-friendly and aligned with the newest trends and requirements.

Big Data Processing Services offered by DEDL provide:
• Cloud computing;
• STACK application development environment;
• Hook service.

The cloud computing service uses ISLET, an infrastructure as a service (IaaS) deployed on OpenStack, open source cloud computing infrastructure software, with the Horizon interface. It allows users to create and manage virtual machines as well as their own local storage through a graphical user interface (GUI) and a command line interface (CLI). It also makes it possible to use Kubernetes for more demanding jobs. What makes ISLET exceptional is that it provides services in proximity to data holdings, as a distributed infrastructure close to High Performance Computing (HPC). This is possible thanks to data bridges: edge clouds that enable working with large volumes of data (including data generated by DestinE Digital Twins) in conjunction with the computing capabilities of HPC.

The STACK application development environment utilizes JupyterHub/DASK with Python, R and Julia. It allows users to create a DASK cluster on a selected infrastructure/cloud site and process data close to where it is stored. Thanks to that, users do not need to adapt their own local environment to work with DEDL services.

Hook Services are a set of pre-defined workflows that users can run as ready-to-use processors, e.g. Sentinel-2: MAJA Atmospheric Correction; Sentinel-2: SNAP-Biophysical; Sentinel-1: Terrain-corrected backscatter. They also enable workflow functions to generate on-demand higher-level products, such as temporal composites.
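
As an illustration of what a temporal composite is, the sketch below computes a per-pixel median over a small stack of scenes, skipping masked pixels. It is a toy example in plain Python and not the Hook implementation: the function name `median_composite`, the use of `None` as a mask value, and the sample numbers are all assumptions made for illustration.

```python
from statistics import median


def median_composite(scenes, nodata=None):
    """Per-pixel median over a stack of equally sized 2D grids.

    Pixels equal to `nodata` (e.g. cloud-masked) are skipped; a pixel
    that is masked in every scene stays `nodata` in the composite.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the valid observations of this pixel across time.
            values = [s[r][c] for s in scenes if s[r][c] is not nodata]
            out_row.append(median(values) if values else nodata)
        composite.append(out_row)
    return composite


# Three hypothetical 2x2 scenes of scaled reflectance; None marks masked pixels.
scenes = [
    [[1000, 2000], [None, 4000]],
    [[3000, None], [None, 2000]],
    [[2000, 6000], [None, 9000]],
]
composite = median_composite(scenes)
print(composite)
```

The median is a common choice for composites because a single outlier, such as an undetected cloud, does not shift the result the way it would shift a mean.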

The DestinE Data Lake represents a groundbreaking initiative that transforms the management of Earth Observation data, enhances capabilities in climate research, and bolsters initiatives for sustainable development. DEDL provides unique infrastructure (ISLET), open source services to discover and obtain data (HDA), reliable and trusted processors (Hook) and a service for user-friendly work with data (STACK). The fundamental principles behind DEDL, together with its novel cloud solution, will enable data harmonization, federation and processing on a scale beyond current capabilities.

Open Data
Destination Earth (Van46 ring)