07-17, 16:40–16:45 (Europe/Sarajevo), SA02
The Destination Earth Data Lake Lab (DestinE-DataLake-Lab) is a comprehensive GitHub repository designed to facilitate users' interaction with the Destination Earth Data Lake (DEDL) services. Developed by EUMETSAT and partners, this repository offers a collection of Jupyter Notebook examples and Python tools that demonstrate how to effectively utilize various DEDL services, including Harmonized Data Access (HDA), STACK, and HOOK.
Harmonized Data Access (HDA)
The HDA service provides users with streamlined access to a diverse range of datasets within the DEDL ecosystem. Within the repository, the HDA directory contains Jupyter Notebook examples that guide users through the process of discovering available services, listing and searching for STAC collections, and retrieving specific data items. These examples are instrumental in helping users understand how to interact with the HDA API, manage authentication, and perform data queries efficiently.
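As a rough illustration of the search step, here is a minimal sketch of a STAC item search using only the Python standard library. It assumes HDA exposes a standard STAC API `/search` endpoint and accepts a bearer token. The function names, the `search_url` and `token` arguments, and the collection ID in the usage note are placeholders for illustration, not values taken from the repository.

```python
import json
import urllib.request


def build_stac_search(collection, bbox, start, end, limit=10):
    """Build a STAC API item-search body (fields defined by the STAC API spec)."""
    return {
        "collections": [collection],
        "bbox": list(bbox),            # [west, south, east, north] in degrees
        "datetime": f"{start}/{end}",  # closed interval, RFC 3339 timestamps
        "limit": limit,
    }


def post_search(search_url, payload, token):
    """POST the search body and return the decoded GeoJSON FeatureCollection.

    Assumption: the service accepts 'Authorization: Bearer <token>'.
    """
    request = urllib.request.Request(
        search_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `build_stac_search("EXAMPLE.COLLECTION.ID", (17.0, 42.5, 19.7, 45.3), "2024-07-01T00:00:00Z", "2024-07-31T23:59:59Z")` produces the request body. The notebooks in the HDA directory show the actual endpoint and authentication flow.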
STACK Service
The STACK service is designed to facilitate near-data processing by leveraging DASK, a flexible parallel computing library in Python. In the STACK directory of the repository, users will find Jupyter Notebook examples that illustrate how to set up and utilize DASK for processing large datasets distributed across different cloud locations. These examples demonstrate the deployment of DASK clusters, execution of parallel computations, and optimization of data processing workflows, enabling users to perform complex analyses efficiently.
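The chunked, lazy style of computation that DASK enables can be sketched as follows. This minimal example runs on DASK's default local scheduler with synthetic data. It is not taken from the repository's notebooks, and the array shape, chunk size, and function name are arbitrary choices for illustration.

```python
import dask.array as da


def lazy_scaled_mean(shape=(2000, 2000), chunks=(500, 500)):
    """Return a lazy Dask scalar: the mean of a chunked, rescaled array."""
    data = da.ones(shape, chunks=chunks)  # synthetic stand-in for a real dataset
    scaled = data * 2.0 - 1.0             # element-wise step, applied per chunk
    return scaled.mean()                  # builds a task graph, computes nothing yet


if __name__ == "__main__":
    lazy = lazy_scaled_mean()
    # .compute() executes the graph, here on the local threaded scheduler.
    # On a remote cluster one would first connect a dask.distributed.Client
    # to the scheduler; the address to use depends on the deployment.
    print(float(lazy.compute()))
```

Because nothing is evaluated until `.compute()` is called, the same code can run unchanged on a laptop or on a cluster located next to the data.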
HOOK Service
The HOOK service offers Function-as-a-Service (FaaS) capabilities, allowing users to define and execute workflows within the DEDL environment. The HOOK directory in the repository provides Jupyter Notebook examples that guide users through the process of creating, deploying, and managing workflows using the HOOK service. These tutorials cover various aspects, including defining functions, setting up triggers, and monitoring workflow execution, thereby enabling users to automate data processing tasks effectively.
Getting Started
To begin utilizing the resources provided in the DestinE-DataLake-Lab repository, users are encouraged to clone the repository into their local environment or access it through the DEDL-provided JupyterHub - STACK Service. The repository includes a requirements.txt file that lists the necessary Python dependencies. Users should create a virtual environment, install the required packages, and select the appropriate kernel when running the provided notebooks. Detailed instructions for setting up the environment and installing dependencies are available in the repository's README file.
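After installing the dependencies, it can help to confirm that the selected kernel actually sees them. The following is a minimal sketch of such a check, using only the standard library. The helper names are hypothetical, and the parser handles only simple `name` and `name==version` style lines, not every form that pip accepts.

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract bare package names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

In a notebook, `missing_packages(requirement_names(open("requirements.txt").read()))` returns an empty list when every listed package is available to the active kernel.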
Additional Resources
For further information and comprehensive documentation on DEDL services, users can refer to the DestinE Data Lake documentation. This resource provides in-depth guides, API references, and additional tutorials to assist users in maximizing their utilization of DEDL services. Moreover, the DestinE Data Portfolio and Data Lake Edge services offer valuable insights into the available datasets and services within the DEDL ecosystem.
Summary
The DestinE-DataLake-Lab repository is a practical starting point for users who want to work with the Destination Earth Data Lake services. Its examples and guides cover data access, near-data processing, and workflow management within the Destination Earth initiative.
DestineLab is built on the Destination Earth Data Lake, where data is publicly accessible, and services are based on open-source solutions such as Python and OpenStack.
Assign a number between 1 and 3 indicating the level of technical complexity of your contribution: 2 (background knowledge helpful)
Give indication of resources (video, web pages, papers, etc.) to read in advance, that will help get up to speed on advanced topics:
https://github.com/destination-earth/DestinE-DataLake-Lab - Examples and tutorials on how to use DEDL
https://data.destination-earth.eu/ - DEDL Portal
https://platform.destine.eu/ - Destination Earth Core Service Platform
https://destination-earth.eu/ - Destination Earth website
Data access, collection & sharing, Data processing and analysis, Data visualization, FOSS4G and environmental observations, Education
I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation: yes
Data Scientist in the area of Earth Observation.