Marcin Niemyjski
As a trained surveyor, I honed my skills in data analysis and interpretation, which I have drawn on in my transition to a Junior Data Scientist role. Over the last two years, I have been exploring GIS, remote sensing, LiDAR, and data processing, which have become my primary areas of interest. I see the spatial data industry as a puzzle in which open datasets, Python, SQL, and machine learning techniques are the pieces that need to fit together. I believe the future of GIS lies in big data solutions and cloud computing, and I am fortunate to be developing my skills in this direction while working at CloudFerro.
Sessions
This work presents the tool used to create the Copernicus Data Space Ecosystem (CDSE) STAC catalog, the largest STAC catalog in the world and the most comprehensive in terms of metadata. It walks through the full process: developing a metadata model for Sentinel data, efficiently indexing products based on the original metadata files that accompany them, validating the results, and ingesting them into the backend system. Notably, this entire process runs on a single tool, eometadatatool, originally developed by DLR and further enhanced by the CloudFerro team, which is releasing it as open-source software. The tool extracts metadata from the original files accompanying Copernicus program products and other missions (e.g., Landsat and the Copernicus Contributing Missions). Extraction is driven by a CSV file in which each entry specifies the metadata name, the name of the file in which it occurs, and the path to the key within that file. By default, the tool accesses products via S3, with access configured through environment variables. The CDSE repository is itself exposed as an S3 resource that users can access free of charge. The open-source release is planned for Q1 2025. The work will explore potential use cases and demonstrate the tool's basic capabilities.
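To illustrate the CSV-driven approach described above, here is a minimal sketch of mapping-based extraction. It is not eometadatatool's code: the CSV column names (`metadata_name`, `file_name`, `key_path`), the use of ElementTree path syntax for key paths, the toy `MTD.xml` file, and the helper functions are all hypothetical. Product files are held in an in-memory dictionary instead of being read from S3.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping: one row per metadata field, naming the source file and
# the path to the value inside it. Column names and path syntax are assumptions,
# not eometadatatool's actual format.
MAPPING_CSV = """metadata_name,file_name,key_path
platform,MTD.xml,General_Info/SPACECRAFT_NAME
start_datetime,MTD.xml,General_Info/PRODUCT_START_TIME
cloud_cover,MTD.xml,Quality_Info/Cloud_Coverage_Assessment
"""

# Toy product metadata file (simplified, made up for illustration).
# A real tool would fetch such files from S3 storage instead.
PRODUCT_FILES = {
    "MTD.xml": """<Product>
  <General_Info>
    <SPACECRAFT_NAME>Sentinel-2A</SPACECRAFT_NAME>
    <PRODUCT_START_TIME>2024-05-01T10:15:59.024Z</PRODUCT_START_TIME>
  </General_Info>
  <Quality_Info>
    <Cloud_Coverage_Assessment>12.5</Cloud_Coverage_Assessment>
  </Quality_Info>
</Product>""",
}


def load_mapping(csv_text):
    """Parse the mapping CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def extract(mapping, files):
    """Return {metadata_name: value} for every mapping row whose key is found.

    Each source file is parsed at most once, even if many fields point to it.
    Rows whose path does not match anything are skipped.
    """
    parsed = {}
    result = {}
    for row in mapping:
        file_name = row["file_name"]
        if file_name not in parsed:
            parsed[file_name] = ET.fromstring(files[file_name])
        value = parsed[file_name].findtext(row["key_path"])
        if value is not None:
            result[row["metadata_name"]] = value.strip()
    return result


if __name__ == "__main__":
    print(extract(load_mapping(MAPPING_CSV), PRODUCT_FILES))
```

Keeping the field-to-file mapping in a CSV means that supporting a new product type mostly requires writing a new mapping file rather than new extraction code. The extracted values would still need to be typed (e.g., `cloud_cover` as a number) and placed into a STAC Item before validation and ingestion.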