Pieter Kempeneers
Pieter Kempeneers is with the Joint Research Centre of the European Commission in Ispra, Italy. He is an electronics engineer with a PhD in science. His interests include remote sensing, image processing, and time series analysis. He also develops and uses open-source software. Pieter leads a team on data analytics services covering artificial intelligence, data management, data processing, and data visualization.
Sessions
The Joint Research Centre (JRC) of the European Commission provides independent, evidence-based science and knowledge that supports European Union policies. To this end, the JRC has developed the Big Data Analytics Platform (BDAP), a data platform built entirely on free and open-source software (FOSS). It allows JRC data scientists to easily access, analyze, view, and reuse scientific data at petabyte scale. Most of the hosted data are geospatial data from various domains, including Earth observation imagery from the Copernicus Sentinel missions. Data are automatically downloaded from the Copernicus Data Space Ecosystem, processed, and stored in an open-source distributed file system (eos). These individual steps are implemented as microservices using Docker Compose. To facilitate data access, an application programming interface (API) was implemented following the SpatioTemporal Asset Catalog (STAC) specification. STAC exposes collections of spatiotemporal data in a standardized way and has given rise to an ecosystem of FOSS tools, including pystac and odc-stac. Through simple REST API queries, users can search collections and their individual items by geographic location and acquisition time. In addition, the JRC has developed, and released as FOSS, a suite of libraries for processing geospatial data (pyjeo) and for creating data science dashboards (Vois).
This talk introduces these libraries and presents real case studies illustrating how they were instrumental in providing policy support through reproducible workflows. In particular, it presents a case study on monitoring water across the European continent, which uses Sentinel-2 satellite imagery and machine learning techniques to create time series of water masks. A monitoring system is then set up by comparing the water extent of a defined set of reservoirs over time.
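To illustrate the kind of location-and-time query a STAC API supports, here is a minimal sketch of a STAC API Item Search request that uses only the Python standard library. The request body uses the `collections`, `bbox`, `datetime`, and `limit` parameters of the STAC API Item Search specification and is sent as a POST to `/search`. The endpoint URL and the collection identifier are hypothetical placeholders, not the actual BDAP values, and the helper names `build_item_search` and `search` are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical placeholder: replace with the root URL of a real STAC API.
STAC_API_URL = "https://example.org/stac"


def build_item_search(collections, bbox, start, end, limit=10):
    """Build a STAC API Item Search body for POST /search.

    bbox is [min_lon, min_lat, max_lon, max_lat] in WGS84 degrees.
    start and end should be timezone-aware datetimes; they are converted
    to UTC and encoded as an RFC 3339 interval "start/end".
    """
    if len(bbox) != 4:
        raise ValueError("bbox must be [min_lon, min_lat, max_lon, max_lat]")
    if end < start:
        raise ValueError("end must not precede start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    interval = (
        f"{start.astimezone(timezone.utc).strftime(fmt)}/"
        f"{end.astimezone(timezone.utc).strftime(fmt)}"
    )
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        "datetime": interval,
        "limit": limit,
    }


def search(body, url=STAC_API_URL):
    """POST the search body and return the decoded GeoJSON FeatureCollection."""
    req = urllib.request.Request(
        url.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = build_item_search(
        collections=["sentinel-2-l2a"],  # hypothetical collection id
        bbox=[8.5, 45.7, 8.8, 46.0],     # illustrative area in northern Italy
        start=datetime(2023, 6, 1, tzinfo=timezone.utc),
        end=datetime(2023, 6, 30, 23, 59, 59, tzinfo=timezone.utc),
    )
    # Only the request body is printed; calling search(body) needs a live endpoint.
    print(json.dumps(body, indent=2))
```

In practice, pystac-client wraps the same parameters (`Client.open(url).search(collections=..., bbox=..., datetime=...)`), and odc-stac can load the returned items into an xarray dataset for analysis.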