FOSS4G 2022 general tracks

Claudio Navacchi

Claudio Navacchi was born in Austria in 1994 and studied Geodesy and Geoinformation at the Vienna University of Technology (TU Wien). During his M.Sc. studies, he began working as a project assistant in the Microwave Remote Sensing Research Group and soon became involved in national and international projects dealing with earth observation data processing, analysis and dissemination, e.g. the Austrian Data Cube (ACube) or openEO.

After graduating in 2019, he began a Ph.D. in 2020 focusing on improving Sentinel-1 data pre-processing routines and curating backscatter time series to minimise the influence of Sentinel-1's distinct observation pattern. He is currently part of the Global Flood Monitoring (GFM) and satellite-based grassland monitoring (SatGrass) projects, which allow him to pursue his research interests in satellite data processing, radiation and land-surface interactions, vegetation and water-cycle dynamics, and scientific programming.


Sessions

08-25
10:15
5min
yeoda - providing low-level and easy-to-use access to manifold earth observation datasets
Claudio Navacchi

In recent years, several Python packages (e.g. xarray, rasterio) have evolved around more basic software libraries such as netCDF4 or GDAL for accessing geospatial data. These packages support a wide range of data formats (e.g. GeoTIFF, NetCDF, Zarr), provide the data in array form (NumPy, xarray), and constitute a fundamental part of any scientific analysis or operational task. However, they do not offer full flexibility when working with Earth Observation (EO) datasets. The multidimensional complexity of EO data (i.e. space, time, bands) is often resolved by distributing dimensions across many files, which makes the data harder to access. An important step towards streamlining EO data access has been the Open Data Cube (ODC) toolbox, which utilizes predefined dataset configurations and file-based indices stored in a database. With this setup, ODC enables easy and uniform access to multidimensional geospatial datasets. Still, users are often confronted with a great variety of data formats and with files distributed over different systems. This can pose a hurdle when working with ODC, especially if one wants to process a new stack of geospatial data, where the extra overhead of a database can stall swift progress.

The yeoda ("your earth observation data access") Python software package aims to close this gap by offering an interface similar to ODC's while allowing users to interact with geospatial data at a lower level. It relies on two other Python software packages developed by TU Wien: geospade (definition of geospatial properties of a dataset, e.g. geometries), and veranda (read/write access to a variety of raster and vector data formats, e.g. GeoTIFF). This modular setup ensures a clear separation of concerns, specifically between geospatial operations and I/O tasks, yielding a homogenized interface independent of the actual data format. For example, geospatial operations based on tiled EO raster datasets can be easily performed across tile or file boundaries. Data access is then realised in veranda, which combines geometric properties with I/O objects listed in a table. On top of geospade and veranda, yeoda acts as a communication layer between files stored on the file system and data objects by adding further dimensions to the data table, such as common metadata or file name entries. Thus, one can filter multiple files by their attributes (e.g. time, bands, variable names, satellite platform) before accessing the data.
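The idea of filtering a file table by attributes before any data is read can be illustrated with a minimal, standard-library-only sketch. It does not use the yeoda, veranda or geospade APIs; the file naming scheme (`<YYYYMMDD>_<band>_<tile>.tif`), the example paths, and the names `FileEntry`, `parse_filename` and `select` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileEntry:
    """One row of the file table: a path plus attributes decoded from its name."""
    path: str
    time: datetime
    band: str
    tile: str


def parse_filename(path: str) -> FileEntry:
    """Decode the hypothetical naming scheme <YYYYMMDD>_<band>_<tile>.tif."""
    stem = PurePosixPath(path).stem
    date_str, band, tile = stem.split("_")
    return FileEntry(path, datetime.strptime(date_str, "%Y%m%d"), band, tile)


def select(entries, **criteria):
    """Keep rows whose attributes satisfy every criterion.

    A criterion is either a plain value (tested for equality)
    or a callable predicate applied to the attribute value.
    """
    def matches(entry):
        for name, wanted in criteria.items():
            value = getattr(entry, name)
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [entry for entry in entries if matches(entry)]


# Hypothetical file paths; nothing is read from disk in this sketch.
paths = [
    "/data/20220101_VV_E048N012T6.tif",
    "/data/20220101_VH_E048N012T6.tif",
    "/data/20220113_VV_E048N012T6.tif",
    "/data/20220113_VV_E048N018T6.tif",
]
table = [parse_filename(p) for p in paths]

# Narrow the table by band and time before any file would be opened.
subset = select(table, band="VV", time=lambda t: t >= datetime(2022, 1, 10))
selected_paths = [entry.path for entry in subset]
```

Only the paths remaining in `selected_paths` would then be handed to an I/O layer, so the cost of opening files is paid for the relevant subset alone.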

Hence, yeoda guarantees the necessary freedom to apply arbitrary algorithms to manifold data formats, while simultaneously supporting scalability by means of parallelised I/O operations. While ODC remains of tremendous value for accessing EO datasets through large-scale operational services, yeoda introduces a new level of data interaction, making it an indispensable tool for the EO user community. In light of recent advancements in interoperable cloud-based processing via the openEO API, yeoda could be utilized as a slim back-end library to lower the hurdle of sharing new EO datasets and to foster scientific exchange.

State of software
Modulo 0