FOSS4G 2022 academic track

Semantic querying in earth observation data cubes
08-25, 14:45–15:15 (Europe/Rome), Room Hall 3A

Earth observation (EO) imagery has become an essential source of information to better monitor and understand the impact of major social and environmental issues. In recent years we have seen significant improvements in availability and accessibility of these data. Programs like Landsat and Copernicus release new images every day, freely and openly available to everyone. Technological improvements such as data cubes (e.g. OpenDataCube), scalable cloud-based analysis platforms (e.g. Google Earth Engine) and standardized data access APIs (e.g. OpenEO) are easing the retrieval of the data and enabling higher processing speeds.

All these developments have lowered the barriers for utilizing the value of EO imagery, yet translating EO imagery directly into information using automated and repeatable methods remains a main challenge. Imagery lacks inherent semantic meaning, thus requires interpretation. For example, consider someone who uses EO imagery to monitor vegetation loss. A multi-spectral satellite image of a location may consist of an array of digital numbers representing the intensity of reflected radiation at different wavelengths. The user, however, is not interested in digital numbers, they are interested in a semantic categorical value stating if vegetation was observed. Inferring this semantic variable from the reflectance values is an inherently ill-posed problem, since it requires bridging a gap between the two-dimensional image domain and the four-dimensional spatio-temporal real-world domain. Advanced technical expertise in the field of EO analytics is needed for this task, making it a remaining barrier on the way to a broad utilization of EO imagery across a wide range of application domains.

We propose a semantic querying framework for extracting information from EO imagery as a tool to help bridge the gap between imagery and semantic concepts. The novelty of this framework is that it makes a clear separation between the image domain and the real-world domain.

There are three main components in the framework. The first component forms the real-world domain. This is where EO data users interact with the system. They can express their queries in the real-world domain, meaning that they directly reference semantic concepts that exist in the real world (e.g. forest, fire). For simplicity reasons, we currently work on a higher level of abstraction, and focus on concepts that correspond to land-cover classes (e.g. vegetation). For example, a user can query how often vegetation was observed at a certain location during a certain timespan. These queries do not contain any information on how the semantic concepts are represented by the underlying data.

The second component forms the image domain. This is where the EO imagery is stored in a data cube, a multi-dimensional array organizing the data in a way that simplifies storage, access and analysis. Besides the imagery itself, the data cube may be enriched with automatically generated layers that already offer a first degree of interpretation for each pixel (i.e. a semantically-enabled data cube [1]), as well as with additional data sources that can be utilized to better represent certain properties of real-world semantic concepts (e.g. digital elevation models).

The third component serves as the mapping between the real-word domain and the image domain. This is where EO data experts bring their expertise into the system, by formalizing relationships between the observed data values and the presence of a real-world semantic concept. In our current work these relationships are always binary, meaning that the concept is marked either as present or not present. However, the structure allows also for non-binary relationships, e.g. probabilities that a concept is present given the observed data values.

We implemented a proof-of-concept of our proposed framework as an open-source Python library (see https://github.com/ZGIS/semantique). The library contains functions and classes that allow users to formulate their queries and call a query processor to execute them with respect to a specific mapping. Queries are formulated by chaining together semantic concept references and analytical processes. The query processor will translate each referenced semantic concept into a multi-dimensional array covering the spatio-temporal extent of the query. It does so by retrieving the relevant data values from the data storage, and subsequently applying the rules that are specified in the mapping. If the relationships are binary, the resulting array will be boolean, with “true” values for those pixels that are identified as being an observation of the referenced concept, and “false” values for all other pixels. Analytical processes can then be applied to this array. Each process is a well-defined array operation performing a single task. For example, applying a function to each pixel or reducing a particular dimension. The workflow of chaining together different building blocks can easily be supported by a visual programming interface, and thus lowering the technical barrier for information extraction even more. This is demonstrated already in an operational setting by Sen2Cube.at, a nation-wide semantic data cube infrastructure for Austria, which uses our proposed semantic querying framework [2].

We believe our proposed framework is an important contribution to more widely accessible EO imagery. It lowers the barrier to extract valuable information from EO imagery for users that lack the advanced technical knowledge of EO data, but can benefit from the applications of it in their specific domain. They can now formulate queries by directly referencing real-world semantic concepts, without having to formalize how they are represented by the EO data. To execute the queries, they can use pre-defined mappings, which are application-independent and shareable. The framework eases interoperability of EO data analysis workflows also for expert users. Mappings can easily be shared and updated, and the queries themselves are robust against changes in the image domain.

I am a researcher at the University of Salzburg and geospatial developer at triply. I enjoy working at the interplay between theory and practice. My passion is to improve the quality of our living environment by supporting design choices with spatio-temporal data science.