10:00
30min
Coffee
Omicum
10:30
30min
Threats related to open geospatial data in the current geopolitical environment
Jussi Nikander, Henrikki Tenkanen

Finland has been a strong proponent of open data for a long time. Since 2010, a significant amount of public sector data has been published openly, and much of this data is geospatial by nature. Accurate geospatial data with nation-wide coverage is highly valuable for many applications, including matters related to national security and military applications. When such information is provided as open data, it can also be used by other countries, including hostile nations. Furthermore, geospatial data can also be used by criminals and other malicious actors, so there have always been potential threats related to open geospatial data.
Traditionally, threats related to open geospatial data have been divided into two categories: threats to privacy and threats to national security. Threats to privacy have typically been handled carefully, as there are numerous datasets that pose obvious threats to privacy, such as accurate census data. Therefore, the public sector has developed mature best practices on how to handle privacy concerns, and there are also international guidelines to assess risks related to open data (Open Data Institute, 2022). For example, census or population registry data should never be published at an individual level, but the data should be aggregated to minimize the privacy risks.
Since the Balkan wars of the 1990s, most of Europe has been in a state of deep peace. The potential national security threats related to open geospatial data have therefore been given relatively little attention. Potential threats from other nation states have been sidelined by other concerns and often dismissed as irrelevant due to increased European integration. This is true even in Finland, which never downsized its army or dismantled its national preparedness organizations. The Russian invasion of Ukraine caused a rapid and radical change in the global geopolitical environment; in Finland, it caused a radical shift in the discussion about national security.
Here, we report the results of a study of the security concerns related to open geospatial data in Finland. The main research questions were:
What kinds of threats related to open geospatial data exist?
How can the threats related to open geospatial data be mitigated and managed?
Before our project, open discussions about the need for threat assessment in the new geopolitical environment had already started within the Finnish geospatial ecosystem. This gave a useful basis for scoping our research and provided an environment in which the findings could be discussed.
In the study, we focused specifically on matters related to national security. Our focus was on national geospatial datasets maintained by the National Land Survey of Finland (NLS), including, for example, the Finnish topographic database. Even though our focus was on data produced by the NLS, our findings are applicable more generally, as our approach considered potential threats enabled by open geospatial data in general.
As the main research method for the study, we used semi-structured interviews. We interviewed approximately 20 individuals from 13 Finnish organizations. The majority of the interviewees were from public sector organizations. The last few interviews yielded few new insights, so we concluded that we had reached saturation in terms of new information and that no further interviews were needed.
Based on the interviews, we created a number of threat scenarios. The threat scenarios were used as examples of the sorts of threats that might be related to open geospatial data. The scenarios were then discussed and further refined with a number of experts on national security and the Finnish geospatial ecosystem.
In our results, we grouped the threats into categories and gave recommendations for mitigation strategies related to open geospatial data. The results of the work are closely related to earlier threat assessment work done at the national level. Our results include several insights about how open geospatial data could be used to threaten critical infrastructure, important infrastructure, soft targets, as well as the privacy of individuals. Similarly, our results list potential sources of threats, including other nation states, terrorist organizations and lone-wolf terrorists, criminals, and foreign companies. Both the targets and the threat sources are already well known in national security work and are not unique to the geospatial ecosystem.
In most of the threat scenarios discussed, open geospatial data could help malicious actors plan and execute activities that cause harm. Based on our analysis, the threat related to a specific dataset most often did not directly target the publisher of the dataset, nor affect the dataset itself. For example, detailed building data can be used to plan burglaries, and accurate road network and topographical data can be used to plan an armed invasion. Thus, the targets of the malicious activity lie elsewhere, and the data is used as a means to gain more information about these targets.
To balance the potential unwanted use scenarios, the benefits of open geospatial data were also discussed throughout our interviews. When considering the threats and mitigation strategies, it is crucial to remember the benefits of open data. The mere possibility of misusing a dataset is not, by itself, a reason to limit its use. Limitations should be considered only if the threats are significant compared to the benefits gained from open data.
Our study brings an important new aspect to the narratives around open geospatial data, as there is little open discussion or research on the potential threats caused by spatial data, or on the relationship between open data and potential threats. Furthermore, our study reveals an urgent need to further develop guidelines (such as that of the Open Data Institute (2022)) and risk assessment frameworks so that they better consider the threats and risks of opening and sharing geospatial data from the perspective of national security.
References
Open Data Institute. (2022). Assessing risk when sharing data: A guide (p. 21). Open Data Institute. https://www.theodi.org/article/assessing-risk-when-sharing-data-a-guide/

Omicum
11:00
30min
Pan-European open building footprints: analysis and comparison in selected countries
Marco Minghini

Building footprints (hereinafter buildings) represent key geospatial datasets for several applications, including city planning, demographic analyses, modelling energy production and consumption, disaster preparedness and response, and digital twins. Traditionally, buildings are produced by governmental organisations as part of their cartographic databases, with coverage ranging from local to national and licensing conditions being heterogeneous and not always open. This makes it challenging to derive open building datasets with a continental or global scale. Over the last decade, however, the unparalleled developments in the resolution of satellite imagery, artificial intelligence techniques and citizen engagement in geospatial data collection have enabled the birth of several building datasets available at least at a continental scale under open licenses.
In this work, we analyse four such open building datasets. The first is the building dataset extracted from the well-known OpenStreetMap (OSM, https://www.openstreetmap.org) crowdsourcing project, which creates and maintains a database of the whole world released under the Open Database License (ODbL). OSM buildings are typically derived from the digitisation of high-resolution satellite imagery, and in some cases from the import of other databases with ODbL-compatible licenses. The second dataset is EUBUCCO (https://eubucco.com), a pan-European building database produced by a research team at the Technical University Berlin by merging different input sources: governmental datasets when available and open, and OSM otherwise [1]. EUBUCCO is mostly licensed under the ODbL, the only exceptions being two regions in Italy and the Czech Republic. The third dataset is Microsoft Open Building Footprints (MS, https://github.com/microsoft/GlobalMLBuildingFootprints), extracted through the application of machine learning to high-resolution Bing Maps satellite imagery acquired between 2014 and 2023, available at the global scale and also licensed under the ODbL. The fourth dataset, called Digital Building Stock Model (DBSM), was produced by the Joint Research Centre (JRC) of the European Commission to support studies for energy-related purposes. It is an ODbL-licensed pan-European dataset produced from the hierarchical conflation of three input datasets: OSM, MS and the European Settlement Map [2].
The objective of this work is to compare the four datasets – which derive from different approaches following heterogeneous processing steps and governance rules – in terms of their geometry (i.e. attributes are out of scope) in order to draw conclusions on their similarities and differences. It is known from the literature that building completeness in OSM (which plays a key role in three out of the four datasets – OSM itself, EUBUCCO and DBSM) varies with the degree of urbanisation [3] and that machine learning applied to satellite imagery (used in MS) may perform differently depending on the urban or rural context [4]. In light of this, we analyse the building datasets according to the degree of urbanisation of their location using the administrative boundaries provided by Eurostat, which classify each European province as urban, semi-urban or rural (https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/countries).
We chose five European Union (EU) countries for the analysis: Malta (MT), Greece (EL), Belgium (BE), Denmark (DK) and Sweden (SE). The choice was motivated by the need to: i) select countries of different size and geographical location, which ensures that their national OSM communities are substantially different; ii) select countries having different portions of urban, semi-urban and rural areas; and iii) select two sets of countries for which the input source for EUBUCCO buildings was a governmental dataset (BE, DK) or OSM (MT, EL, SE), to detect possibly different behaviours.
From a methodological point of view, for each country and degree of urbanisation we first calculated and compared the total number and total area of buildings in all datasets and examined their statistics through box plots. This was followed by the calculation, for each pair of datasets and degree of urbanisation, of the building area of intersection and its fraction of the total building area of each of the two datasets. Finally, we intersected all four datasets and calculated the fraction of the area of each dataset that this intersection represents.
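The pairwise comparison step can be illustrated with a minimal Python sketch (file names and the equal-area CRS are illustrative assumptions, not the authors' actual code):

import geopandas as gpd

# Hypothetical inputs: two building datasets for one country and degree of
# urbanisation, reprojected to an equal-area CRS so that areas are meaningful.
osm = gpd.read_file("osm_buildings.gpkg").to_crs(epsg=3035)
ms = gpd.read_file("ms_buildings.gpkg").to_crs(epsg=3035)

area_osm = osm.geometry.area.sum()
area_ms = ms.geometry.area.sum()

# Building area shared by the two datasets
shared = gpd.overlay(osm, ms, how="intersection", keep_geom_type=True)
area_shared = shared.geometry.area.sum()

print(f"shared / OSM area: {area_shared / area_osm:.2%}")
print(f"shared / MS area:  {area_shared / area_ms:.2%}")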
Results show that in urban areas, while the datasets are overall similar in terms of total building area, the total number of buildings is typically higher in EUBUCCO for DK and BE, where the information comes from governmental datasets. This suggests that such datasets outperform OSM in modelling the footprints of individual buildings in the most urbanised areas. In contrast, in semi-urban and rural areas, where OSM traditionally lacks completeness, MS (and as a consequence DBSM, which is also based on MS) captures more buildings. This is especially evident in SE, where 94% of the country area is not urban. When calculating the intersection between building areas for each pair of datasets in all countries and degrees of urbanisation, the area of OSM buildings scores the lowest percentages of intersection when compared to the building areas of the other datasets. The lowest such percentages, equal to 25%, are scored when compared to MS in non-urban areas. EUBUCCO represents an obvious exception for the countries (MT, EL and SE) where it uses OSM. Finally, the dataset for which the area of intersection between the buildings of all four datasets represents the largest percentage of its area is OSM, with values even higher than 80% for urban areas. This shows that EUBUCCO, and even more so DBSM, can be considered a sort of ‘OSM extension’ that improves OSM's completeness. Instead, the lowest values are scored by MS, resulting from its radically different generation process compared to the other datasets.
The whole procedure was written in Python using libraries such as Pandas, Dask-GeoPandas and Plotly. The code is available under the European Union Public License (EUPL) v1.2 at https://github.com/eurogeoss/building-datasets in the form of Jupyter Notebooks. Work is ongoing to extend the analysis to the whole EU in order to validate the results of this study and formulate recommendations at the continental level.

Omicum
11:30
30min
An open early-warning system prototype to help manage and study algal blooms on Lake Lugano
Daniele Strigaro

The effects of climate change, together with human activities, are stressing many natural resources. These effects are altering distribution patterns, such as precipitation, and known dynamics in all natural spheres (hydrosphere, biosphere, lithosphere, and atmosphere). Monitoring environmental parameters is becoming of primary importance to better understand the changes we need to address. Satellite images, laboratory analysis of samples, and high-end real-time monitoring systems offer solutions to this problem. However, such solutions often require proprietary tools to fully exploit the data and interact with them. The open science paradigm fosters accessibility to data, scientific results, and tools at all levels of society. Hence, in this project, we aimed to apply such an approach to aid in managing a new phenomenon affecting Lake Lugano, primarily caused by the increase in water temperatures and the high load of nutrients from human activities. Over the past years, and particularly in 2023, distributed Harmful Algal Blooms (HABs) appeared on the lake, raising awareness of a phenomenon that can be dangerous for human and animal health. Since HABs are distributed over the lake surface, an open-source, cost-effective solution based on open hardware, software and standards can potentially increase the spatial resolution by collecting denser measurements. The excessive algal growth can be composed of cyanobacteria, which can produce a wide range of toxic metabolites, including microcystins (MCs). These cyanotoxins, whose negative effects can occur both acutely at high concentrations and at low doses (Chen et al., 2009; Li et al., 2011), are produced by species common in Lake Lugano. Among these, the most problematic is Microcystis, as it can give rise to blooms during the summer period that accumulate along the shores due to wind and currents. In these areas, the risk of exposure for people and animals is higher, especially in bathing areas. Considering the potential risks to human and animal health, in this project an open early-warning monitoring system has been designed and built upon previous experience in lake water monitoring (Strigaro et al., 2022), leveraging the benefits of applying open science principles.
Most monitoring plans use microscopic counts of cyanobacteria as an indicator of toxicity risk. However, these analyses are time-consuming; therefore, in addition to or as an alternative to classical methods, sensors capable of measuring algal pigments are increasingly being used. In particular, phycocyanin (PC), characteristic of cyanobacteria, can be used as an indicator of cyanobacterial biomass and thus of the potential exceedance of critical levels of microcystins. Building on previous studies, this project aimed to develop a high-frequency, sensor-based early warning system for real-time detection of phycocyanin in surface waters used for bathing. In particular, the study aimed to i) develop a pilot system for real-time phycocyanin surveillance, using a high-frequency fluorimeter positioned below the surface near a bathing beach; ii) develop data management software that automatically notifies users when predicted phycocyanin risk thresholds are exceeded; and iii) test the system during cyanobacterial blooms, comparing the measured phycocyanin values with microcystin concentrations.
The hardware solution consists of a Raspberry Pi connected to a Trilux fluorimeter by Chelsea Technologies, which allows the measurement of three algal pigments (chlorophyll-a, phycocyanin, and phycoerythrin), along with a module for transmitting data using NB-IoT. On the node, leveraging the concept of edge computing, the istSOS software has been installed. istSOS is an open-source Python implementation of the Sensor Observation Service of the Open Geospatial Consortium, fostering data sharing and interoperability. Raw data are retrieved from the sensor every minute and stored in the local instance of istSOS. Simultaneously, a simple on-the-fly quality control flags each value with a quality index. The data are then aggregated every 10 minutes and transmitted every 15 minutes to the data warehouse. On the server side, another instance of istSOS provides data for reports, post-processing validation, and the early warning system. Additionally, the open-source software Grafana has been explored to set up alerts based on three different thresholds. Each threshold was defined with a hypothetical bathing water management plan in mind, and they are expressed as follows:

1. Monitoring - PC threshold of 3.4 Chl-a eq µg/L, corresponding to a value of 5 μg/L of MCs (with PC greater than Chl-a). This threshold defines abundant phytoplankton growth with dominance of cyanobacteria. Upon exceeding this threshold, frequent monitoring of the situation and identification of the dominant genus is recommended to predict its potential toxicity.

2. Alert - PC threshold of 6.7 Chl-a eq µg/L, corresponding to a value of 10 μg/L of MCs. This threshold defines abundant cyanobacterial growth and the potential onset of a bloom. Upon exceeding this threshold, site inspection, identification of the dominant genus, and cyanotoxin analysis are recommended.

3. Prohibition - PC threshold of 13.4 Chl-a eq µg/L, corresponding to a value of 20 μg/L of MCs. This threshold defines an ongoing cyanobacterial bloom. Upon exceeding this threshold, the toxic risk is at its maximum, as we are approaching the maximum limits imposed for the bathing prohibition. Therefore, temporary bathing prohibition is recommended until confirmation of bloom toxicity with verification of any exceeding of the World Health Organization limit of 25 μg/L of MCs.
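A minimal Python sketch of how the 10-minute aggregation and these thresholds could be combined is given below; the actual istSOS and Grafana configuration is not reproduced, and the readings, quality check and column names are purely illustrative:

import pandas as pd

# Thresholds from the hypothetical management plan above (PC in Chl-a eq µg/L)
THRESHOLDS = [(13.4, "Prohibition"), (6.7, "Alert"), (3.4, "Monitoring")]

def classify_pc(pc_value):
    """Return the alert level for an aggregated phycocyanin value."""
    for threshold, level in THRESHOLDS:
        if pc_value >= threshold:
            return level
    return "No action"

# Illustrative 1-minute phycocyanin readings with a simple range-based quality
# flag, aggregated to 10-minute means as described for the edge node.
raw = pd.DataFrame(
    {"pc": [2.9, 3.1, 7.2, 150.0, 6.9, 7.4]},
    index=pd.date_range("2023-08-01 12:00", periods=6, freq="1min"),
)
raw["quality_ok"] = raw["pc"].between(0, 100)  # flag implausible readings
ten_min = raw.loc[raw["quality_ok"], "pc"].resample("10min").mean()

for timestamp, value in ten_min.items():
    print(timestamp, round(value, 2), classify_pc(value))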

The adoption of open hardware, software, and standards allows the implementation of a toolchain that can be easily replicated. The promising results and the openness of the solution will permit further expansion of the network to help decision makers and researchers better manage and study this phenomenon using sensor data. The solution can also effectively increase citizen awareness through kits that local stakeholders can use to monitor the status of the lake water, providing additional data.

Omicum
12:00
5min
A standardised approach for serving environmental monitoring data compliant with OGC APIs
Juan Pablo Duque Ordoñez

Environmental monitoring is fundamental for addressing climate change. Environmental data, in particular air quality and meteorological parameters, are widely used for risk assessment, urban planning, and other studies of urban and rural environments. Finding open, good-quality environmental data is a complex task, even though environmental and meteorological monitoring data are considered among INSPIRE's high-value datasets. For this reason, having robust, open, and standardised services that can offer spatial data is of critical importance.

A good example of a provider of open, high-quality environmental and meteorological data is one of the Italian Regional Agencies for Environmental Protection, ARPA Lombardia. This agency maintains the air quality and meteorological monitoring station networks of the region and serves a high volume of sensor observations. The Lombardy region is located in northern Italy and is considered the country's financial and industrial engine. Due to its topography, pollution levels in the region increase during the colder months of the year, in particular the concentrations of particulate matter (PM10 and PM2.5), as shown in [1]. For this reason, having a well-established monitoring network is critical. The ARPA Lombardia monitoring network generates huge volumes of data, which are served through its catalogue and a set of services. It is possible to download air quality and meteorological observations, as well as information about the monitoring stations. These data have been used extensively in research, in particular in studies of air quality in the region [2][3].

ARPA Lombardia environmental monitoring data are served through the API (Application Programming Interface) of the Lombardy region, Open Data Lombardia. Although this service is highly functional, thoroughly documented and works correctly, we identified some limitations that could pose problems for researchers, especially in the field of geospatial information. The service has geospatial capabilities, such as the possibility to download data in GeoJSON format; however, it is not compliant with open standards such as WFS, WMS, or the OGC APIs, which poses an interoperability problem with geoportals and catalogues that do follow these standards. Additionally, the column names of the meteorological and air quality observation datasets and of the station datasets are not homogenised, so they are not fully interoperable. Finally, the ARPA Lombardia services and data fields are only available in Italian, which also raises interoperability concerns.

Given the societal, environmental, and economic importance of this kind of information, in this work we present and document the implementation of a web API compliant with the OGC API specifications for exposing the air quality and meteorological information from ARPA Lombardia. The data provided by ARPA Lombardia are shared under the CC0 1.0 Universal licence, meaning they are in the public domain.

The developed API serves environmental monitoring data (both air quality and meteorological) in compliance with a set of OGC APIs. It is capable of exposing data in different standardised formats, filtering by multiple fields and locations, and performing server-side processing of the observations. The OGC APIs are modern standards for geospatial information. Although they are still in the adoption phase, many reference implementations are being developed, and governmental institutions are starting to adopt these standards [4][5]. They differ from widespread, older OGC standards such as WMS or WFS in that they are based on JSON and OpenAPI, while the older standards are based on XML. By implementing new OGC API applications, we contribute to the spread of these standards in academic environments and to their overall development.

In particular, we developed a web service compliant with OGC API - Features for exposing the stations’ information and locations as vector data, OGC API - Environmental Data Retrieval (EDR) for serving observations from environmental and meteorological stations, and OGC API - Processes to allow researchers to perform server-side processing on the underlying data, such as data cleansing, interpolation (e.g., conversion to a coverage format, obtaining data at an arbitrary point), and data aggregation (e.g., by day/month/year, or by station). The API also follows the OpenAPI specification, supports HTTP content negotiation, and provides homogenised column names in English to improve the usability of ARPA data for foreign researchers.
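As an illustration, the kind of request a researcher could issue against such a service is sketched below in Python; the base URL, collection names and parameter names are hypothetical, while the query parameters follow the OGC API - Features and OGC API - EDR specifications:

import requests

BASE = "https://example.org/ogcapi"  # hypothetical deployment of the service

# OGC API - Features: station locations as GeoJSON, filtered by bounding box
stations = requests.get(
    f"{BASE}/collections/air_quality_stations/items",
    params={"bbox": "8.9,45.3,9.4,45.6", "f": "json"},
).json()

# OGC API - EDR: observations at a point for one parameter and a time range
observations = requests.get(
    f"{BASE}/collections/air_quality/position",
    params={
        "coords": "POINT(9.19 45.46)",  # WGS84 longitude/latitude
        "parameter-name": "PM2.5",
        "datetime": "2024-01-01T00:00:00Z/2024-01-31T23:59:59Z",
        "f": "json",
    },
).json()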

This work is not intended to replace the ARPA Lombardia API, but to provide an alternative way of accessing the data and to extend researchers' possibilities even further with additional processing capabilities. It also aims to enrich the ecosystem of available OGC API implementations and to push these open standards forward in the academic literature. The full paper will provide a description of the system architecture and the technologies used to develop the application, a comparison with ARPA Lombardia's current API, and a case study illustrating the API's capabilities for research. This work is of interest to the FOSS4G community and to European regional agencies, both as an implementation of a promising open standard for environmental monitoring and sensor networks, OGC API - EDR, and as an example of the infrastructure and capabilities that environmental monitoring services should have.

References:

[1] Maranzano, P. (2022). Air Quality in Lombardy, Italy: An Overview of the Environmental Monitoring System of ARPA Lombardia. Earth, 3(1), 172–203. https://doi.org/10.3390/EARTH3010013

[2] Gianquintieri, L., Oxoli, D., Caiani, E. G., & Brovelli, M. A. (2024). Implementation of a GEOAI model to assess the impact of agricultural land on the spatial distribution of PM2.5 concentration. Chemosphere, 352, 141438. https://doi.org/10.1016/J.CHEMOSPHERE.2024.141438

[3] Cedeno Jimenez, J. R., Pugliese Viloria, A. de J., & Brovelli, M. A. (2023). Estimating Daily NO2 Ground Level Concentrations Using Sentinel-5P and Ground Sensor Meteorological Measurements. ISPRS International Journal of Geo-Information, 12(3). https://doi.org/10.3390/IJGI12030107

[4] MSC GeoMet - GeoMet-OGC-API - Home. (n.d.). Retrieved February 23, 2024, from https://api.weather.gc.ca/

[5] API for downloading geographic objects (API-Features) of the National Geographic Institute. (n.d.). Retrieved February 23, 2024, from https://api-features.ign.es/

Omicum
12:05
5min
Modernizing Geospatial Services: An investigation into modern OGC API implementation and comparative analysis with traditional standards in a Web application
Sudipta Chowdhury

The Open Geospatial Consortium (OGC) APIs are a new set of standards released in response to the existing OGC Web Service (WxS) standards and are considered a modern technology for sharing data over the internet. This study explores the transition from traditional geospatial service standards to the modern OGC API standards in web applications by implementing them in the field of urban development management. The main goal of this study is to explore the potential for enhancing web applications through a comparative analysis of the integration of modern and traditional geospatial technologies, based on their performance and practical implications.
The research scope encompasses the design and development of a modern web application architecture, involving database design and preparation, automatic integration of data from various formats, and the implementation of geospatial services using both traditional standards and modern OGC API standards, including the creation of a frontend website using OpenLayers. However, the core focus is the comparative analysis of the traditional and modern geospatial service standards, evaluating data compatibility, deployment processes, and performance metrics under different levels of concurrent requests.
The study is structured into two primary segments: an extensive theoretical evaluation of the standards, followed by a hands-on testing phase involving the setup of both traditional and modern services separately while keeping the other components (database and frontend) the same in the architecture. In the database tier, PostGIS was employed; Geoserver and Pygeoapi were used in the server tier for publishing data in both traditional (WxS) and modern (OGC API) standards to the user tier. OpenLayers was used for the frontend to visualize the data for users.
Database design and preparation were accomplished using GeoDjango and PostgreSQL, and automatic data integration was conducted using Python. ALKIS (the Authoritative Real Estate Cadastre Information System of Germany), which includes both spatial and non-spatial information encoded in the NAS format (the XML-based exchange interface defined by the Surveying Authorities of Germany), served as the primary data source in this study, with essential details such as street names, house numbers, and land parcel IDs. The comparison of the two platforms (Geoserver and Pygeoapi) considered key findings, lessons learned, data format compatibility, and an evaluation of the installation process through a literature review. Performance metrics were measured through hands-on testing in terms of rendering time and overall website performance at different zoom levels and for different numbers of vector features. Testing also covered different data source formats, such as PostGIS, GeoPackage (GPKG), and Shapefile (SHP), with a focus on how front-end performance varied with the data source. Apache JMeter and Google Chrome developer tools (Network and Lighthouse) were used to obtain rendering data from the front end. Usability evaluations are currently underway to gain user perspectives on aspects such as data retrieval speed, map rendering speed, and ease of use (e.g., panning, zooming, popups) in comparison with the previous system.
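As a complement to the JMeter and browser-based rendering measurements, raw server response times can be compared with a few lines of Python; the endpoints, layer names and ports below are hypothetical placeholders for a local Geoserver and Pygeoapi deployment, not the study's actual test setup:

import time
import requests

WFS_URL = (
    "http://localhost:8080/geoserver/ows?service=WFS&version=2.0.0"
    "&request=GetFeature&typeNames=alkis:parcels&outputFormat=application/json"
)
OGCAPI_URL = "http://localhost:5000/collections/parcels/items?f=json&limit=10000"

def timed_get(url):
    # Return elapsed seconds and payload size for a single GET request
    start = time.perf_counter()
    response = requests.get(url)
    response.raise_for_status()
    return time.perf_counter() - start, len(response.content)

for name, url in [("Geoserver WFS", WFS_URL), ("Pygeoapi OGC API - Features", OGCAPI_URL)]:
    elapsed, size = timed_get(url)
    print(f"{name}: {elapsed:.2f} s, {size / 1e6:.1f} MB")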
In a theoretical comparison, Geoserver, a well-established and widely adopted open-source platform with an organized graphical user interface (GUI), boasts robust security features with support for various authentication methods and precise access control. With a rich history and a large user community, Geoserver provides extensive documentation and support resources. It supports a diverse array of data stores, including popular databases and file-based formats. On the other hand, Pygeoapi, a newer but increasingly popular project, emphasizes simplicity and ease of use. Offering modern technologies such as the OpenAPI standard for a RESTful API, Pygeoapi supports various data stores, including PostgreSQL/PostGIS and Elasticsearch. Installation is straightforward, leveraging Python and its dependencies. While Geoserver stands out for its comprehensive feature set, including support for OGC standards and numerous plugins, Pygeoapi focuses on being lightweight and customizable according to the OGC API standards.
Based on the extensive hands-on testing, the analysis reveals persistent trends in rendering times across different scenarios. Pygeoapi consistently demonstrates higher rendering times compared to both Geoserver (WFS) and Geoserver (WMS). The fluctuation in rendering times remains relatively uniform as the zoom level increases from 14 to 18. However, as the number of features escalates from 4891 to 23319, both Pygeoapi (1.55s to 7.56s) and Geoserver WFS (454ms to 2.19s) exhibit a proportional increase in rendering time. Remarkably, Geoserver (WMS) showcases notable stability in rendering times across various zoom levels and feature counts, attributed to its tile-based approach. The observed linear correlation between feature count and rendering time suggests a scalability factor affecting both Pygeoapi and Geoserver. Consequently, users may need to consider factors beyond rendering times, such as ease of use, scalability, and available features, when making a choice between Pygeoapi and Geoserver for their specific spatial data needs. Moreover, concerning different data formats, it becomes apparent that PostGIS consistently outperforms SHP, JSON, WFS, and GPKG in Pygeoapi. In Geoserver, SHP and GPKG exhibit superior performance compared to other formats. These findings underscore the importance of considering the nuances of data formats when optimizing the performance of spatial data services. To overcome the issue of prolonged rendering times in Pygeoapi, especially when managing substantial amounts of GeoJSON data, a viable solution lies in incorporating vector tiles. The adoption of vector tiles led to a substantial reduction in rendering times (from 5.6s to 898ms) by transmitting pre-styled and pre-rendered map data. This approach enhances efficiency in visualizing data on the client side, demonstrating a significant improvement in performance.
In conclusion, this research endeavours to provide actionable insights towards the effective integration of geospatial technologies, with the goal of narrowing the divide between well-established standards and emerging APIs within the dynamic realm of web applications.

Omicum
12:10
5min
The template for a Semantic SensorThings API with the GloSIS use case
Luís M. de Sousa

Motivation

Spatial Data Infrastructures (SDI) developed for the exchange of environmental
data have heretofore been greatly shaped by the standards issued by the Open
Geospatial Consortium (OGC). Based on the Simple Object Access Protocol (SOAP),
services like WMS, WFS, WCS and CSW became digital staples for researchers and
administrative bodies alike.

In 2017 the Spatial Data on the Web Working Group (SDWWG) questioned the overall
approach of the OGC, based on the ageing SOAP technology
[@SDWWG2017]. The main issues identified by the SDWWG can be summarised as:

  • Spatial resources are not identified with URIs.
  • Modern API frameworks, e.g. OpenAPI, are not being used.
  • Spatial data are still shared in silos, without links to other resources.
  • Content indexing by search engines is not facilitated.
  • Catalogue services only provide access to metadata, not the data.
  • Data difficult to understand by non-domain-experts.

To address these issues the SDWWG proposed a five point strategy inspired on the
Five Star Scheme [@BernersLee2006]:

  • Linkable: use stable and discoverable global identifiers.
  • Parseable: use standardised data meta-models such as CSV, XML, RDF, or JSON.
  • Understandable: use well-known, well-documented, vocabularies/schemas.
  • Linked: link to other resources whenever possible.
  • Usable: label data resources with a licence.

The work of the SDWWG triggered a transformational shift at the OGC towards
specifications based on OpenAPI. But while convenience of use has been the
focus, semantics have been largely unheeded. A Linked Data agenda has not
been pursued.

However, OpenAPI opens the door to an informal coupling of OGC services with
the Semantic Web, considering the possibility of adopting JSON-LD as the
syntax of OGC API responses. The introduction of a semantic layer to digital
environmental data shared through state-of-the-art OGC APIs is becoming a
reality, with great benefits to researchers using or sharing data.

This communication lays down a simple SDI set up to serve semantic environmental
data through a SensorThings API created with the grlc software. A use case is
presented with soil data services compliant with the GloSIS web ontology.

SensorThings API

SensorThings API is an OGC standard specifying a unified framework to
interconnect Internet of Things resources over the Web [@liang2016ogc].
SensorThings API aims to address both semantic and syntactic
interoperability. It follows ReST principles [@fielding2002principled],
promotes data encoding with JSON, and adopts the OASIS OData protocol
[@chappell2011introducing] and its URL conventions.

The SensorThings API is underpinned by a domain model aligned with the ISO/OGC
standard Observations & Measurements (O&M) [@Cox2011], targeted at the
interchange of observation data of natural phenomena. O&M puts forth the
concept of an Observation as an action performed on a Feature of Interest
with the goal of measuring a certain Property through a specific Procedure.
SensorThings API mirrors these concepts with Observation, Thing,
ObservedProperty and Sensor. This character makes SensorThings API a
vehicle for the interoperability of heterogeneous sources of environmental
data.
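A small Python sketch of what querying such a service looks like is given below;
the base URL is hypothetical, while the entity names and OData-style query
options follow the SensorThings API conventions described above:

import requests

BASE = "https://example.org/sta/v1.1"  # hypothetical SensorThings API endpoint

# Things together with their Datastreams
things = requests.get(f"{BASE}/Things", params={"$expand": "Datastreams"}).json()

# The ten most recent Observations of one Datastream
observations = requests.get(
    f"{BASE}/Datastreams(1)/Observations",
    params={"$orderby": "phenomenonTime desc", "$top": 10},
).json()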

grlc

grlc (pronounced "garlic") is a lightweight server that translates SPARQL
queries into Linked Data web APIs [@merono2016grlc] compliant with the OpenAPI
specification. Its purpose is to enable universal access to Linked
Data sources through modern web-based mechanisms, dispensing with the use of the
SPARQL query language. While losing the flexibility and federative capacities
of SPARQL, web APIs present developers with an approachable interface that can
be used for the automatic generation of source code.

A grlc API is constructed from a SPARQL query to which a meta-data section is
prepended. This section is declared with a simplified YAML syntax, within a
SPARQL comment block, so the query remains valid SPARQL. The meta-data provide
basic information for the API set-up and, most importantly, the SPARQL end-point
on which to apply the query. The listing below shows an example.

#+ endpoint: http://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?band_label { 
    ?band rdf:type dbo:Band ;
          dbo:genre dbr:Hard_Rock ;
          rdfs:label ?band_label .
} ORDER BY ?band_label

A special SPARQL variable formulation is used to map variables into API parameters. By
adding an underscore (_) between the question mark and the variable name,
grlc is instructed to create a new API parameter. A suffix, again separated
with an underscore, informs grlc of the parameter type. The ?band_label
variable can be expanded to ?_band_label_iri to create a
new API parameter of the type IRI.

Use case: GloSIS

The Global Soil Partnership (GSP) is a network of stakeholders in the soil
domain established by members of the United Nations Food and Agriculture
Organisation (FAO). Its broad goals are to raise awareness to the importance of
soils and to promote good practices in land management towards a sustainable
agriculture.

Acknowledging difficulties in exchanging harmonised soil data as an important
obstacle to its goals, the GSP launched in 2019 an international consultancy to
assess the state-of-the-art and propose a path towards a Global Soil Information
System (GloSIS) based on a unified exchange. A domain model resulted, based
on the ISO 28258 standard for soil quality [@SchleidtReznik2020], augmented with
code-lists compiled from the FAO Guidelines for Soil Description [@Jahn2006].
This domain model was then transformed to a Web Ontology, relying on the Sensor,
Observation, Sample, and Actuator ontology (SOSA) [@Janowicz2019], and other
Semantic Web standards such as GeoSPARQL, QUDT and SKOS. The GloSIS web ontology
has been successfully demonstrated as a vehicle to exchange soil information as
Linked Data [@GloSIS].

A prototype API for the GloSIS ontology, formulated in compliance with the
SensorThings API specification, will be presented in this communication. It
demonstrates how the same set of SPARQL queries can be used to query, through a
ReST API, any end-point available over the internet, sharing linked soil data in
accordance with the GloSIS ontology. This provides a clear step towards the
federated and harmonised system envisioned by the GSP.

Omicum
12:15
5min
Beautiful Thematic Maps in Leaflet with Automatic Data Classification
Dániel Balla

Although the web provides many features and a high degree of customizability for creating web maps, web-based thematic maps still require expertise to visualize geospatial data in a way that highlights spatial differences in an exact and cartographically comprehensible way. While most thematic maps show data with seven or fewer classes, as determined by (Linfang and Liqiu, 2014), the maker of a thematic map must choose a class count and classify quantitative data to properly convey their message through the map. Data classification methods all have advantages and disadvantages for specific spatial data types, therefore choosing the most suitable method is of great importance to minimize information loss (Osaragi, 2002). Choosing an optimal class count greatly helps the map user to quickly comprehend thematic data and discover relevant spatial differences. With a plethora of visual variables, summarized by (Roth, 2017), there are many ways to distinguish classes of features in geovisualization. For styling features, mapping libraries natively provide tools for only a few visual variables. A thematic map requires a specific symbology tailored to the given data, which distinguishes classes by altering one or more of these visual variables for their symbols. While its symbology needs to be legible and visually separated from the background map, it also needs to be created in a way that does not overload the map visually.

The popular open source web mapping framework Leaflet lacks a straightforward approach to creating thematic maps that follow all the basic principles they should adhere to (data classification, automatic symbology and legend generation). In the paper, the features and shortcomings of Leaflet in the context of thematic mapping are examined in detail. First, Leaflet lacks any kind of native data classification process that would be needed to create discrete classes of data for thematic maps. Therefore, using GIS software beforehand to classify and style the dataset properly (to get class boundaries and exact colours) is inevitable. Second, for symbology, although Leaflet makes use of modern web standards like HTML5 and CSS3 to style vector map features (Agafonkin, 2023), it still lacks styling solutions that are common in traditional thematic cartography (e.g., hatch fill patterns), as discussed in (Gede, 2022). As a thematic map requires some kind of explanation of the visualized data, creating a descriptive, well-formed legend with exact symbols for all data classes is not trivial either. Although various tutorials and workarounds are available, they only address part of these principles. The examples provided by the official Leaflet website are hard-coded and static, meaning they have to be recreated for each specific thematic map, making them unsuitable for dynamic data visualization. Moreover, these workarounds are complex to accomplish, especially for those not familiar enough with programming to code visually pleasing thematic maps for websites.
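To make the missing classification step concrete, the sketch below shows, in Python rather than JavaScript and purely for illustration, the kind of class-boundary computation a map maker currently has to perform outside Leaflet (the values and class count are made up):

import numpy as np

values = np.array([3.1, 4.7, 5.2, 6.8, 7.0, 9.3, 12.4, 15.8, 21.0, 34.5])
n_classes = 5

# Two common classification methods: equal interval and quantile breaks
equal_interval = np.linspace(values.min(), values.max(), n_classes + 1)
quantile_breaks = np.quantile(values, np.linspace(0, 1, n_classes + 1))

print("equal interval:", np.round(equal_interval, 1))
print("quantile:      ", np.round(quantile_breaks, 1))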

As a solution, this paper introduces a highly customizable, open source plugin for Leaflet, developed by the author, which extends Leaflet’s GeoJSON class and combines all processes required for creating a thematic map in a single step. By combining the necessary processes, this easy-to-use extension wraps the individual steps of quantitative data classification, symbology and the creation of an appealing legend. The extension puts an emphasis on providing numerous options for a highly customizable visualization. It supports well-known data classification methods, custom and Brewer colour ramps as defined by (Brewer et al., 2003), distinctions based on symbol colour, size and hatch fill pattern, HTML legend row templating, legend class ordering, data manipulation options, and many other features. For maps with graduated symbol sizes, it generates symbol widths between user-adjustable minimum and maximum sizes. For point features, the symbol shape can also be changed to predefined SVG shape symbols. Data manipulation options include normalization by a secondary attribute field in the dataset, rounding generated class boundary values, and visually modifying them by applying division or multiplication (to easily change the unit of the displayed value). In case the input GeoJSON dataset has features without data for the chosen attribute field (null/nodata), these features can optionally form a separate class with a neutrally styled symbol. Should the map maker wish, such nodata features can instead be ignored, so they do not show up on the map as a distinct class. As it is an extension of a native L.geoJSON layer, multiple instances of L.dataClassification layers can still be used within a single Leaflet map object. This allows for more complex thematic maps with multiple layers or different kinds of data with different symbology types at the same time (e.g., a combination of a choropleth background map with graduated-size point symbols as a second layer in the foreground). Since a legend is automatically generated and displayed for each instance, it is linked to the respective data layer and inherits all methods called on the layer (e.g., if the map maker uses remove() to programmatically remove the layer, the related legend also reflects this change). Even though the legend is created with a clear and concise style by default, legend styling can easily be customized with the provided options and CSS definitions.

As one of the goals, the plugin facilitates the easy creation of clean thematic maps using exclusively open source software and libraries, with the hope of increasing the availability, accessibility and popularity of such thematic mapping on the web. The extension is still under development, and is available on GitHub (with examples), at https://github.com/balladaniel/leaflet-dataclassification.

Omicum
12:30
90min
Lunch
Omicum
14:00
30min
XDGGS: A community-developed Xarray package to support planetary DGGS data cube computations
Alexander Kmoch

1. Introduction

Traditional maps use projections to represent geospatial data in a 2-dimensional plane. This is both very convenient and computationally efficient. However, this also introduces distortions in terms of area and angles, especially for global data sets (de Sousa et al., 2019). Several global grid system approaches like Equi7Grid or UTM aim to reduce the distortions by dividing the surface of the earth into many zones and using an optimized projection for each zone to minimize distortions. However, this introduces analysis discontinuities at the zone boundaries and makes it difficult to combine data sets of varying overlapping extents (Bauer-Marschallinger et al., 2014).

Discrete Global Grid Systems (DGGS) provide a new approach by introducing a hierarchy of global grids that tessellate the Earth’s surface evenly into equal-area grid cells at different spatial resolutions and provide a unique indexing system (Sahr et al., 2004). DGGS are now defined in the joint ISO and OGC DGGS Abstract Specification Topic 21 (ISO 19170-1:2021). DGGS serve as spatial reference systems facilitating data cube construction, enabling integration and aggregation of multi-resolution data sources. Various tessellation schemes such as hexagons and triangles cater to different needs - equal area, optimal neighborhoods, congruent parent-child relationships, ease of use, or vector field representation in modeling flows.

Purss et al. (2019) explained the idea of combining DGGS and data cubes and underlined the compatibility of the two concepts. Thus, DGGS are a promising way to harmonize, store, and analyse spatial data on a planetary scale. DGGS are commonly used with tabular data, where the cell id is a column. Many datasets, however, have other dimensions, such as time, vertical level, or ensemble member. For these, it was envisioned that Xarray (Hoyer and Hamman 2017), one of the core packages in the Pangeo ecosystem, could be used as a container for DGGS data.
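The basic idea can be sketched with plain Xarray: a variable carries a cell id coordinate (the values below are arbitrary example integers) next to other dimensions such as time, and xdggs builds on this by attaching grid-aware metadata and operations to that coordinate. This is a conceptual sketch, not the xdggs API itself:

import numpy as np
import xarray as xr

cell_ids = np.array([101, 102, 103, 104])  # arbitrary example DGGS cell ids
times = np.array(["2023-01-01", "2023-01-02"], dtype="datetime64[ns]")

temperature = xr.DataArray(
    np.random.rand(len(cell_ids), len(times)),
    coords={"cell_ids": cell_ids, "time": times},
    dims=("cell_ids", "time"),
    name="temperature",
)

# Cells are selected by their ids, just like any other labelled axis
subset = temperature.sel(cell_ids=[101, 104])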

At the joint OSGeo and Pangeo code sprint at the ESA BiDS’23 conference (6-9 November 2023, Vienna), members from both communities came together and envisioned implementing support for DGGS in the popular Xarray Python package, which is at the core of many geospatial big data processing workflows. The result of the code sprint is a prototype Xarray extension, named xdggs (https://github.com/xarray-contrib/xdggs), which we describe in this article.

2. Design and methodology

There are several open-source libraries that make it possible to work with DGGS - Uber H3, HEALPix, rHEALPix, DGGRID, Google S2, OpenEAGGR - and many if not most have Python bindings (Kmoch et al. 2022). However, they often come with their own, not always easy-to-use APIs, different assumptions, and different functionalities. This makes it difficult for users to explore the wider possibilities that DGGS can offer.
The aim of xdggs is to provide a unified, high-level, and user-friendly API that simplifies working with various DGGS types and their respective backend libraries, seamlessly integrating with Xarray and the Pangeo open-source geospatial computing ecosystem. Executable notebooks demonstrating the use of the xdggs package are also being developed to showcase its capabilities. The xdggs community contributors set out with a set of guidelines and common DGGS features that xdggs should provide or facilitate, to make DGGS semantics and operations usable via Xarray's user-friendly API for working with labelled arrays.

3. Results

This development represents a significant step forward. With xdggs, DGGS become more accessible and actionable for data users. As with traditional cartographic projections, a user does not need to be an expert on the peculiarities of the various grids and libraries to work with DGGS, and can continue working in the well-known Xarray workflow. One of the aims of xdggs is to make DGGS data access and conversion user-friendly, while dealing with the coordinates, tessellations, and projections under the hood.

DGGS-indexed data can be stored in an appropriate format such as Zarr or (Geo)Parquet, with accompanying metadata describing which DGGS (and potentially which specific configuration) is needed to interpret the grid cell indices correctly. An interactive tutorial on Pangeo-Forge is also being developed as an open-access resource to demonstrate how to effectively utilize these storage formats, thereby facilitating knowledge transfer on data storage best practices within the geospatial open-source community.

Nevertheless, continuous efforts are necessary to broaden the accessibility of DGGS for scientific and operational applications, especially in handling gridded data such as global climate and ocean modelling outputs, satellite imagery, raster data, and maps. This would require, for example, an agreement, ideally with entities such as the OGC, on a registry of DGGS reference systems (similar to the EPSG/CRS/proj database).

4. Discussion and outlook

One of the big advantages of using DGGS via Xarray is the integration of multi-source, multi-sensor EO data with large global-scale ocean and climate models in the Pangeo environment, making data access and development practical and FAIR (Findable, Accessible, Interoperable, Reusable) for the community. Two additional directions to improve uptake and knowledge transfer could include:

1) The implementation of DGGS such as HEALPix, DGGRID-based equal-area DGGS (ISEA), rHEALPix, and the (currently) more industry-oriented DGGS (Uber H3, Google S2) on top of Xarray should be improved further, together with a more user-friendly API for re-gridding existing data into DGGS grids. Training materials and Pangeo sessions should be developed to demonstrate the use of DGGS in Xarray, aimed at enhancing the skill set of practitioners and researchers in geospatial data handling and spatial data analysis, both at professional and academic institutions.

2) DGGS-indexed reference datasets could be validated and used to highlight case studies, and instructional material could be used in academic courses and workshops focusing on the practical applications of data fusion, quick addressing of equal-area cell grids, AI, and socio-economic and environmental studies. In particular, the emerging possibility of selecting cell ranges from different data sources and joining and integrating them based only on cell ids could make partial data access and sharing more dynamic and easy.

Omicum
14:30
30min
SpectralIndices.jl: Streamlining spectral indices access and computation for Earth system research
Francesco Martinuzzi

Remote sensing has evolved into a fundamental tool in environmental science, helping scientists monitor environmental changes, assess vegetation health, and manage natural resources. As Earth observation (EO) data products have become increasingly available, a large number of spectral indices have been developed to highlight specific surface features and phenomena observed across diverse application domains, including vegetation, water, urban areas, and snow cover. Examples of such indices include the normalized difference vegetation index (NDVI) (Rouse et al., 1974), used to assess vegetation states, and the normalized difference water index (NDWI) (McFeeters, 1996), used to delineate and monitor water bodies. The constantly increasing number of spectral indices, driven by factors such as the enhancement of existing indices, parameter optimization, and the introduction of new satellite missions with novel spectral bands, has necessitated the development of comprehensive catalogs. One such effort is the Awesome Spectral Indices (ASI) suite (Montero et al., 2023), which provides a curated machine-readable catalog of spectral indices for multiple application domains. Additionally, the ASI suite includes not only a Python library for querying and computing these indices but also an interface to the Google Earth Engine JavaScript application programming interface, thereby accommodating a wide range of users and applications.
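For readers unfamiliar with these indices, both are simple normalized differences of two spectral bands; a minimal NumPy illustration with made-up reflectance values is:

import numpy as np

def normalized_difference(a, b):
    # Generic normalized difference: (a - b) / (a + b)
    return (a - b) / (a + b)

# Made-up surface reflectance values for three pixels
nir = np.array([0.45, 0.50, 0.30])
red = np.array([0.10, 0.08, 0.25])
green = np.array([0.12, 0.11, 0.20])

ndvi = normalized_difference(nir, red)    # vegetation (Rouse et al., 1974)
ndwi = normalized_difference(green, nir)  # water (McFeeters, 1996)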

Despite these valuable resources, there is an emerging need for a dedicated library tailored to Julia, a programming language renowned for its high-performance computing capabilities (Bezanson et al., 2017). Julia has not only established itself as an effective tool for numerical and computational tasks, but also offers the possibility of using Python within its environment through interoperability features. This interoperation adds a layer of flexibility, allowing users to access Python's extensive libraries and frameworks directly from Julia. However, while multiple packages are available in Julia for manipulating high-dimensional EO data, most of them provide different interfaces. Furthermore, leveraging PyCall to interface with Zarr files and other high-dimensional data formats from Python is not practical: the inefficiency of cross-language data exchange and the overhead of cross-language calls significantly hinder performance, underlining the need for native Julia solutions optimized for such data tasks.

Recognizing the need for a streamlined approach to use spectral indices, we introduce SpectralIndices.jl, a Julia package developed to simplify the computation of spectral indices in remote sensing applications. SpectralIndices.jl provides a user-friendly, efficient solution for both beginners and researchers in the field of remote sensing. SpectralIndices.jl offers several features supporting remote sensing tasks:
- Easy Access to Spectral Indices: The package provides instant access to a comprehensive range of spectral indices from the ASI catalog, removing the need for manual searches or custom implementations. Users can effortlessly select and compute indices suitable for their specific research needs.
- High-Performance Computing: Built on Julia's strengths in numerical computation, SpectralIndices.jl provides rapid processing even for large datasets (Bouchet-Valat et al., 2023). Consequently, this makes it a time-efficient tool for handling extensive remote sensing data.
- Versatile Data Compatibility: SpectralIndices.jl supports a growing list of input data types. Furthermore, thanks to Julia's built-in package extensions, which allow conditional compilation of dependencies, adding new data types to the library does not slow down compilation.
- User-Friendly Interface: Designed with simplicity in mind, the package enables users to compute spectral indices with just a few lines of code. This ease of use lowers the barrier to entry for those new to programming or remote sensing.
- Customization and Community Contribution: Users can extend the package's capabilities by adding new indices or modifying existing ones. This openness aligns with the FAIR principles, ensuring that data is findable, accessible, interoperable and reusable.

By providing a straightforward and efficient means to compute spectral indices, the package helps users to streamline and accelerate software pipelines in Earth system research. Furthermore, it provides a consistent and unified interface to compute indices, improving the reliability and accuracy of research outcomes. Whether tracking deforestation, studying crop health, or assessing water quality, SpectralIndices.jl equips users with the tools needed for accurate, timely analysis.

The introduction of SpectralIndices.jl reflects a broader trend in scientific computing towards adopting high-performance languages like Julia, highlighting the importance of efficient data analysis tools in addressing complex environmental challenges. This development contributes to the democratization of data analysis, making advanced tools more accessible to a diverse range of users.

The SpectralIndices.jl package is open-source and hosted on GitHub (https://github.com/awesome-spectral-indices/SpectralIndices.jl), available for public access and contribution. It is licensed under the MIT license, which permits free use, modification, and distribution of the software. This approach encourages community contributions and fosters an environment of shared learning and improvement, ensuring that SpectralIndices.jl remains a cutting-edge tool for environmental analysis and research. Additionally, the code is commented and documented, facilitating both contribution and adoption. The code in the examples is run during the compilation of the online documentation, which guarantees its reproducibility. Finally, the software is tested using continuous integration through GitHub Actions, ensuring its correct execution in different use cases and environments.

Omicum
15:00
15:00
30min
Facilitating advanced Sentinel-2 analysis through a simplified computation of Nadir BRDF Adjusted Reflectance
David Montero Loaiza

The Sentinel-2 mission, pivotal to the European Space Agency's Copernicus program, features two satellites with the MultiSpectral Instrument (MSI) for high-to-medium resolution (10-60 m) imaging in visible (VIS), near-infrared (NIR), and shortwave infrared (SWIR) bands. Its 180° satellite phasing allows for a 5-day revisit time at the equator, essential for Earth Observation (EO) tasks. Sentinel-2 Surface Reflectance (SR) is crucial in detailed Earth surface analysis. However, for enhanced accuracy in SR data, it is imperative to perform adjustments that simulate a nadir viewing perspective (Roy et al., 2016). This correction mitigates the directional effects caused by the anisotropy of SR and the variability in sunlight and satellite viewing angles. Such adjustments are essential for the consistent comparison of images captured at different times and under varying conditions. This is particularly critical for processing and analysing Earth System Data Cubes (ESDCs, Mahecha et al., 2020), which are increasingly used due to their organised spatiotemporal structure and the ease of their generation from cloud-stored data (Montero et al., 2023).

The MODIS BRDF/Albedo product provides spectral Bidirectional Reflectance Distribution Function (BRDF) model parameters, enabling the calculation of directional reflectance across any specified sensor viewing and solar angles. Building on this foundation, Roy et al. (2008, 2016) introduced a novel approach leveraging MODIS BRDF parameters, named the c-factor, for the adjustment of Landsat SR data. This adjustment produces Nadir BRDF Adjusted Reflectance (NBAR) by multiplying the observed Landsat SR with the ratio of reflectances predicted by the MODIS BRDF model for the observed viewing geometry and for a standard nadir view under fixed solar zenith conditions. Subsequently, Roy et al. (2017) expanded this method to include adjustments for multiple Sentinel-2 spectral bands (VIS to SWIR).
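
A minimal numerical sketch of the c-factor idea, with made-up reflectance values: the BRDF model is evaluated once for the observed sun/view geometry and once for the nadir view at a fixed solar zenith, and their ratio rescales the observed surface reflectance.

```python
import numpy as np

# Hypothetical per-pixel reflectances predicted by the MODIS BRDF model for one band:
rho_model_observed = np.array([0.21, 0.19, 0.23])  # modelled at the observed sun/view angles
rho_model_nadir    = np.array([0.20, 0.20, 0.22])  # modelled at nadir view, fixed solar zenith

observed_sr = np.array([0.25, 0.18, 0.27])  # observed surface reflectance (placeholder values)

# c-factor per pixel and resulting Nadir BRDF Adjusted Reflectance (Roy et al., 2016)
c_factor = rho_model_nadir / rho_model_observed
nbar = c_factor * observed_sr
```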

While the c-factor method facilitates straightforward computation for individual Sentinel-2 images, there is a notable absence of a unified Python framework to apply this conversion uniformly across multiple images, especially for ESDCs derived from cloud-stored data.

To bridge this gap, we introduce “sen2nbar,” a Python package specifically developed to convert Sentinel-2 SR data to NBAR. This versatile tool converts both individual images and ESDCs generated from cloud-stored data, thus streamlining the conversion process for Sentinel-2 data users.

The "sen2nbar" package, meticulously designed for simplicity, facilitates the direct conversion of Sentinel-2 Level 2A (L2A) SR data to NBAR through a single function. To streamline this process, the package is segmented into multiple modules, each dedicated to specific tasks within the NBAR computation pipeline. These modules include functions for extracting sun and sensor viewing angles from metadata, calculating geometric and volumetric kernels, computing the BRDF model, and determining the c-factor.

“sen2nbar” supports NBAR calculations for three distinct data structures:

  1. Complete scenes via SAFE files: Users can input a local SAFE file from a Sentinel-2 L2A scene. The package processes this file, generating a new folder where each spectral band is adjusted to NBAR at its original resolution. The adjusted images are saved as Cloud Optimised GeoTIFF (COG) files, with an option for users to choose standard GeoTIFF formats instead.

  2. Xarray Data Arrays via “stackstac”: For ESDCs obtained as xarray data array objects from a SpatioTemporal Asset Catalog (STAC) using stackstac and pystac-client, “sen2nbar” requires the xarray object, the STAC endpoint, and the Sentinel-2 L2A collection name. This information allows the package to access STAC for metadata retrieval necessary for adjusting the data cube. The spatial coverage and resolution in this scenario might differ from complete scenes, and "sen2nbar" adjusts only the specific area and timeframe retrieved for the given resolution.

  3. Xarray Data Arrays via “cubo”: When users have ESDCs formed as xarray data arrays through cubo, which builds upon stackstac and incorporates the STAC endpoint and the collection name as attributes, “sen2nbar” directly adjusts these to NBAR, utilising the methodology described in the stackstac case.

For the latter two scenarios, “sen2nbar” works without writing files to disk, instead returning an xarray data array object containing the NBAR values. The package processes whichever bands are available and does not raise errors for missing bands, acknowledging that users may not require all bands and might have generated ESDCs with selected bands. Additionally, if the input arrays are ‘lazy’ arrays, created using dask arrays (a default in stackstac or cubo), “sen2nbar” executes calculations in parallel, ensuring efficient computation of NBAR values.

Importantly, “sen2nbar” automatically harmonises SR data for images with a processing baseline of 04.00 or higher before performing the NBAR adjustment, ensuring consistency and accuracy in the processed data.

"sen2nbar" efficiently computes NBAR values from Sentinel-2 L2A SR data. The software supports complete SAFE files processing as well as the adjustment of ESDCs sourced from STAC and COG files, utilising tools such as “stackstac” and “cubo”. This versatility is encapsulated in a streamlined design, allowing for the adjustment of various data formats through a single, user-friendly tool, adapted to diverse user requirements.

"sen2nbar" is anticipated to become a key resource for geospatial Python users, especially in Earth System research. This tool is set to improve analyses conducted by scientists and students by significantly reducing the time and effort traditionally spent on technical adjustments. Its impact is expected to be particularly profound for multitemporal analyses, facilitating more efficient and streamlined investigations. This includes Artificial Intelligence (AI) research, particularly for studies involving multidimensional EO data. By utilising "sen2nbar", AI-based research can achieve more reliable outcomes, enhancing the overall quality and credibility of the findings.

The “sen2nbar” package is open-source and readily available on GitHub (https://github.com/ESDS-Leipzig/sen2nbar) under an MIT License. This encourages contributions from the global community, fostering collaborative development and continuous improvement. While prior experience in Remote Sensing can be advantageous, it is not a prerequisite for using the package, which is equipped with comprehensive documentation and tutorials, all designed to be beginner-friendly and facilitate easy adoption.

Omicum
15:30
15:30
30min
Lunch
Omicum
16:00
16:00
30min
Mapping Soil Erosion Classes using Remote Sensing Data and Ensemble Models
Ayomide Oraegbu, Emmanuel Jolaiya

Soil erosion, the displacement of topsoil by water and wind, poses a significant threat to global land health, impacting food security, water quality, climate change, and ecosystem stability. Earth Observation (EO) and remote sensing technologies play a crucial role in monitoring and assessing soil erosion, offering valuable spatial and temporal data for informed decision-making. This study applied three Machine Learning (ML) models, namely the XGBoost classifier, the LightGBM classifier, and the CatBoost classifier, to perform soil erosion classification in the European Union (EU) region. The data used in this study were sourced from Kaggle, a large repository of community-published machine learning models and data, and include several EO datasets, namely Landsat 7 seasonal Analysis Ready Data (ARD), BioClim v1.2 historical (1981-2010) average climate data using the CHELSA classification system, annual MODIS EVI data, climatic variables (water vapour, monthly snow probability, annual MODIS LST in daytime or night time, annual CHELSA rainfall V2.1), human footprint (Hengl et al., 2023), land cover, landform and landscape parameters (Hengl, 2018), and lithology (Hengl, 2018). The dataset has a total of 3754 sample points and 139 features. A detailed description of the dataset features can be found here.

During the Exploratory Data Analysis (EDA) process, the visual relationship between the Landsat bands and the target variable (erosion category) revealed that the Near Infrared (NIR), Short-Wave Infrared I (SWIR1), Short-Wave Infrared II (SWIR2), and Thermal bands were more effective in differentiating between the various erosion categories than the other bands. This insight guided the feature engineering process. As suggested by Puente et al. (2019), vegetation indices could prove effective in predicting soil erosion. Consequently, we computed various vegetation indices such as the Normalised Difference Water Index (NDWI), Normalised Difference Infrared Index (NDII), and Shortwave Infrared Water Stress Index (SIWSI), and applied the Tasseled Cap Transformation, which yields Brightness, Wetness and Greenness components, to augment the features. To capture textural variations of each pixel, location-, elevation- and slope-based measures were computed. The Topographic Position Index (TPI) was computed for each position using a 100,000-metre radius, by calculating the mean elevation of points within the radius and subtracting it from the elevation of each point. Other features computed were the Topographic Wetness Index (TWI), Aspect, LS-Factor, and Stream Power Index (SPI), which reflects the erosive power of streams. Leveraging the thermal band, Land Surface Temperature (LST) was derived. As noted by Ghosal (2021), combining LST with temporal data can identify regions vulnerable to soil erosion.
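
Among the engineered features, the TPI is simple enough to sketch directly: the mean elevation of a circular neighbourhood is subtracted from each cell. The toy grid and the 1-cell radius below are placeholders chosen for readability (the study used point elevations within a 100,000 m radius).

```python
import numpy as np
from scipy import ndimage

# Toy elevation grid (metres); in the study, point elevations within a 100 km radius were used.
elevation = np.array([[310., 305., 300., 295.],
                      [320., 315., 305., 298.],
                      [330., 325., 310., 300.],
                      [340., 330., 315., 305.]])

# Circular footprint approximating the search radius (here 1 cell for readability).
radius = 1
y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
footprint = (x**2 + y**2) <= radius**2

# Focal mean of the neighbourhood, then TPI = elevation - neighbourhood mean.
neighbourhood_mean = ndimage.generic_filter(elevation, np.mean, footprint=footprint, mode="nearest")
tpi = elevation - neighbourhood_mean
```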

The development of these models incorporated Scikit-Learn Recursive Feature Elimination (RFE) in the preliminary feature selection process, using the XGBoost model as the estimator. RFE returns “n” features by training the model on all features, ranking them by importance, and removing the least important ones until “n” features remain; here “n” was set to 200. Afterwards, an XGBoost model was trained with the 200 features, and Scikit-Learn’s RandomizedSearchCV was employed to optimise its hyperparameters, leading to an improved F1 score for the XGBoost classifier. Using the XGBoost classifier’s feature importance ranking, the top 155 features were selected for use in the final ensemble model for predictions. To provide a more reliable estimate of the performance of the training model, Scikit-Learn's StratifiedKFold was implemented with n_splits set to 5 and the erosion category as the stratification variable. Using stratified k-fold cross-validation ensured a balanced class representation in each fold during training. For modelling of erosion categories, an ensemble voting classifier combined predictions from three optimised gradient boosting models (XGBoost, LightGBM, CatBoost) using a "soft" voting scheme. This approach aimed to improve accuracy and reduce overfitting compared to individual models. The confusion matrix was used to evaluate the ensemble's performance, considering precision, recall, and F1-score metrics. These metrics assess the model's ability to correctly identify positive and negative cases, with a higher F1 score indicating better overall performance.
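
A condensed sketch of this workflow, using scikit-learn together with the three gradient boosting libraries; the synthetic data, hyperparameter grid and feature counts are illustrative placeholders rather than the values tuned in the study.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score
from sklearn.ensemble import VotingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Synthetic stand-in for the Kaggle dataset (sample points x features, four erosion classes).
X, y = make_classification(n_samples=400, n_features=300, n_informative=30,
                           n_classes=4, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# 1) Preliminary feature selection with RFE, XGBoost as the estimator.
selector = RFE(XGBClassifier(), n_features_to_select=200, step=20).fit(X, y)
X_sel = selector.transform(X)

# 2) Hyperparameter optimisation of XGBoost with randomised search (illustrative grid).
search = RandomizedSearchCV(
    XGBClassifier(),
    param_distributions={"n_estimators": [200, 500], "max_depth": [4, 6, 8]},
    n_iter=5, scoring="f1_weighted", cv=cv, random_state=42,
).fit(X_sel, y)

# 3) Soft-voting ensemble of the three gradient boosting models.
ensemble = VotingClassifier(
    estimators=[("xgb", search.best_estimator_),
                ("lgbm", LGBMClassifier()),
                ("cat", CatBoostClassifier(verbose=0))],
    voting="soft",
)
scores = cross_val_score(ensemble, X_sel, y, cv=cv, scoring="f1_weighted")
```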

The weighted F1 score reached 0.86, and the weighted precision and recall were both 0.86, indicating that the proposed method using various EO data to predict soil erosion categories (No Gully/badland, Gully, Badland, Landslides) performed well. Specifically, No Gully/badland (0.89, 0.91) and Landslides (1.00, 1.00) had higher precision and recall values, which means that the model can correctly identify areas that fall within these erosion categories with few false positives and false negatives. Badland had the lowest recall (0.49), indicating that the model could not identify a substantial share of this category.

According to the feature importance analysis, Year, Latitude Coordinates, Topographic Wetness Index (TWI), Longitude Coordinates, Maximum Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), Minimum Annual Water Vapour, Mean of Slope, Weighted Difference Vegetation Index (WDVI), Normalised Difference Snow Index (NDSI) and Standard Deviation of Slope emerged as the top ten factors influencing soil erosion, indicating that topographic factors and vegetation indices were important for predicting soil erosion. Year was the most important feature, which shows that temporal trends have a large impact on predicting soil erosion.

In conclusion, this project successfully explored the potential of ensemble learning and EO data for classifying soil erosion, highlighting its promising role in addressing this crucial environmental issue. The proposed framework indicates that Topographic indices like the TWI and vegetation indices like the WDVI hold valuable information for predicting soil erosion. Furthermore, band combinations using near-infrared (NIR), SWIR1, SWIR2, and thermal bands can significantly improve the classification of soil erosion categories. Crucially, EO data like digital elevation models (DEMs) and Analysis Ready Landsat data serve as the foundation for accurate soil erosion prediction. The proposed approach to incorporate multi-temporal EO data offers exciting prospects for even more accurate soil erosion classification.

Omicum
10:00
10:00
30min
Coffee
Omicum
10:30
10:30
30min
Insights on Earth Observation cloud platforms from a user experience viewpoint
Margherita Di Leo

The European Strategy for Data aims at creating a single market for data sharing and exchange to increase the European Union’s (EU) global competitiveness and data sovereignty. Additionally, emphasis is put on the need to prioritize people's needs in technology development and to promote EU values and rights.
The EU has largely invested in making data accessible. Examples of this are the Copernicus Programme, the Group on Earth Observation (GEO) intergovernmental partnership, and the Horizon 2020 and Horizon Europe funding programmes. In the scope of such programmes, several Earth Observation (EO) cloud platforms have been developed, providing access to data, tools and services for a wide range of users, including support to policymakers in developing evidence-based and data-driven policies.
Typically, these platforms are an expression of very specific research communities of different sizes and scope, even niche in some cases, with varied and often under-represented user needs, as opposed to more mainstream platforms with a wider user uptake.
As a consequence, the current landscape of EO cloud platforms and infrastructures in the EU is rather fragmented, thus their potential is only partially exploited by users. We started our research by classifying existing infrastructures, identifying available good practices and highlighting the technological enablers, in order to point out and leverage the building blocks needed to improve the usability of such platforms (Di Leo et al., 2023).
In this follow-up study, we seek to provide a user-centric perspective, aiming at identifying limitations in the current offer of EO cloud platforms by conducting a research study on user experience. We aim to propose good practices to improve both the platform design and functionalities by taking into account the user viewpoint. Our research questions are:
• Does the current offer cover the entire development lifecycle?
• What are the pain points / bottlenecks to address on the current platforms from a user’s viewpoint?
To create a meaningful sample of EO cloud platforms, we surveyed use cases from EU flagship initiatives like e-shape, OpenEarthMonitor and GEOSS Platform Plus, to better understand their use of the platforms. In addition, we developed an additional use case to gain hands-on experience with cloud platforms.
Respondents to the survey were developers of different use cases in a wide range of sectors, including agriculture, energy, health, ecosystems, disaster management, water, climate and climate change, forestry and oceans. Intended end user categories ranged from business owners to analysts, developers, data scientists and policy makers, as well as citizens. A common need emerging from the exercise is the possibility to integrate datasets of different nature and from different sources: EO, in situ, Internet of Things (IoT) data, etc. Final products of the considered use cases ranged from static maps to streams of data and web apps. In the development lifecycle, techniques such as machine learning, deep learning, parallel computing, virtualization / containerization and data cubes are commonly used among developers.
The main concerns on EO cloud platforms that emerge from the survey were:
(1) The difficulty of discovering the services offered and the lack of means to browse the available services;
(2) The reduced accessibility to data and services, as well as the timeliness and coverage of data provision and quality;
(3) The poor transparency of the price;
(4) The limited possibility to integrate heterogeneous datasets and tools from different providers;
(5) The limited quality of learning material and documentation, as well as the frequency of their updates;
(6) The lack of effectiveness of support services such as helpdesks and forums;
(7) The limited possibility of exchanging code, good practices, and support with other users, and the liveness of the communities around the platforms;
(8) The lack of possibility to customize tools and services;
(9) The lack of strategies for the sustainability of platforms after the funding period;
(10) The lack of effective facilities for storage and for advanced functionalities such as machine learning, deep learning, parallel computing, etc.
Based on these responses, we identified a set of dimensions of high relevance for users, intended for self-evaluation by platforms in order to improve their offer. Such dimensions can be summarized as 1) discoverability, 2) accessibility, 3) price transparency, 4) interoperability, 5) documentation, 6) customer care, 7) community building (data, models and knowledge sharing), 8) customization, 9) sustainability of the business plan and 10) characteristics and performance of the platform.
Among others, adherence to the FAIR principles (Wilkinson et al., 2016) and to the TRUST principles (Lin et al., 2020), the use of open source components and compliance with open standards (e.g. from the Open Geospatial Consortium – OGC) all represent essential dimensions to enhance both the platforms’ usability and the user’s satisfaction.
Finally, we discuss the emerging trend of creating federations among platforms. Federations can be of different types, such as federation of identity (e.g. single sign-on), federation of trust, and federation of resources (e.g. storage and computational facilities). Federations may overcome many of the problems that we identified, such as interoperability, discoverability and accessibility, by providing a set of services available from one single place. This trend is expected to grow progressively, especially towards the concept of data spaces, in which the EU is largely investing.
To conclude, the study outlines the need to address challenges and limitations to improve both the usability and user satisfaction when using available EO cloud platforms. The identification of user needs and concerns, along with the emphasis on principles such as FAIR and TRUST, open source components and OGC standards, will be crucial in shaping the future of data platforms and infrastructures in the EU and beyond. Furthermore, the potential of federations among platforms presents an immediate opportunity to move towards the vision of data spaces that the EU is putting forward, thus enhancing both collaboration and data sharing, ultimately contributing to the development of a more cohesive and effective data market in Europe.

Omicum
11:00
11:00
30min
MOOC Cubes and Clouds - Cloud Native Open Data Sciences for Earth Observation
Peter James Zellner

Motivation: The Massive Open Online Course (MOOC) “Cubes and Clouds” teaches the concepts of data cubes, cloud platforms, and open science in the context of Earth Observation (EO). The course is designed to bridge the gap between relevant technological advancements and best practices and existing educational material. Successful participants will have acquired the necessary skills to work and engage themselves in a community adhering to the latest developments in the geospatial and EO world.

Target group: The target group are earth science students, researchers, and data scientists who want to dive into the newest standards in EO cloud computing and open science. The course is designed as a MOOC that explains the concepts of cloud native EO and open science by applying them to a typical EO workflow from data discovery, data processing up to sharing the results in an open and FAIR way.

Content: This MOOC is an open learning experience relying on a mixture of animated lecture content and hands-on coding exercises created together with community-renowned experts. The course is structured into three main chapters: Concepts, Discovery, and Process and Share. The degree of interaction (e.g. hands-on coding exercises) gradually increases throughout the course. The theoretical basics are taught in the first chapter, Concepts, comprising cloud platforms, data cubes and open science practices. In the second chapter the focus is on the discovery of data and processes and the role of metadata in EO. In the final chapter the participants carry out complete processing workflows on cloud infrastructure and apply open science practices to the produced results. Every lesson is concluded with a quiz, ensuring that the content has been understood.

The course contains 13 written lectures that convey the basic knowledge and theoretical concepts; 13 videos, created with a professional communication team and in collaboration with a leading expert on the topic, each shining a light on a real-world example (e.g. the role of GDAL in the geospatial and EO world); 16 pieces of animated interactive content that engage the participants to actively interact with the content (e.g. the Sentinel-2 Data Volume Calculator); and 11 hands-on coding exercises in the form of curated Jupyter notebooks that access European EO cloud platforms (e.g. CDSE) and carry out analyses there using standardized APIs like openEO (e.g. a full EO workflow for snow cover mapping).

Infrastructure: The EOCollege platform hosts the lectures and the animated content (e.g. videos, animations, interactive elements) of the course. The hands-on exercises are directly accessible from EOCollege via a dedicated JupyterHub environment, which accesses European EO cloud platforms, such as the Copernicus Data Space Ecosystem, using its open science tools like the Open Science Data Catalogue, openEO and STAC, guaranteeing that the learned concepts are applied to real-world applications. In the final exercise the participants map the snow cover of an area of interest of their choice and make their results openly available according to the FAIR principles in a web viewer (STAC browser). This community mapping project actively lives the idea of open science, collaboration and community building.
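
To give a flavour of the kind of hands-on exercise described above, here is a minimal sketch of a snow-cover style workflow with the openEO Python client against the Copernicus Data Space Ecosystem; the collection id, band codes, extents and the 0.4 NDSI threshold are illustrative assumptions, not the course's actual notebook code.

```python
import openeo

# Connect to the CDSE openEO backend (requires a free account for authentication).
connection = openeo.connect("https://openeo.dataspace.copernicus.eu").authenticate_oidc()

# Load a small Sentinel-2 L2A cube: green (B03) and SWIR (B11) bands over an area of interest.
cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 11.0, "south": 46.4, "east": 11.2, "north": 46.6},
    temporal_extent=["2023-02-01", "2023-02-28"],
    bands=["B03", "B11"],
)

# NDSI = (green - SWIR) / (green + SWIR); pixels above ~0.4 are commonly flagged as snow.
green, swir = cube.band("B03"), cube.band("B11")
ndsi = (green - swir) / (green + swir)
snow = ndsi > 0.4

snow.download("snow_cover.nc")  # trigger processing on the cloud platform and fetch the result
```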

Learning achievements: After finishing the course, the participants will understand the concepts of cloud native EO, be capable of independently using cloud platforms to approach EO related research questions and be confident in how to share research by adhering to the concepts of open science. After the successful completion of the course the participants receive a certificate and diploma supplement and their personal map is persistently available in the web viewer as a proof of work.

Benefits for the open geospatial community: The MOOC is valuable for the geospatial and EO community and for open science, as there is currently no learning resource available where the concepts of cloud native computing and open science in EO are taught jointly to bridge the gap towards the recent cloud native advancements. The course is open to everybody, thus serving as teaching material for a wide range of purposes, including universities and industry, and maximizing the outreach to potential participants. In this sense, the raw material of the course is also created following open science practices (e.g. GitHub repository, Zenodo, STAC Browser for results) and can be reused and built upon.

The "Cubes and Clouds" MOOC equips participants with essential skills in cloud native EO and open science, enhancing their ability to contribute meaningfully to the open geospatial community. By promoting transparency, reproducibility, and collaboration in research, graduates of the course strengthen the foundations of open science within the community. Access to cloud computing resources and European EO platforms empowers participants to undertake innovative research projects and share their findings openly, enriching the collective knowledge base. Ultimately, the MOOC fosters a culture of openness and collaboration, driving positive change and advancing the field of geospatial science for the benefit of all.

Structure of the Talk: Our talk will interactively guide through the MOOC and showcase the learning experience. To evaluate its usefulness, the perception of the first participants will be analyzed, and finally we will jointly discuss activities to integrate with other teaching and tech communities (e.g. Pangeo).

Links:
- EOCollege: MOOC Cubes and Clouds
- GitHub
- Zenodo
- Community mapping project - Cubes and Clouds Snow Cover Stac Collection

Omicum
11:30
11:30
30min
Bridging geomatics theory to real-world applications in alpine surveys through an innovative summer school teaching program
Federica Gaspari

Applying skills gained from university courses marks a pivotal step in crafting engaging teaching methods. Including practical activities in higher education programs plays a crucial role in knowledge transfer, especially in geomatics (Tucci et al., 2020). Moreover, engaging groups of students along the entire process of in-situ survey design, data collection, management, processing and results preparation further fosters their responsibility as well as their awareness of the technologies adopted, actively understanding their limitations and potential (Balletti et al., 2023). In recent years, STEM and geomatics have seen a growing number of learning experiences based on open knowledge (Gaspari et al., 2021, https://machine-learning-in-glaciology-workshop.github.io/, Potůčková et al., 2023). In this context, this work presents an innovative teaching experience framed in the mountainous environment of the Italian Alps, describing the structure of the course and the potential of open geo education in geomatics.

Since 2016, the Geodesy and Geomatics Section of the Department of Civil and Environmental Engineering of Politecnico di Milano has organised a Summer School for Engineering, Geoinformatics and Architecture Bachelor and Master students, consistently aimed at bridging the divide between theory and practice. The Summer School is framed within a long-term monitoring activity of the Belvedere Glacier (https://labmgf.dica.polimi.it/projects/belvedere/), a temperate debris-covered alpine glacier located in the Anzasca Valley (Italy), where annual in-situ GNSS and UAV photogrammetry surveys have been performed since 2015 to derive accurate and complete 3D models of the entire glacier, allowing the derivation of its velocity and volume variations over the last decade.

In a week-long program, students are encouraged to collaborate, under the supervision of young tutors passionate about the topic, to develop effective strategies for designing and executing topographic surveys in challenging alpine regions. The program involves them in hands-on learning experiences, directly engaging students in a wider ongoing research project and making them familiar with the concept of open data and with the adoption of dedicated open-source software.

The summer school program is divided into 6 modules whose goal is to introduce students to key theoretical concepts of fieldwork design, UAV photogrammetry, GNSS positioning, GIS and spatial data analysis, image stereo-processing and 3D data visualization. Along with theory, practical sessions are organised with guided case study-driven exercises that allow students to get familiar with FOSS4G tools such as QGIS, CloudCompare and PotreeJS. The teaching materials used to guide students through the exercises with processing software are made openly accessible online through a dedicated website built on top of an open-source GitHub repository with MkDocs (https://tars4815.github.io/belvedere-summer-school/), setting the groundwork for developing collaborative online teaching and expanding the material for other learning experiences in future editions.

Adding value to the experience, students also contribute to a research project regarding the monitoring of the glacier (Ioli et al., 2021; Ioli et al., 2024), providing valuable insights on the recent evolution of the natural site. The georeferenced products derived from the in-situ surveys are indeed published in an existing public repository on Zenodo (Ioli et al., 2023), sharing results with a wider scientific community.

Furthermore, in order to optimise the management of the information and data collected during the different editions of the summer school, a relational database has been designed and is currently being implemented with PostgreSQL and PostGIS. Such a solution allows for querying the location of markers deployed on the glacier surface and measured every year by GNSS, making it possible to accurately describe the glacier movements. Additionally, the database allows for effectively storing the results of the annual in-situ surveys carried out during the summer schools, as well as documenting the instruments and the procedures employed to acquire and process the data.
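
As an illustration of the kind of query such a database enables, the sketch below compares marker positions between consecutive survey years with PostGIS; the table and column names are hypothetical, and metre units assume a projected coordinate reference system.

```python
import psycopg2

# Hypothetical schema: markers(marker_id, survey_year, geom) with PostGIS point geometries.
conn = psycopg2.connect("dbname=belvedere user=summer_school")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT a.marker_id,
               a.survey_year,
               ST_Distance(a.geom, b.geom) AS displacement_m
        FROM markers a
        JOIN markers b
          ON a.marker_id = b.marker_id
         AND b.survey_year = a.survey_year + 1
        ORDER BY a.marker_id, a.survey_year;
    """)
    for marker_id, year, displacement in cur.fetchall():
        print(marker_id, year, round(displacement, 2), "m")
```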

In summary, this study highlights the commitment to open education within the realm of geomatics, with the ongoing transformation of the Belvedere Summer School program into an experience mainly driven by open-source software. Beyond the educational focus on fieldwork design and data analysis, the project extends to a comprehensive approach to transparency, making resources openly accessible through a dedicated website. In this way, the Summer School aspires to contribute significantly to the principles of open education in geomatics, thereby establishing an accessible bridge between education, research, and the open-source community.

Bibliography:

Balletti, C. et al. (2023): The SUNRISE summer school: an innovative learning-by-doing experience for the documentation of archaeological heritage, https://doi.org/10.5194/isprs-archives-XLVIII-M-2-2023-147-2023

Gaspari, F., et al. (2021): Innovation in teaching: the PoliMappers collaborative and humanitarian mapping course at Politecnico di Milano, https://doi.org/10.5194/isprs-archives-XLVI-4-W2-2021-63-2021

Ioli, F. et al. (2021). Mid-term monitoring of glacier’s variations with UAVs: The example of the Belvedere Glacier. Remote Sensing, 14(1), 28.

Ioli, F., et al. (2023). Belvedere Glacier long-term monitoring Open Data (1.0) Zenodo. https://doi.org/10.5281/zenodo.7842348

Ioli, F., et al. (2024). Deep Learning Low-cost Photogrammetry for 4D Short-term Glacier Dynamics Monitoring. https://doi.org/10.1007/s41064-023-00272-w

Potůčková, et al. (2023): E-TRAINEE: open e-learning course on time series analysis in remote sensing, XLVIII-1/W2-2023, 989–996

Tucci, G., et al. (2020). Improving quality and inclusive education on photogrammetry: new teaching approaches and multimedia supporting materials, https://doi.org/10.5194/isprs-archives-XLIII-B5-2020-257-2020

Omicum
12:00
12:00
5min
Does open data open new horizons in urban planning?
Nikola Koktavá

The aim of this study is to provide a comprehensive view of the issue of open data in Czech cities and thus give the world community an insight into the state of open data in the Czech Republic. It serves as a basis for further research and implementation of open data in urban planning. Its results can be used not only for the benefit of the professional community but can also serve as a basis for decision-making by city authorities in the planning and development of urban space. Open data are therefore an integral part of developing smart cities (Ojo, Curry, Zeleti, 2015). This extensive study deals with the availability of open data in Czech cities and the extent to which they are used in the framework of urban planning and the development of urban space. In the context of rapid digitization and technological progress, open data are becoming increasingly important for the effective management and design of urban infrastructure. This study systematically analyses the current state of open data in Czech cities, identifies key aspects of their availability and examines their potential applications in urban planning. The study focuses in more detail on Brno, which is the second largest city in the Czech Republic and provides freely available data on its website data.brno.cz.

The first part of the study focuses on the theoretical framework of open data and its significance for modern urban planning. The basic principles of open data are introduced, including the standards and formats currently in use. The advantages of open data in the context of transparent decision-making, citizen participation and sustainable urban development are also discussed. In the Czech Republic, the possibilities of providing and using open data have been increasingly discussed in the last ten years, especially at the level of data from state organizations. Nevertheless, the term open data is not understood in the same way by all organizations; for example, the PDF format is sometimes considered an open data format. At the same time, we also perceive problems with the completeness, quality and consistency of open data, as well as missing metadata that would make the data lineage easier to understand.

The second part of the study analyses the data available in specific Czech cities. The analysis includes the identification of existing data sources such as geographic data, traffic information, demographic data and other information relevant to urban planning. Each data source is subject to a detailed evaluation, including an assessment of quality, topicality and availability. The Czech regional cities try to provide open data using geoportals; the largest are data.brno.cz, geoportalpraha.cz and mapy.ostrava.cz, but others exist. State-government institutions also provide data: geoportal.cuzk.cz and subsequently geoportal.gov.cz might be considered the largest providers of data (including open data). A large amount of basic statistical data is provided by the Czech Statistical Office, including the last census from 2021, published mainly as open data.

Regular hackathons are already organized to increase awareness of the open data on these portals, to illustrate the range of possible uses of the data and the power of making data available to a wide professional and general public. One of the most creative examples is the Minecraft world derived from a 3D model of the city of Brno. Such an unconventional approach may better attract the general public to think more about their city and how to contribute to its improvement.

In the following part of the study, concrete examples of the use of open data in urban planning are presented. Making 3D data of cities available became one of the most significant steps for the needs of architectural or urban studies. We cannot forget last year's release by the State Administration of Land Surveying and Cadastre of the fundamental map base (The Fundamental Base of Geographic Data of the Czech Republic), including the digital model of relief, the digital model of surface and the orthophoto, as open data. Some potential may also be hidden in the emerging Digital Technical Map. Different insights into a location can arise through the lens of crime rates published by the police. In short, successful projects are described where open data played a key role in optimizing traffic, planning public spaces, and improving the quality of life of residents. Based on these examples, recommendations are proposed for the further development and use of open data in the urban planning environment.

In the final part of the study, the challenges and opportunities associated with the implementation of open data in Czech cities are discussed. Potential strategies for improving the availability of open data are presented, including collaboration between city authorities, the academic sector and civil society. In addition, ethical and security issues related to the handling and sharing of sensitive data in an urban context are stressed.

Omicum
12:05
12:05
5min
Advancing water productivity monitoring: Waplugin for the analysis and validation of FAO WaPOR data in QGIS
WAPlugin Team, Akshay Dhonthi, Fabian Humberto Fonseca Aponte

Remote sensing data have become indispensable for monitoring water resources and agricultural activities worldwide, offering comprehensive spatial and temporal information critical for understanding water availability, agricultural productivity, and environmental sustainability (Karthikeyan et al., 2020). The FAO Water Productivity Open Access Portal (WaPOR), developed by the Food and Agriculture Organization of the United Nations (FAO), provides extensive datasets derived from remotely sensed data (FAO, 2019). These datasets play a crucial role in water productivity monitoring, especially in regions facing water scarcity and intensive agricultural activity.
However, the manual extraction and importation of WaPOR datasets from the WaPOR platform can be time-consuming and complex. Users typically navigate the platform to locate specific datasets, download the files, and then import them into their preferred Geographic Information System (GIS), such as QGIS. This process often requires users to repeat these steps for multiple datasets, consuming a significant amount of time. Additionally, ensuring the accuracy and reliability of remotely sensed data, including WaPOR datasets, requires validation against ground-based measurements (Wu et al., 2019). This validation process involves evaluating the correlation between remote sensing data and ground measurements to determine their suitability for further analysis and decision-making. However, this process involves a complex workflow and often requires multiple tools and software programs, further increasing the time and effort needed to process and analyze the data.
To address these challenges, we developed WAPlugin, a comprehensive solution designed to streamline the entire process of accessing and analyzing FAO WaPOR datasets within the QGIS environment. WAPlugin is a user-friendly plugin that automates the retrieval of WaPOR datasets directly from the WaPOR platform, eliminating the need for users to navigate through the platform manually. The manual extraction and importation of WaPOR datasets into QGIS for analysis can be time-consuming, with users often spending around 30 minutes on each dataset. By automating the extraction and importation of WaPOR data directly into the QGIS environment, WAPlugin reduces this to an estimated 5 minutes per dataset, a time saving of approximately 83%, enabling users to focus more on data analysis and decision-making.
Moreover, WAPlugin not only streamlines the data acquisition process but also enhances the validation process by offering integrated functionality. Users can effortlessly upload ground observations and conduct comprehensive statistical analyses within the QGIS environment. This includes the calculation of a wide range of validation metrics, such as root mean square error (RMSE), mean absolute error (MAE), bias, coefficient of determination (R-squared), and scatter index. These metrics provide detailed insights into the accuracy and reliability of the WaPOR data by quantifying the level of agreement between remote sensing measurements and ground observations. By facilitating the calculation and visualization of these metrics directly within the QGIS environment, WAPlugin empowers users to make informed decisions regarding the suitability of the data for their specific applications. This built-in workflow not only saves time but also ensures the robustness of analyses, ultimately contributing to more accurate and reliable assessments of water productivity and agricultural activities.
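The validation metrics listed above reduce to a few lines of NumPy, as in the sketch below; the observation values are placeholders, and the scatter index is taken here as RMSE normalised by the mean observation, which is an assumption about the plugin's exact convention.

```python
import numpy as np

ground = np.array([3.2, 4.1, 2.8, 5.0, 3.9])   # ground-based observations (placeholder values)
wapor  = np.array([3.0, 4.4, 2.6, 5.3, 3.7])   # co-located WaPOR estimates (placeholder values)

error = wapor - ground
rmse  = np.sqrt(np.mean(error**2))                                   # root mean square error
mae   = np.mean(np.abs(error))                                       # mean absolute error
bias  = np.mean(error)                                               # mean bias
r2    = 1 - np.sum(error**2) / np.sum((ground - ground.mean())**2)   # coefficient of determination
si    = rmse / ground.mean()                                         # scatter index (assumed convention)
```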
By combining these tasks into a single, intuitive interface, WAPlugin significantly reduces the time and effort required for data acquisition and validation. It provides a complete solution for using FAO WaPOR datasets to analyze water productivity within the QGIS environment: by simplifying data retrieval and integrating validation functions, the plugin improves the accessibility and reliability of remotely sensed information.
Furthermore, WAPlugin contributes to enhancing collaboration among researchers and practitioners in the field of water resources and agriculture. The streamlined process for accessing and analyzing WaPOR datasets promotes knowledge sharing and facilitates interdisciplinary research endeavors. This collaborative aspect is crucial for addressing complex challenges such as water management and agricultural sustainability, which require insights from diverse perspectives and expertise.
In addition to its practical utility, WAPlugin also serves as an educational tool, empowering users with the knowledge and skills to leverage remote sensing data for addressing real-world challenges. By providing a user-friendly interface and integrating essential functionalities, the plugin facilitates learning and capacity building in the field of geospatial analysis and environmental science.
WAPlugin represents a significant advancement in the field of remote sensing and geospatial analysis, offering a practical solution for enhancing the accessibility and usability of WaPOR datasets. Its impact extends beyond technical efficiency to broader implications for research, collaboration, and education in the domains of water resources management, agricultural productivity, and environmental sustainability. As remote sensing technologies continue to evolve and play an increasingly vital role in addressing global challenges, tools like WAPlugin will remain essential for maximizing the potential of these technologies in informing evidence-based decision-making and fostering sustainable development.
In conclusion, WAPlugin stands as a pivotal tool for remote sensing applications for water resources management and agricultural productivity. Its ability to streamline data acquisition, analysis, and validation processes not only enhances efficiency but also promotes collaboration and knowledge exchange among stakeholders. As we navigate the complexities of sustainable resource management in a changing climate, WAPlugin exemplifies the transformative potential of technology in addressing pressing global challenges.

Omicum
12:10
12:10
5min
Benefits and pitfalls of emotional and mobility web mapping
Nikola Koktavá

The popularity of participative mapping continues to grow, and it is becoming an essential tool for involving citizens in urban planning, architectural solutions and transport design. Citizens can quickly and easily review proposals and variants, explore models and visualizations, express their opinions, pin comments, and vote on their favourites (Ribeiro and Ribeiro 2016). Emotional maps and similar mapping tools are frequently used in Czechia, especially for mapping citizens’ attitudes towards both physical and social features of the urban environment. Quantitative assessment of mapping results can help urban planners better understand citizens’ perception and improve the targeting of planned measures (Camara, Camboim, and Bravo 2021). Discussion sometimes arises about the validity of such mapping and whether it complements or substitutes traditional questionnaire surveys. The objective of the paper is to discuss the benefits and weaknesses of such tools and to compare them with questionnaire surveys.
The case study is focused on two middle-sized Czech cities, Ostrava (OV) and Hradec Kralove (HK), and selected rural municipalities in their surroundings. Participants are all seniors (age 65+) due to the project aim of understanding seniors’ spatial mobility, accessibility and perception.
The questionnaire survey was conducted in 2022 by the Research Agency STEM/MARK (n=536, PAPI method 86%, CAWI method 14%). Quota sampling used stratification by age, gender, territory, and urbanization based on census data.
At the same time, two web map applications were launched: the emotional and mobility maps. We used the platform EmotionalMaps.eu, which utilizes the Leaflet library (Pánek et al. 2021).
In the map application, respondents indicate their age group and health limitations, and mark one or more locations: attractive locations, repulsive locations, barriers to movement, attractive paths, repulsive paths, and approximate residence location. Each marked target can be further specified by 16 reasons with a multiple-choice option, visiting frequency, schedule, and weather and social constraints (Horak et al. 2022).
In the mobility map, respondents specify one or more of their favourite locations in the following categories: home, workplace, retail, pharmacy, post office, doctor, supermarket, ATM, worship, services, park, restaurant, visiting family or friends, garden or cottage, or other place. After marking each point, they may specify frequency of attendance and transport mode.
The main advantages of emotional and mobility web mapping are cost effectiveness, flexibility of use, usually large sample size, attractiveness of design, ease of use for people with computer or mobile skills, the ability to position targets accurately, customized map design (zoom, pan, etc.), larger extent, the ability to describe more specific conditions, the use of illustrative pictures or icons, interactive help, consistency monitoring, integrity constraints, and selection from specified options. Disadvantages include no validation of the respondent profile, a bias of respondents towards more technically skilled and wealthier people, privacy concerns, and duplicate responses (Wikstrøm 2023).
The biggest problems were encountered when drawing lines to specify attractive and repulsive paths. We obtained only 32 records from OV and 29 records from HK and evident errors represent 19% and 40%, respectively.
Quota sampling was not applied to the web mapping data, only a basic selection of the relevant age group and residence in HK or OV. The differences in respondents’ profiles across the three survey methods show a clear bias towards younger and healthier seniors in the case of web mapping and CAWI.
Any survey's raw data contain some inaccuracies, errors, or odd responses from people misunderstanding questions, misusing tools, submitting trial responses, intending to damage data or outputs, or having concerns (e.g., about losing privacy). Deviations from planned quota shares in the quota-based survey may result in the removal of some respondents and/or the need to conduct an additional survey (in our case, 40-46% in two villages). Such changes deteriorate the temporal consistency of the data.
The primary aim of the survey was to discover seniors’ mobility targets. We asked for their dwelling location and up to four of their most important targets, listed in descending order of perceived importance and written as free text. To specify the locations of residence and targets we asked for addresses or another useful specification. Respondents identified 23 kinds of important targets in HK and 24 in OV, with the following main priorities: shopping (37 and 24%, resp.), doctor (19 and 22%), family (10 and 13%), walking (8 and 6%), and friends (5 and 4%). An additional problem is that 5% of free-text destinations contained multiple targets.
The web mobility mapping requested the specification of favourite locations for one or more targets in the 13 categories, plus the residence and the “other” target (specified by free text). Respondents identified 16 kinds of important targets in HK and 12 in OV, with the following priorities: retail (15 and 12%, respectively), supermarket (12% in both), pharmacy (12 and 10%), post office (11 and 10%). Such a flat distribution is caused by the respondents’ tendency to mark only one target per category.
The accuracy of location is variable. While the web mapping application instantly provides coordinates for each location, the targets from questionnaires require geocoding. In our case, geocoding was successful only for 65% of records. Among these, 18% were geocoded using the complete address, 53% by finding the nearest matching destination, 24% manually with interpretation, and 5% only to the centre of the street.
Further, the spatial distributions of targets were compared. The clustering of both indicated targets and all targets available in OpenStreetMap is confirmed by the M-function in both variants (questionnaire and web mapping). The analysis of distances from a residence to an indicated real target shows more clustering for questionnaire targets around a residence than for those from web mobility mapping. However, the selection of closer destinations in the questionnaire is influenced by the age bias of respondents and by the limited number of requested targets (up to four).
The study contributes to the discussion on the validity of participative mapping and sheds light on the importance of carefully preparing such surveys and pre-processing the data comprehensively.

Omicum
12:15
12:15
5min
City Transport Analyzer: a powerful QGIS plugin for public transport accessibility and intermodality analysis
Gianmarco Naro, Carlo Andrea Biraghi

Mobility is one of the main factors affecting urban environmental performance. Car dependency is still widespread worldwide, and integrated planning approaches are needed to exploit the potential of active and shared mobility solutions, making them an effective alternative to the use of private vehicles. The analysis and optimization of public transportation (PT) services have thus become increasingly important in the planning and management of urban infrastructure. This work aims to develop and implement a QGIS plug-in for analyzing urban PT networks, assessing the accessibility and intermodality dimensions, relying on General Transit Feed Specification (GTFS) data as the source of information.

GTFS is a standardized format for PT schedules and geographic information. It defines a common format for transit agencies to share their data, making it possible for developers to create applications that provide accurate and up-to-date information about services. This standard was chosen because it is one of the most popular and widely used, especially when the data are used for static analyses. The information extracted mainly concerns PT stops, routes and the nodes preparatory to route construction and connection. In order to be usable by GIS software, all data belonging to the geospatial standard must be extracted, interpreted and converted to a GIS layer. Specifically, all information regarding stops and routes was extracted to obtain a vector layer for each type of data. One of the most important layers is that of the PT routes, as it shows the entire urban network, obtained by converting the data into a graph data structure using NetworkX, a library for the creation, management and manipulation of complex networks, including graphs. To facilitate our purpose, it was decided to model the edges of the graph in such a way that an edge is used by only one PT route: if two PT lines use the same segment, there will be two different overlapping edges. It is also important to emphasise that each edge in the graph carries the type of means of transport using it (underground, train, bus, ...), the average travel time of that edge, and the length of the edge itself. The creation of the graph is fundamental to carrying out two types of analysis.
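
A minimal sketch of the edge modelling described above with NetworkX, using toy stop and route identifiers: a MultiDiGraph keeps one edge per PT route, so two lines sharing the same segment produce two parallel edges, each carrying its mode, average travel time and length.

```python
import networkx as nx

G = nx.MultiDiGraph()

# One edge per PT route: two routes serving the same pair of stops yield parallel edges.
G.add_edge("stop_A", "stop_B", route="bus_90",   mode="bus",   travel_time=3.0, length=850)
G.add_edge("stop_A", "stop_B", route="tram_12",  mode="tram",  travel_time=2.5, length=850)
G.add_edge("stop_B", "stop_C", route="metro_M1", mode="metro", travel_time=2.0, length=1200)

# Each parallel edge keeps its own attributes, so route-specific travel times are preserved.
for u, v, data in G.edges(data=True):
    print(u, v, data["route"], data["mode"], data["travel_time"], "min")
```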

The accessibility analysis is conducted to determine which areas are reachable within specified time frames via all the possible combinations of PT lines. Starting from any point in the city, it provides service areas combining PT and walking within a user-defined time interval of up to a maximum of 60 minutes. The outputs are both lines, i.e. all the edges of the network that can be travelled, and polygons, i.e. convex hulls built on them. This analysis, so far available only within the proprietary software ArcGIS, is extremely useful to provide very detailed information about the potential of each PT stop and its surrounding urban area. The second analysis concerns PT intermodality and introduces some elements of novelty. It is intended to assess intermodality beyond the PT nodes (hubs), exploring which paths in the street network have the highest probability of being taken to change from one line/mode to another. The evaluation is purely physical and only considers network distance. Its results are expected to be integrated with complementary dimensions such as proximity to Points of Interest, street comfort and safety for a holistic planning approach. Starting from any PT stop, a circular catchment area is drawn using a user-defined distance and the PT stops within it are selected. Among them, those with at least one PT line in common with the departure stop are discarded, the remainder being selected. This is done assuming that PT is generally faster than walking and so, when a PT alternative is available, walking is less attractive. It is then shown how the starting stop is connected to the other stops via the most direct pedestrian path. Finally, once all the pedestrian paths are drawn, the number of times each street segment is used is calculated, providing a classification according to their potential use for modal change. The pedestrian graph is obtained through OSMnx, a library for retrieving, processing, and visualizing road network data from OpenStreetMap.
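
A sketch of the accessibility step under simplifying assumptions (PT and pedestrian edges merged into one graph, travel time in minutes stored on every edge, node coordinates as attributes): NetworkX's ego_graph returns the subnetwork reachable within the time budget, and a convex hull over its nodes gives the service-area polygon.

```python
import networkx as nx
from shapely.geometry import MultiPoint

def service_area(G, origin, minutes=60):
    """Edges reachable from `origin` within `minutes`, plus a convex-hull polygon over their nodes.

    Assumes every edge carries a 'travel_time' attribute (minutes) and every node
    carries 'x'/'y' coordinates; both are assumptions made for this sketch.
    """
    reachable = nx.ego_graph(G, origin, radius=minutes, distance="travel_time")
    points = MultiPoint([(data["x"], data["y"]) for _, data in reachable.nodes(data=True)])
    return reachable, points.convex_hull
```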

The plugin was tested on two different case studies, Milan and Rio de Janeiro, producing significant results that highlight its utility and applicability in the context of GTFS data-driven studies of urban public transportation networks. The outcomes of both analyses were consistent, demonstrating the plugin’s applicability in comprehending the dynamics of metropolitan public transit networks. Overall, the plug-in stands out as an important tool that can analyse GTFS data and use it to create a network of a city’s PT, providing a flexible and easy-to-use instrument for studying urban PT networks and thus a significant addition to the geospatial community. By utilizing GTFS data, the plug-in offers a thorough overview of service coverage, accessibility, and connectivity within various metropolitan contexts. The two analyses allow specific areas of a city to be examined, showing interconnections between stops and the possible routes that can be travelled, and are therefore very useful as they quantitatively assess the accessibility and intermodality of an urban area in its own context.

The ultimate goal is to contribute to a deeper understanding of urban public transportation networks and urban areas through a practical and intuitive tool for those involved in the analysis and management of city infrastructure. Work is also underway to extend these analyses beyond public transportation to other aspects of the city, for example by showing the distribution of Points of Interest and how they are interconnected. This must, however, be done while keeping the runtime reasonable, which can still be a problem for very complex and detailed networks.

Omicum
12:30
12:30
90min
Lunch
Omicum
14:00
14:00
30min
Geometrically guided and confidence-based denoising
David Youssefi

Introduction

As part of the CO3D mission (Lebegue et al., 2020), carried out in partnership with Airbus, CNES is developing the image processing chain, including the open source photogrammetry pipeline CARS (Youssefi et al., 2020). By acquiring land areas within two years and providing four bands (Blue, Green, Red, Near Infrared) at 50 cm, the objective is to produce a global Digital Surface Model (DSM) at 1 m ground sampling distance (GSD) with a target relative altimetric accuracy of 1 m (CE90). The worldwide production of this 3D information will notably contribute to the creation of digital twins (Brunet et al., 2022). Satellite imagery provides global coverage, which makes it possible to update the 3D model of any location on Earth within a short time frame. However, because fewer images are available, and at lower resolution than drone or aerial photography, a denoising step is necessary to extract relevant 3D information from satellite images. This step smooths out features while retaining edges that are sometimes barely recognizable at the sensor resolution, such as the edges of small houses or the narrow gaps between them, as our results show.

Geometrically guided and confidence-based point cloud denoising

Point cloud denoising is a widely studied topic in 3D reconstruction: several methods, ranging from classical approaches to deep learning-based ones, have been designed over the past decades. In this article, we propose a new method derived from bilateral filtering (Digne and de Franchis, 2017) that integrates new constraints. Our aim is to show how a priori knowledge can be used to guide denoising and, above all, to produce a denoised point cloud that is more consistent with the acquisition conditions or with metrics obtained during correlation.

This new method takes into account two important constraints. The first is a geometric constraint. The input to the denoising step is a photogrammetric point cloud resulting from matched points on the sensor images. Our pipeline CARS derives lines of sight from these matched points, and the intersection of these lines gives the target 3D positions. In our method, when the point cloud is denoised, each point is constrained to stay on its initial line of sight. This has two main advantages: the associated color remains consistent with the new position, and points do not accumulate in certain spaces, which would otherwise create dataless areas.

The second constraint comes from the correlator PANDORA. The article (Sarrazin et al., 2021) describes a confidence metric, named the ambiguity integral metric, to assess the quality of the produced disparity map. This measurement determines the level of confidence associated with each point. Each point is moved along its line of sight according to its confidence: the less confident the correlator, the more the point is moved, while respecting the geometric constraint mentioned earlier. Apart from these two major added constraints, our method still uses the usual denoising parameters, such as the initial color and position of each point relative to its neighborhood. Normal smoothing is included to compensate for correlation inaccuracy.
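The interplay of the two constraints can be summarized by the following numpy sketch, in which the unconstrained positions proposed by a bilateral-style neighborhood filter are projected back onto each point's line of sight and the displacement is scaled by the correlator confidence; this illustrates the principle only and is not the CARS implementation.

    import numpy as np

    def constrained_update(points, los_dirs, targets, confidence):
        """
        points      (N, 3) initial 3D positions
        los_dirs    (N, 3) direction of each point's line of sight
        targets     (N, 3) unconstrained denoised positions (e.g. from a bilateral filter)
        confidence  (N,)   correlator confidence in [0, 1]; 1 = fully trusted
        """
        los = los_dirs / np.linalg.norm(los_dirs, axis=1, keepdims=True)
        # Project the proposed displacement onto the line of sight so each point
        # slides along its viewing ray instead of drifting sideways.
        step = np.einsum("ij,ij->i", targets - points, los)
        # Low-confidence points are allowed to move further than confident ones.
        step = step * (1.0 - confidence)
        return points + step[:, None] * los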

Evaluation and applications

Early results are extremely promising. A visual comparison of the mesh obtained before and after our proposed denoising step in a dense urban area will be provided in the final article (Figure 1). This illustration shows that the regularization preserves fine elements and sharp edges while smoothing out the flat features (roofs, facades). Although we cannot yet guarantee that denoising improves the accuracy of the 3D point cloud (or of the DSM compared to airborne LiDAR), a verification that will be the subject of future work described in the full paper, we can already affirm that the proposed denoising filter significantly improves rendering and realism. In fact, denoising makes it possible to enhance roof sections that are barely visible in the noisy point cloud, thus facilitating the building reconstruction stage for the generation of 3D city models (CityGML). In order to evaluate the quality of the 3D reconstruction on a larger scale, we plan to use Lidar HD®. This freely distributed dataset contains 10 points per m² and includes a semantic label for each point, allowing for a class-specific quality assessment for buildings, vegetation or ground. We are currently benchmarking state-of-the-art solutions according to metrics that reflect how fine elements are missed in the absence of the geometric and confidence constraints.

Perspectives

In future work, we would like to explore the potential of adding the constraints proposed in this paper to other denoising methods, and to find out whether this is possible using deep learning techniques. In addition to comparisons with ground truth, we would also like to show that denoising makes it easier to reconstruct 3D city models, for example by showing that we can increase the level of detail even with very high resolution satellites such as Pleiades HR. Finally, with a view to using 3D as a digital twin, this denoising could be a tool for simplifying 3D models according to specific simulations. We would therefore like to begin a parameterisation study to quantify the trade-off between simplicity and quality.

Omicum
14:30
14:30
30min
Towards automation of river water surface detection
Stefano Conversi

It is well known that the impacts of climate change are increasingly affecting European territory, often in the shape of extreme natural events. Among those, in recent years, heat waves due to global warming have contributed to accelerating drying processes. The Mediterranean areas in particular are expected to face extraordinarily hot summers and increasingly frequent drought events, which may clearly affect the population. As a partial confirmation of this forecast, between 2022 and 2023 Southern Europe was affected by lasting drought conditions, with several impacts on ecosystems. For example, the Po River (the longest Italian water stream) recorded the worst water scarcity of the past two centuries (Montanari et al., 2023). Experts agreed on the exceptionality of the phenomenon, while noting that such events are likely to recur in the near future (Bonaldo et al., 2022). To face them, local authorities expressed the need for tools to monitor the impacts of drought on rivers, so as to be capable of promptly enacting countermeasures.
In this context, the authors partnered with Regione Lombardia to build a procedure exploiting the fusion of Copernicus Sentinel-1 (SAR) and Sentinel-2 (optical) data for water surface mapping, applied to the case study of the Po River (Conversi et al., 2023) and based on supervised classification of combined optical and SAR imagery. The current work presents an evolution of that methodology, including a considerable effort towards full automation of the process, a necessary step for making it user-friendly for public administrations.

The designed procedure, built in Google Earth Engine, is based on the combination of three images, namely the S-1 VV speckle-filtered band (Level-1 GRD) and the spectral indices Sentinel Water Mask (SWM) and NDWI derived from S-2 (Level-1C, orthorectified). Input imagery is selected to ensure complete coverage of the area of interest, mosaicking images from different dates if necessary, a reliable assumption considering that drought is usually a slow phenomenon. The time interval between images is in any case minimized by the code, depending on data quality and availability. Training polygons are drawn by photointerpretation and then fed to a Random Forest-based supervised classifier together with the three aforementioned images. The outcome of the procedure is a map of the water surface detected over the area of interest, complemented by an estimate of its extent in km². Results are then validated and correlated with hydrometric records from the field, which corroborated the overall performance (Conversi et al., 2023).
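For illustration, a minimal Google Earth Engine (Python API) sketch of the input stack and classification is given below. The area of interest, dates, cloud filter, SWM band combination and placeholder training points are assumptions for the example, whereas the actual procedure derives training samples from photointerpreted polygons or, in the automated version described next, from the Bmax Otsu mask.

    import ee
    ee.Initialize()

    aoi = ee.Geometry.Rectangle([10.5, 44.8, 11.5, 45.2])       # illustrative stretch of the Po River
    start, end = "2022-07-01", "2022-07-15"                      # illustrative reference period

    s1 = (ee.ImageCollection("COPERNICUS/S1_GRD")
          .filterBounds(aoi).filterDate(start, end)
          .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VV"))
          .select("VV").mosaic()
          .focal_median(50, "circle", "meters"))                 # simple stand-in for speckle filtering

    s2 = (ee.ImageCollection("COPERNICUS/S2")                    # Level-1C
          .filterBounds(aoi).filterDate(start, end)
          .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
          .median())

    ndwi = s2.normalizedDifference(["B3", "B8"]).rename("NDWI")
    swm = (s2.select("B2").add(s2.select("B3"))                  # SWM as commonly defined: (B2+B3)/(B8+B11)
           .divide(s2.select("B8").add(s2.select("B11"))).rename("SWM"))

    stack = s1.rename("VV").addBands(swm).addBands(ndwi).clip(aoi)

    # Placeholder labelled points; in the real workflow these come from the
    # photointerpreted polygons or the automatic water/non-water mask.
    training = ee.FeatureCollection([
        ee.Feature(ee.Geometry.Point([10.9, 44.98]), {"class": 1}),   # water
        ee.Feature(ee.Geometry.Point([10.7, 45.15]), {"class": 0}),   # non-water
    ])

    samples = stack.sampleRegions(collection=training, properties=["class"], scale=10)
    rf = ee.Classifier.smileRandomForest(100).train(samples, "class", stack.bandNames())
    water_map = stack.classify(rf)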

This paper proposes an advancement of the methodology aimed at enhancing its usability for non-expert users, so as to lay the basis for a tool that can be exploited by local stakeholders. An efficient automatic extraction of training samples is achieved by randomly extracting the training set of pixels from a binary (water/non-water) mask.
This water/non-water mask is derived from the combination of three sub-masks resulting from the automatic thresholding of the input imagery (VV, SWM, NDWI), obtained with the Bmax Otsu algorithm (Markert et al., 2020). The water/non-water mask includes only the pixels that behave consistently across all input images and throughout the reference period.
The thresholding procedure is automated using the Otsu histogram-based algorithm for image segmentation. This methodology defines an optimal threshold value for distinguishing background and foreground objects: the inter-class variance is evaluated and the value that maximizes it is chosen, thus also maximizing the separability between pixel classes (Otsu, 1979). A modified version of the algorithm, Bmax Otsu, originally developed for water detection with Sentinel-1, was exploited. The Otsu algorithm is particularly effective for images characterized by a bimodal histogram of pixel values, while Bmax Otsu is more suitable in the presence of multiple classes or complex backgrounds (Markert et al., 2020), which is the case for the application presented in this work. Bmax Otsu is based on a checkerboard subdivision of the original image, driven by user-selected parameters. The maximum normalized Between-Class Variance (BCV) is evaluated in each cell of the checkerboard, and sub-areas characterized by bimodality are selected for applying the Otsu algorithm, thus leading to the target threshold value (Markert et al., 2020).
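The plain Otsu criterion at the heart of this step can be written compactly in numpy as in the sketch below; it covers only the single-histogram threshold, not the checkerboard subdivision and bimodality test that Bmax Otsu adds on top.

    import numpy as np

    def otsu_threshold(values, bins=256):
        """Threshold maximizing the between-class variance of a 1D histogram."""
        hist, edges = np.histogram(values[np.isfinite(values)], bins=bins)
        p = hist.astype(float) / hist.sum()
        centers = (edges[:-1] + edges[1:]) / 2

        w0 = np.cumsum(p)                    # cumulative weight of the "background" class
        w1 = 1.0 - w0                        # weight of the "foreground" class
        mu = np.cumsum(p * centers)          # cumulative mean
        mu_t = mu[-1]                        # global mean
        with np.errstate(divide="ignore", invalid="ignore"):
            bcv = (mu_t * w0 - mu) ** 2 / (w0 * w1)   # between-class variance
        bcv[~np.isfinite(bcv)] = 0.0
        return centers[np.argmax(bcv)]

    # Example use for one input: water pixels lie above the NDWI/SWM thresholds
    # and below the VV threshold, e.g. water = ndwi_array > otsu_threshold(ndwi_array).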
As mentioned, the outcomes of the Bmax Otsu procedure are exploited to extract random training samples for the machine learning-based classification algorithm. The best classification performance is obtained with a number of pixels corresponding to 0.15% of the region of interest.
Validation was carried out against another classification of the same area obtained with photo-interpreted training samples (Conversi et al., 2023), showing accuracies of the order of 80-90%. The automated version of the methodology for integrating optical and radar images in mapping river water surfaces thus proved its effectiveness across several reference date intervals.
Although automating the training sample selection slightly decreases the accuracy of the overall result with respect to the original approach, the gain in usability is invaluable. Indeed, removing the need for the user to photointerpret imagery and draw training polygons represents a relevant step towards a standalone tool to be used by public administrations in real applications of river drought monitoring.

Omicum
15:00
15:00
30min
Comparing spatial patterns in raster data using R
Jakub Nowosad

Spatial pattern is an inherent property visible in many spatial variables. Spatial patterns are often at the heart of many geographical studies, where we search for existing hot spots, correlations, and outliers. They may be exhibited in various forms, depending on the type of data and the underlying processes that generated the data. Here, we will focus on spatial patterns in spatial rasters, but the concept can be extended to other types of spatial data, including vector data and point clouds.

Patterns in spatial raster data may have many forms. We may think of spatial patterns for continuous rasters as an interplay between intensity and spatial autocorrelation (e.g., elevation), or between composition and configuration for categorical rasters (e.g., land cover) (Gustafson, 1998). Intensity relates to the range and distribution of values of a given variable, while spatial autocorrelation is the tendency for nearby values of a given variable to be more similar than those that are further apart. On the other hand, composition is the number of cells belonging to each map category, while configuration represents their spatial arrangement. Another distinction concerns the dimensionality of the data. The most common situation is when we only use one layer of given data (e.g., an elevation map or a land cover product for one year). However, we may also be interested in sets of variables (layers, bands), such as hyperspectral data, time series, or proportions of classes. An additional special case is the RGB representation of the data.

Assessing the similarity of spatial patterns is a common task in many fields, including remote sensing, ecology, and geology. This procedure may encapsulate many types of comparisons: comparing the same variable(s) for different areas, comparing different datasets (e.g., different sensors), or comparing the same area but at different times.

Given the variety of possible scientific questions and the plethora of forms of spatial data, there is no universal method for assessing the similarity between two spatial patterns. The basic method is visual inspection; however, it is highly subjective, both from the observer's perspective and with respect to the type of visualization. Other fairly straightforward approaches are to create a difference map, count changed pixels, or look at the distribution of the values. More advanced methods include the use of machine learning algorithms; however, these are often complex, require a lot of data, and are not always interpretable. An alternative and general approach, inspired by content-based image retrieval (Kato, 1992), is to use spatial signatures to represent spatial patterns and dissimilarity measures to compare them (Jasiewicz and Stepinski, 2013).

A spatial signature is any numerical representation (compression) of a spatial pattern. For a categorical raster, it can be a co-occurrence vector of classes in a local window, while for a time series, it may be a vector of values in a given cell. Then, having spatial signatures for both areas (sensors, moments), we can compare them using a dissimilarity measure (e.g., Euclidean distance, cosine similarity, etc.) (Cha, 2007). This approach can compare complex, multidimensional spatial patterns, but at the same time, it gives some degree of interpretability. It can also be further applied to many techniques of spatial data analysis, including spatial clustering (to find groups of areas with similar spatial patterns) and segmentation (to create regions with similar spatial patterns).
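As a conceptual illustration of this idea (sketched here in Python rather than with the R packages discussed below), a categorical raster can be summarized by the normalized co-occurrence vector of adjacent cell pairs and two rasters compared with a standard dissimilarity measure; the toy random rasters and the choice of Jensen-Shannon distance are assumptions for the example.

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def cooccurrence_signature(raster, n_classes):
        """Normalized co-occurrence vector of horizontally/vertically adjacent cells."""
        counts = np.zeros((n_classes, n_classes))
        for a, b in [(raster[:, :-1], raster[:, 1:]),     # horizontal neighbours
                     (raster[:-1, :], raster[1:, :])]:    # vertical neighbours
            np.add.at(counts, (a.ravel(), b.ravel()), 1)
            np.add.at(counts, (b.ravel(), a.ravel()), 1)  # count both orders -> symmetric matrix
        return counts.ravel() / counts.sum()

    rng = np.random.default_rng(0)
    r1 = rng.integers(0, 3, size=(100, 100))              # toy categorical rasters with 3 classes
    r2 = rng.integers(0, 3, size=(100, 100))
    d = jensenshannon(cooccurrence_signature(r1, 3), cooccurrence_signature(r2, 3))
    print(f"pattern dissimilarity: {d:.3f}")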

While the concept of applying spatial signatures and dissimilarity measures is powerful, there are still many unresolved issues and questions to consider. These include the scale of comparison; the resolution, dimensionality, and type of the input data; the spatial signatures used; and the dissimilarity metrics selected. There is still a lack of studies that systematically compare different methods of assessing similarity between spatial patterns or suggest good practices for their use. At the same time, a growing number of FOSS tools allow us to test various methods and apply them to real-life scenarios.

The goal of this work is to provide an overview of existing R packages for comparing spatial patterns. These include ‘motif’ (for comparing spatial signatures for categorical rasters; Nowosad, 2021), ‘spquery’ (allowing for comparing spatial signatures for continuous rasters), and ‘supercells’ (for segmentation of various types of spatial rasters based on their patterns; Nowosad and Stepinski, 2022). It will show how they can be applied in real-life cases and what their limitations are. This work also aims to open a discussion about the methods for assessing similarity between spatial patterns and their FOSS implementations.

References

Cha, S.-H. (2007). Comprehensive Survey on Distance/Similarity Measures Between Probability Density Functions. Int. J. Math. Model. Meth. Appl. Sci.

Gustafson, E.J. (1998). Quantifying landscape spatial pattern: what is the state of the art? Ecosystems.

Jasiewicz, J., & Stepinski, T. F. (2013). Example-Based Retrieval of Alike Land-Cover Scenes From NLCD2006 Database. IEEE Geoscience and Remote Sensing Letters. https://doi.org/10.1109/lgrs.2012.2196019

Kato, T. (1992). Database architecture for content-based image retrieval. Image Storage and Retrieval Systems. https://doi.org/10.1117/12.58497

Nowosad, J. (2021). Motif: an open-source R tool for pattern-based spatial analysis. Landscape Ecology. https://doi.org/10.1007/s10980-020-01135-0

Nowosad, J., & Stepinski, T. F. (2022). Extended SLIC superpixels algorithm for applications to non-imagery geospatial rasters. International Journal of Applied Earth Observation and Geoinformation. https://doi.org/10.1016/j.jag.2022.102935

Omicum
15:30
15:30
30min
Coffee
Omicum
10:00
10:00
30min
Coffee
Omicum
12:30
12:30
90min
Lunch
Omicum
15:30
15:30
30min
Coffee
Omicum