The challenges of reproducibility for research based on geodata web services
07-16, 15:30–16:00 (Europe/Sarajevo), PA01

Digitalization and a collaborative approach drive Open Science, the modern way of conducting research. Open Science can be defined as “a collaborative culture enabled by technology that empowers the open sharing of data, information, and knowledge within the scientific community and the wider public to accelerate scientific research and understanding”. Its three major objectives are to: (a) increase the accessibility of the scientific body of knowledge, (b) increase the efficiency of the processes for sharing research outputs and findings, and (c) improve the evaluation of scientific impact by considering new metrics. Owing to the technological advances of recent decades, modern research is now mainly data-driven. Open Research Data (ORD), which refers to "the data underpinning scientific research results that has no restrictions on its access, enabling anyone to access it" [1], is therefore extremely important.
The intent of sharing data openly is to accelerate and boost new findings and innovations, minimize data duplication, and enable wider, interdisciplinary collaborative research. To be used effectively by other researchers, ORD need to follow the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR) [2]. These principles led to the creation of data repositories that allow users to register, store, find, and access data following interoperable metadata standards. Available repositories generally adhere to ORD best practices: they offer open data access, associate a license with the data, make the data persistent, provide unique citable identifiers (DOIs), adopt repository standards, and provide a defined data policy. Nevertheless, most of these repositories only accept static files, preferably archived using standard open formats and metadata.

However, to fully exploit ORD in modern applications, for example those using AI techniques, big data requires specialized services that offer systematic and regular delivery of Analysis Ready Data (ARD) together with filtering capabilities [3]. Sharing ARD fits well with the European vision of establishing Data Spaces: interoperable digital places that facilitate data exchange and usage across disciplines in a secure and controlled environment, with the goal of boosting innovation, economic growth, and digital transformation [4]. This concept goes beyond the purely technical issues of data sharing and encompasses the need for a space to share data that complies with privacy and security regulations.

In the geospatial context, operational data sharing has been implemented by means of Spatial Data Infrastructures (SDIs). SDIs are built on sharing principles that led to the adoption of interoperable geoservices, through which thousands of geospatial layers are today offered to millions of applications worldwide using interoperable geostandards, mainly from the Open Geospatial Consortium (OGC). Technological growth in recent decades has led to an explosive increase in time-varying data, which change dynamically to represent phenomena that grow, persist, and decline, or which vary constantly because data curation processes periodically insert, update, or delete information in data and metadata.

Given these trends, the ability to link Open Science concepts with interoperability and time-varying data management is paramount. In particular, the capability of obtaining results consistent with a prior study using the same materials, procedures, and conditions of analysis is very important: it increases scientific transparency, fosters a better understanding of the study, increases the impact of the research, and ultimately reinforces the credibility of science. In the Open Science paradigm this is called Reproducible Research, and it can be guaranteed only if the same source code, dataset, and configuration used in the study are persistently available. For geospatial data, while the OGC standards presented above enable an almost FAIR [5] and modern form of data sharing, they do not adequately support the reproducibility concept as pursued in Open Science. In other words, they offer no guarantee that the geodata accessed through a geoservice at a given instant can be accessed again in the future, persistently and immutably.

The needs and practices of time-varying data updates are illustrated by real-world examples of common operations that update the data or metadata of different geospatial data types: environmental and climate data for sensor observations; cadastral and OSM data for vector datasets; and satellite-derived land cover, crop maps, and water observations for raster series. From a technical perspective, the capacity to access data as they were in a specific previous state is strictly linked to the capacity to support data versioning. This feature is found in specific tools and approaches that differ widely across data formats and storage types (databases, files, and Log-Structured Tables) but rarely support geospatial data.

While a defined approach to supporting system time exists in SQL and Log-Structured Tables (LSTs) [7], it is not yet adopted in commonly used storage solutions such as those offered by OSGeo projects [8], and/or it cannot be easily integrated into them without new software development.
We conclude that the OGC geospatial web services currently used in Spatial Data Infrastructures do not meet the reproducibility requirement set by Open Science, since they do not guarantee immutable access to a dataset in its state at a specific time of consumption.
To support this capability, we propose that geospatial data management infrastructures manage dataset versioning and expose these features to users through standard web services. Since version numbers may evolve extremely fast and are not meaningful to the user, the system time, which identifies the instant at which archived information had specific values, should be used in conjunction with the web service URL as a unique identifier of the dataset. Finally, alongside versioning, we propose supporting git-like metadata on the user and motivation behind data transactions: this would greatly support reproducibility and Open Science. It would not only allow us to retrieve temporal versions of the dataset, but would also permit data lineage analysis to fully understand the historical changes and better comprehend the dataset (including data provenance and ownership), thereby fostering the transparency and, ultimately, the credibility of science.
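As a rough illustration of this proposal, the sketch below combines a service URL with a system-time instant into a single citable string and keeps a git-like log of who changed the data and why. Everything named here is an assumption made for the example: the `system_time` query parameter is hypothetical and not part of any OGC standard, the `example.org` URL is a placeholder, and the `Transaction` record is only one possible shape for such metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class Transaction:
    """Hypothetical git-like record attached to each data transaction."""
    system_time: datetime
    user: str
    message: str


def dataset_identifier(service_url, system_time):
    """Build an identifier from the service URL plus a system-time instant.

    The `system_time` parameter name is an assumption for this sketch.
    """
    if system_time.tzinfo is None:
        raise ValueError("system_time must be timezone-aware")
    stamp = system_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = urlsplit(service_url)
    # Drop any previous system_time so the identifier stays unambiguous.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "system_time"]
    query.append(("system_time", stamp))
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


def lineage(log, system_time):
    """Transactions that shaped the dataset state at `system_time`, oldest first."""
    selected = [t for t in log if t.system_time <= system_time]
    return sorted(selected, key=lambda t: t.system_time)


log = [
    Transaction(datetime(2024, 1, 1, tzinfo=timezone.utc), "alice", "initial import"),
    Transaction(datetime(2024, 6, 1, tzinfo=timezone.utc), "bob", "fix parcel 1 land use"),
]
cited_at = datetime(2024, 3, 15, tzinfo=timezone.utc)

print(dataset_identifier("https://example.org/collections/parcels/items?f=json", cited_at))
# https://example.org/collections/parcels/items?f=json&system_time=2024-03-15T00:00:00Z
for t in lineage(log, cited_at):
    print(t.system_time.isoformat(), t.user, t.message)
```

A timestamp is used instead of a version number because it stays meaningful to a reader and can be recorded at the moment of consumption without asking the service which version is current.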



[1] Facts and Figures for Open Research Data. Available online: https://research-and-innovation.ec.europa.eu/strategy/strategy-2020-2024/our-digital-future/open-science/open-science-monitor/facts-and-figures-open-research-data_en
[2] Wilkinson, M. D. et al., (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, 3(1), 1-9.
[3] Chatenoux, B. et al. The Swiss Data Cube, Analysis Ready Data Archive Using Earth Observations of Switzerland. Sci. Data 2021, 8, 295, doi:10.1038/s41597-021-01076-6.
[4] Farrell, E. et al. European Data Spaces - Scientific Insights into Data Sharing and Utilisation at Scale. Available online: https://publications.jrc.ec.europa.eu/repository/handle/JRC129900
[5] Kedron, P.; Li, W.; Fotheringham, S.; Goodchild, M. Reproducibility and Replicability: Opportunities and Challenges for Geospatial Research. Int. J. Geogr. Inf. Sci. 2021, 35, 427–445, doi:10.1080/13658816.2020.1802032.

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation: yes

Massimiliano Cannata is a Professor of Geomatics and Head of Research, Development, and Knowledge Transfer at the Department of Environment, Construction and Design (DACD) at SUPSI. He also leads the Open Science Competence Center at SUPSI, promoting open research practices and data sharing while coordinating their institutional adoption.

With a PhD in Geodesy and Geomatics and a background in Environmental Engineering from Politecnico di Milano, Massimiliano led the geomatics division at the Institute of Earth Sciences (SUPSI) from 2007 to 2024. He has been actively involved in the Open Source Geospatial community from its inception, serving as a former director of the Open Source Geospatial Foundation (OSGeo) and as a member of various project steering committees, including ZOO-Project and GRASS GIS (2006–2016). He led the development of istSOS, an open-source implementation of sensor observation services, and has co-chaired both the United Nations committee on geospatial open source and the OSGeo Open Geoscience committee. His work focuses on geospatial technologies, data interoperability, and open innovation in research.