FOSS4G 2022 general tracks

Paul van Genuchten

SDI specialist at ISRIC World Soil Information. PSC member of GeoNetwork and pygeoapi. Interested in SDI standards and open source.


Sessions

08-25
12:35
5min
A crawler for spatial (meta)data as a base for Mapserver configuration
Paul van Genuchten, Luis Calisto

At our institute we manage a large volume of soil data, both model inputs and model outcomes, to be shared online. We found that keeping service configurations and metadata records up to date is quite a challenge when they are managed manually at various locations. We've been working on tooling to help us automate the publication process. These days data publications are set up as CI/CD processes on GitLab/Kubernetes.
These efforts resulted in a series of tools which we call the Python Geodata Crawler. The crawler spiders a folder of files, extracts and creates metadata records for the spatial files, and generates a Mapserver configuration so the data can be published as OGC services. Underneath, we're building on the tools provided by the amazing FOSS4G community, such as GDAL, Mapserver, pygeometa, owslib, mappyfile, rasterio and fiona.
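The crawl-then-configure flow described above can be illustrated with a minimal sketch. This is not the crawler's actual code: it uses only the Python standard library, and the names `SPATIAL_TYPES`, `crawl`, `to_mapfile` and `demo` are hypothetical. The mapping from file extension to layer type is an assumption made for brevity; a real implementation would presumably read geometry type, extent and projection from the files with GDAL, rasterio or fiona, and would need more than this mapfile fragment to serve OGC services.

```python
import os
import tempfile
from pathlib import Path

# Assumption for illustration: guess the MapServer layer TYPE from the extension.
# A real crawler would inspect the file itself (e.g. with rasterio or fiona).
SPATIAL_TYPES = {".tif": "RASTER", ".tiff": "RASTER", ".shp": "POLYGON"}


def crawl(root):
    """Walk `root` and return one minimal metadata record per spatial file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            layer_type = SPATIAL_TYPES.get(path.suffix.lower())
            if layer_type is None:
                continue  # skip files that are not a recognised spatial format
            records.append({
                "identifier": path.stem,
                "path": path.relative_to(root).as_posix(),
                "type": layer_type,
                "size_bytes": path.stat().st_size,
            })
    # Sort by relative path so the output is deterministic.
    return sorted(records, key=lambda rec: rec["path"])


def to_mapfile(records, name="crawled"):
    """Render the records as a minimal mapfile fragment (one LAYER per record)."""
    lines = ["MAP", f'  NAME "{name}"']
    for rec in records:
        lines += [
            "  LAYER",
            f'    NAME "{rec["identifier"]}"',
            f'    DATA "{rec["path"]}"',
            f'    TYPE {rec["type"]}',
            "    STATUS ON",
            "  END",
        ]
    lines.append("END")
    return "\n".join(lines)


def demo():
    """Build a throwaway project folder, crawl it, and return both outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "project_a").mkdir()
        (root / "project_a" / "clay.tif").write_bytes(b"")
        (root / "project_a" / "notes.txt").write_text("not spatial")
        (root / "regions.shp").write_bytes(b"")
        records = crawl(root)
        return records, to_mapfile(records)


if __name__ == "__main__":
    _records, mapfile = demo()
    print(mapfile)
```

Keeping the metadata records as an intermediate step, rather than writing the mapfile directly while walking the folder, mirrors the description above: the same records could also be registered in a catalogue.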
A typical use case is an organization that maintains a file structure of project files. The crawler indexes all the (spatial) data files and registers the metadata records in a catalogue, and users then query the catalogue from QGIS MetaSearch to find and load relevant data.
We will present our findings around the project at the conference and hope to talk to institutes with similar challenges, to see if we can create an open source software project around the Python Geodata Crawler.

Use cases & applications
Room 9