A crawler for spatial (meta)data as a base for MapServer configuration
At our institute we manage a large amount of input data and model output on soils that needs to be shared online. We've experienced that updating service configurations and metadata records can be quite a challenge when they are managed manually in various locations. We've been working on tooling to help us automate the publication process. These days our data publications are set up as CI/CD pipelines on GitLab/Kubernetes.
These efforts resulted in a series of tools which we call the Python Geodata Crawler. The crawler spiders a folder of files, extracts metadata from the spatial files it finds, creates metadata records for them, and generates a MapServer configuration so the data can be published as OGC services. Underneath we're building on tools provided by the amazing FOSS4G community, such as GDAL, MapServer, pygeometa, OWSLib, mappyfile, rasterio and fiona.
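To give an impression of the idea, here is a minimal sketch of such a crawl. It only uses rasterio and fiona to pull the extent and CRS out of each file and renders a simplified MapServer LAYER block from a plain string template; the folder path, extensions and template are illustrative placeholders, and our actual tooling builds the metadata records with pygeometa and the mapfile with mappyfile instead.

```python
import os
import fiona
import rasterio

RASTER_EXT = {".tif", ".tiff", ".vrt"}
VECTOR_EXT = {".shp", ".gpkg", ".geojson"}
GEOM_TO_MAPTYPE = {
    "Point": "POINT", "MultiPoint": "POINT",
    "LineString": "LINE", "MultiLineString": "LINE",
    "Polygon": "POLYGON", "MultiPolygon": "POLYGON",
}

# Hypothetical root folder of project files to crawl.
ROOT = "/data/projects"

LAYER_TEMPLATE = """LAYER
  NAME "{name}"
  TYPE {maptype}
  DATA "{path}"
  EXTENT {minx} {miny} {maxx} {maxy}
  PROJECTION
    "init={crs}"
  END
END"""


def describe(path):
    """Extract basic spatial metadata (layer type, extent, CRS) from a file."""
    ext = os.path.splitext(path)[1].lower()
    if ext in RASTER_EXT:
        with rasterio.open(path) as src:
            return {"maptype": "RASTER", "bounds": src.bounds, "crs": str(src.crs)}
    if ext in VECTOR_EXT:
        with fiona.open(path) as src:
            maptype = GEOM_TO_MAPTYPE.get(src.schema["geometry"], "POLYGON")
            return {"maptype": maptype, "bounds": src.bounds, "crs": str(src.crs)}
    return None  # not a spatial file we recognise


def crawl(root):
    """Walk a folder tree and yield a MapServer LAYER block per spatial file."""
    for dirpath, _, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            meta = describe(path)
            if meta is None:
                continue
            minx, miny, maxx, maxy = meta["bounds"]
            yield LAYER_TEMPLATE.format(
                name=os.path.splitext(filename)[0],
                maptype=meta["maptype"],
                path=path,
                minx=minx, miny=miny, maxx=maxx, maxy=maxy,
                # Simplified; real tooling normalises the CRS identifier.
                crs=meta["crs"].lower(),
            )


if __name__ == "__main__":
    print("\n".join(crawl(ROOT)))
```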
A typical use case for this software will be familiar to many organisations that maintain a file structure of project files. The crawler indexes all the (spatial) data files and registers the metadata records in a catalogue; users then query the catalogue from QGIS MetaSearch to find and load the relevant data.
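As a sketch of the discovery side, the snippet below queries such a catalogue through OWSLib's CSW client, which is the same mechanism QGIS MetaSearch uses under the hood; the endpoint URL and the search term "soil" are placeholders, not a real service.

```python
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

# Hypothetical CSW endpoint of the catalogue the crawler publishes to.
CSW_URL = "https://example.org/csw"

csw = CatalogueServiceWeb(CSW_URL, timeout=30)

# Full-text search on "soil", the kind of query MetaSearch issues.
query = PropertyIsLike("csw:AnyText", "%soil%")
csw.getrecords2(constraints=[query], maxrecords=10, esn="full")

for identifier, record in csw.records.items():
    # Each record carries the links needed to locate and load the data.
    print(identifier, "-", record.title)
    for ref in record.references:
        print("   ", ref.get("scheme"), ref.get("url"))
```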
We will present our findings from the project at the conference and hope to talk to institutes facing similar challenges, to see if we can create an open source software project around the Python Geodata Crawler.