FOSS4G 2022 general tracks

Régis Haubourg

A passionate open source GIS data manager since 2005, I have been deeply involved in the OSGeo and QGIS projects. I worked for Oslandia from 2016 to 2021. In 2021 I decided to refocus on geodata management applied to the energy transition and large-scale housing retrofit at the Scientific and Technical Center for Building (CSTB). I was the previous chairman of the French OSGeo local chapter and am still involved in the current board.


Building a common building's(!) open dataset using FOSS4G, open data and open government.
Régis Haubourg

Climate change is here. Building heating, construction and cooling are estimated to contribute 30% of France's CO2 emissions. And yet, we don't really have a database of those buildings. We have footprints from the French National Geographic Institute, tax datasets on cadastral parcels, and many derived datasets for energy consumption and performance certificates, but all of them are far from a usable, centralized reference dataset.

The national address geolocation (BAN) project unlocked the key pivot database between all of them. The Scientific and Technical Center for Building (CSTB), a public industrial and commercial company, decided to dedicate efforts to building a permanent reference dataset and to push it as an open database.

The full stack uses open source technologies (Pandas / GeoPandas, PostGIS, Apache Spark, MLflow, QGIS, MapLibre ...) and handles massive datasets (21 million buildings, >400 descriptors). It makes it possible to run analyses and predictions for all the climate change related indicators, such as the relation between housing price and energy performance, heat wave impact, solar potential, etc.

As the first versions are now published, the next challenges are:
- Make the data easier to reuse
- Push toward an official common identifier for each building, dwelling and parcel, through the BatID project and Etalab open government initiatives
- Enrich the dataset with new statistics and predictions twice a year
- Consolidate its economic rationale to make this viable in the long run

This talk will also show cool dataviz and geoviz stuff for a geonerd audience :)

Use cases & applications
Room 4
How to deal with a massive geographic database when surrounded by data scientists?
Régis Haubourg

The Scientific and Technical Center for Building (CSTB) built the first French database of buildings and houses to address the climate change challenge, supporting knowledge and decision-making for large-scale retrofit.
The pipeline factory intersects massive datasets (21 million buildings, >400 descriptors) and keeps adding new predictions and external datasets all the time. It makes it possible to run analyses and predictions for all the climate change related indicators, such as the relation between housing price and energy performance, heat wave impact, solar potential, etc.
While the first versions were a direct image of the classical data scientist's approach (i.e. a massive dataframe driven by massive YAML config files and cryptic meta-templated scripts), ease of use and access performance soon became limiting factors. This is a major concern since this dataset will be a long-term foundation for derived information systems.
Between the brute-force approach of scaling resources up and the old-fashioned "data diet" of normalization and optimization, the truth is not easy to find.
Making heavy use of cartoonish humor, this talk will explore the benefits of normalizing hugely redundant geographic datasets again and of exposing public interfaces (public SQL model, APIs, vector tiles, OGC APIs), so that end users can analyze this dataset efficiently and the data management team can rely on more stability thanks to good old database constraints.
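
To make the normalization argument concrete, here is a minimal sketch, not the CSTB pipeline itself. It assumes a hypothetical wide table in which municipality attributes are repeated on every building row, splits it into two tables, and lets the database reject inconsistent rows. The table names, column names and sample values are invented for illustration, and SQLite stands in for PostGIS so that the example runs with the Python standard library alone.

```python
import sqlite3

# Hypothetical wide, redundant rows (illustrative values only):
# (building_id, municipality_code, municipality_name, construction_year)
wide_rows = [
    ("b1", "31555", "Toulouse", 1975),
    ("b2", "31555", "Toulouse", 2004),
    ("b3", "69123", "Lyon", 1950),
]

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE municipality (
        code TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE building (
        building_id       TEXT PRIMARY KEY,
        municipality_code TEXT NOT NULL REFERENCES municipality(code),
        construction_year INTEGER
            CHECK (construction_year BETWEEN 1000 AND 2100)
    );
""")

# Normalization step: each municipality is stored once instead of per building.
municipalities = sorted({(code, name) for _, code, name, _ in wide_rows})
conn.executemany("INSERT INTO municipality VALUES (?, ?)", municipalities)
conn.executemany(
    "INSERT INTO building VALUES (?, ?, ?)",
    [(bid, code, year) for bid, code, _, year in wide_rows],
)


def try_insert(row):
    """Return True if the building row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO building VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# The constraints now guard the data: unknown municipality, implausible year.
rejected_unknown_municipality = not try_insert(("b4", "00000", 1990))
rejected_bad_year = not try_insert(("b5", "31555", 12))

municipality_count = conn.execute("SELECT COUNT(*) FROM municipality").fetchone()[0]
building_count = conn.execute("SELECT COUNT(*) FROM building").fetchone()[0]
print(municipality_count, building_count)
```

The design point is that the foreign key and the CHECK constraint live in the database, so every pipeline or user writing to the public model gets the same guarantees without duplicating validation code.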

Use cases & applications
Room Limonaia