Marco Minghini
Dr. Marco Minghini obtained a BSc degree (2008), an MSc degree (2010) and a PhD degree (2014) in Environmental Engineering at Politecnico di Milano. From 2014 to 2018 he was a Postdoctoral Research Fellow at the GEOlab of Politecnico di Milano, Italy. Since 2018 he has worked as a Scientific Project Officer at the European Commission - Joint Research Centre (JRC) in Ispra, Italy, focusing on (geo)data interoperability, sharing and standardisation in support of European data spaces, and contributing to the operation and evolution of the INSPIRE infrastructure. He is an advocate of open source software and open data. He has been an OSGeo Charter Member since 2015 and is an active OpenStreetMap contributor, a Voting Member of the Humanitarian OpenStreetMap Team and Chair of the ISPRS ICWG "Openness in Geospatial Science and Remote Sensing". He is a regular participant and presenter at global and local FOSS4G events, and served as Secretary of FOSS4G Europe 2015 and Co-chair of the Academic Track at FOSS4G 2022 and 2023 and FOSS4G Europe 2024.
Sessions
Building footprints (hereinafter buildings) represent key geospatial datasets for several applications, including city planning, demographic analyses, modelling energy production and consumption, disaster preparedness and response, and digital twins. Traditionally, buildings are produced by governmental organisations as part of their cartographic databases, with coverage ranging from local to national and licensing conditions being heterogeneous and not always open. This makes it challenging to derive open building datasets with a continental or global scale. Over the last decade, however, the unparalleled developments in the resolution of satellite imagery, artificial intelligence techniques and citizen engagement in geospatial data collection have enabled the birth of several building datasets available at least at a continental scale under open licenses.
In this work, we analyse four such open building datasets. The first is the building dataset extracted from the well-known OpenStreetMap (OSM, https://www.openstreetmap.org) crowdsourcing project, which creates and maintains a database of the whole world released under the Open Database License (ODbL). OSM buildings are typically derived from the digitisation of high-resolution satellite imagery, and in some cases from the import of other databases with ODbL-compatible licenses. The second dataset is EUBUCCO (https://eubucco.com), a pan-European building database produced by a research team at the Technical University Berlin by merging different input sources: governmental datasets when available and open, and OSM otherwise [1]. EUBUCCO is mostly licensed under the ODbL, with exceptions only for two regions in Italy and the Czech Republic. The third dataset is Microsoft Open Building Footprints (MS, https://github.com/microsoft/GlobalMLBuildingFootprints), extracted by applying machine learning to high-resolution Bing Maps satellite imagery acquired between 2014 and 2023, available at the global scale and also licensed under the ODbL. The fourth dataset, called Digital Building Stock Model (DBSM), was produced by the Joint Research Centre (JRC) of the European Commission to support energy-related studies. It is an ODbL-licensed pan-European dataset produced from the hierarchical conflation of three input datasets: OSM, MS and the European Settlement Map [2].
The objective of this work is to compare the four datasets – which derive from different approaches following heterogeneous processing steps and governance rules – in terms of their geometry (i.e. attributes are out of scope) in order to draw conclusions on their similarities and differences. It is known from the literature that building completeness in OSM (which plays a key role in three out of the four datasets – OSM itself, EUBUCCO and DBSM) varies with the degree of urbanisation [3] and that machine learning applied to satellite imagery (used in MS) may perform differently depending on the urban or rural context [4]. In light of this, we analyse the building datasets according to the degree of urbanisation of their location using the administrative boundaries provided by Eurostat, which classifies each European province as urban, semi-urban or rural (https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/countries).
We chose five European Union (EU) countries for the analysis: Malta (MT), Greece (EL), Belgium (BE), Denmark (DK) and Sweden (SE). The choice was motivated by the need to: i) select countries of different size and geographical location, which ensures that their national OSM communities are substantially different; ii) select countries having different portions of urban, semi-urban and rural areas; and iii) select two sets of countries, one for which the input source for EUBUCCO buildings was a governmental dataset (BE, DK) and one for which it was OSM (MT, EL, SE), to detect possibly different behaviours.
From the methodological point of view, for each country and degree of urbanisation we first calculated and compared the total number and total area of buildings in all datasets, and we examined their statistics through box plots. This was followed by the calculation, for each pair of datasets and degree of urbanisation, of the building area of intersection and its fraction of the total building area of each of the two datasets. Finally, we intersected all four datasets and calculated the fraction of the area of each dataset that this intersection represents.
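The intersection steps described above can be illustrated with a minimal sketch. It is not the code from the project repository: it uses Shapely directly on a few hypothetical rectangular footprints, and the function names `pairwise_overlap` and `common_overlap` are invented for illustration. It assumes all geometries share a projected coordinate reference system with metre units, and it dissolves each dataset first so that overlapping footprints within one dataset are not counted twice.

```python
from functools import reduce
from itertools import combinations

from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical toy footprints standing in for one country and one degree
# of urbanisation; coordinates are assumed to be in metres.
datasets = {
    "OSM": [box(0, 0, 10, 10)],
    "MS": [box(5, 0, 15, 10)],
    "EUBUCCO": [box(0, 0, 10, 10), box(20, 0, 30, 10)],
}

# Dissolve each dataset into a single (multi)polygon.
dissolved = {name: unary_union(geoms) for name, geoms in datasets.items()}


def pairwise_overlap(dissolved):
    """For each pair of datasets, return the intersection area and its
    fraction of the total building area of each of the two datasets."""
    rows = []
    for a, b in combinations(dissolved, 2):
        inter_area = dissolved[a].intersection(dissolved[b]).area
        rows.append({
            "pair": (a, b),
            "intersection_area": inter_area,
            "fraction_of_first": inter_area / dissolved[a].area,
            "fraction_of_second": inter_area / dissolved[b].area,
        })
    return rows


def common_overlap(dissolved):
    """Intersect all datasets and return, per dataset, the fraction of
    its building area covered by that common intersection."""
    common = reduce(lambda g, h: g.intersection(h), dissolved.values())
    return {name: common.area / geom.area for name, geom in dissolved.items()}


pairs = pairwise_overlap(dissolved)
shares = common_overlap(dissolved)

for row in pairs:
    print(row)
print(shares)
```

On country-sized inputs, a dissolve of this kind is expensive, which is one plausible reason for partitioning the work with a library such as Dask-GeoPandas; the sketch ignores that concern entirely.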
Results show that in urban areas, while the datasets are overall similar in terms of total area of buildings, the total number of buildings is typically higher in EUBUCCO for DK and BE, where the information comes from governmental datasets. This suggests that such datasets outperform OSM in modelling the footprints of individual buildings in the most urbanised areas. In contrast, in semi-urban and rural areas, where OSM traditionally lacks completeness, MS (and as a consequence DBSM, which is also based on MS) captures more buildings. This is especially evident in SE, where 94% of the country area is not urban. When calculating the intersection between building areas for each pair of datasets in all countries and urban areas, the area of OSM buildings scores the lowest percentages of intersection when compared to the building areas of the other datasets. The lowest such percentages, equal to 25%, are scored when compared to MS in non-urban areas. EUBUCCO represents an obvious exception for the countries (MT, EL and SE) where it uses OSM. Finally, the intersection of the buildings of all four datasets represents the largest percentage of total building area for OSM, with values above 80% in urban areas. This indicates that EUBUCCO and even more DBSM can be considered a sort of ‘OSM extension’ improving its completeness. By contrast, the lowest values are scored by MS and result from its radically different generation process compared to the other datasets.
The whole procedure was written in Python using libraries such as Pandas, Dask-GeoPandas and Plotly. The code is available under the European Union Public License (EUPL) v1.2 at https://github.com/eurogeoss/building-datasets in the form of Jupyter Notebooks. Work is ongoing to extend the analysis to the whole EU in order to validate the results of this study and formulate recommendations at the continental level.