Environmental changes can have local causes (e.g. soil sealing) as well as global ones (e.g. climate change). Detecting these changes and finding patterns in their causes requires collecting broad environmental data, both temporally and spatially. Citizens can play an essential role in collecting such data (Goodchild, 2007). We developed a system that enables citizens to monitor the occurrence and distribution of birds and provides the collected data to the public, so that both researchers and citizens can draw conclusions from them. With our automated approach we want to complement other citizen science solutions such as eBird (Sullivan et al., 2014), where contributors manually report their sightings.
To this end, we built a prototypical bird feeder equipped with several sensors, together with the infrastructure to process the data collected by the feeder.
The feeder is easy to reproduce at a reasonable price by following an openly available manual. This allows anyone to build the feeder on their own, enabling wide distribution across many locations. The feeder automatically detects when a bird is visiting, takes an image of the bird, determines the species and links the observation with environmental data such as temperature or light intensity. All collected data are published on an open access platform developed for this purpose. Incorporating further surrounding factors, such as the proximity of the feeder to the nearest forest or to a large street, makes it possible to pursue various questions regarding the occurrence of birds: How does the immediate environment affect bird abundance? Do sealed surfaces have a negative effect compared to a flowering garden?
The developed weatherproof bird feeder is equipped with multiple sensors. The standard equipment includes a motion sensor to detect whether a bird is currently visiting the feeder, a camera to take images of the birds, a scale to weigh the birds and a sensor to measure the ambient temperature and air pressure. In addition to the standard sensors, further sensors were tested with the prototype; they usefully supplement the monitoring but are not strictly necessary for the operation of the station. A microphone is suited to record the calls of the birds or the surrounding noises in general. A brightness sensor can help to determine whether birds visit the feeder depending on light conditions, and an air pollution sensor (e.g. PM10) makes it possible to investigate whether air quality influences bird occurrence. Furthermore, the standard camera can be replaced by an infrared camera to capture animals that visit the feeder at night. The station is thus expandable and customizable depending on individual use cases and research questions.
The environmental sensor data are continuously logged and sent to the open access platform at an interval that can be set by the user. Once the motion sensor detects movement, the camera starts recording and the scale and microphone start storing values. Camera, microphone and scale keep running as long as the motion sensor detects movement. After the movement has ended, a lightweight recognition model checks whether a bird is depicted in the images. If so, all data collected during the movement, including the corresponding environmental data, are sent as a package to the open access platform.
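The trigger-and-check logic described above can be sketched roughly as follows. This is a minimal Python sketch; the names `MovementPackage` and `record_movement` and the frame structure are illustrative assumptions, not the project's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class MovementPackage:
    """One motion event: everything recorded while the sensor fired."""
    images: list = field(default_factory=list)
    weights: list = field(default_factory=list)
    audio: list = field(default_factory=list)
    environment: dict = field(default_factory=dict)

def record_movement(sensor_frames, bird_detector, environment):
    """Collect data for the duration of a motion event, then keep the
    package only if the lightweight model finds a bird in any image."""
    package = MovementPackage(environment=environment)
    for frame in sensor_frames:          # one entry per motion-sensor tick
        package.images.append(frame["image"])
        package.weights.append(frame["weight"])
        package.audio.append(frame["audio"])
    if any(bird_detector(img) for img in package.images):
        return package                   # would be uploaded to the platform
    return None                          # discarded: no bird detected
```

The point of the final check is to avoid uploading packages triggered by wind or falling leaves.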
In order to process the data collected by the station, we have developed various methods and software for data storage, analysis and sharing. The data processing is done on a centralized server; communication with this server is enabled through a RESTful API and a website. Feeder entities created on the server can receive environmental data as well as movement packages. When movements are sent, the server determines the number of birds and identifies the species using artificial intelligence. In addition to storing the data, the server makes them available to users in two ways. First, the data are downloadable as raw JSON via the API, which enables others to use them for their own research. Second, the data are presented on our website in a form that makes them easily inspectable for everyone. Uploads to the server are not restricted to our own stations: it is also open for data gathered by other systems. Furthermore, it is possible to upload images of birds and receive the identified species in return.
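To illustrate the raw JSON route, a downloaded payload might be processed as below; the field names are assumptions chosen for illustration, not the platform's actual schema:

```python
import json

# A hypothetical raw JSON response from the platform's API.
raw = json.dumps([
    {"station": "feeder-042", "time": "2023-05-12T06:31:00Z",
     "species": "Parus major", "count": 2,
     "environment": {"temperature_c": 11.2, "pressure_hpa": 1013.4}},
    {"station": "feeder-042", "time": "2023-05-12T07:02:00Z",
     "species": "Erithacus rubecula", "count": 1,
     "environment": {"temperature_c": 12.0, "pressure_hpa": 1013.1}},
])

observations = json.loads(raw)
# Example analysis: total great tits seen at this feeder.
great_tits = sum(o["count"] for o in observations if o["species"] == "Parus major")
```

Because the data arrive as plain JSON, any language with a JSON parser can consume them for independent research.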
The feeder is designed so that it can be replicated by anyone. The corresponding instructions will be published shortly. The code to run the station and the server is available via GitHub (https://github.com/CountYourBirds).
Moreover, different options for validating the data, especially the species classification, are implemented. One step is automatic validation based on sensor values or metadata: for instance, if a standard camera recognizes a bird although it is currently night (detected by the light sensor or the time of day), or if the scale detects nothing, the observation is discarded. Further validation can come from actual people. An interface is provided that shows users the recorded values, and especially the images together with the automatically recognized species. The depicted data can be validated to find corrupt sensors and to eliminate mistakes made by the image classification. Additionally, the server-side evaluation of the data is supplemented by a validation of the recognized species: it is checked whether the species can plausibly occur in that geographic region and at that time of the year.
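The automatic plausibility checks could look roughly like this. This is a hedged sketch of the rules stated above; the field names are illustrative, and the region/season check is stubbed as a simple list lookup:

```python
def plausible(observation):
    """Return True if an observation passes the automatic checks
    sketched in the text; field names are illustrative assumptions."""
    # A standard (non-infrared) camera cannot credibly see birds at night.
    if observation["camera"] == "standard" and observation["is_night"]:
        return False
    # If the scale never registered any weight, discard the detection.
    if observation["max_weight_g"] <= 0:
        return False
    # Stubbed region/season check: species must be plausible locally.
    return observation["species"] in observation["regional_species_list"]
```

Observations failing any rule would be discarded before publication; passing ones can still be reviewed by people via the validation interface.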
As next steps we want to conduct workshops with citizens and experts, both for assembling the stations and for evaluating the data and the station itself. In general, a strength of our approach is that it is easily adaptable to other use cases, especially to detecting other animals. For example, with small adaptations the feeder could be used to detect or count mammals like squirrels, or insects like butterflies and bees.
This work highlights the results obtained during the BEEMS (Monitoring Bee Diversity in Natural System) project, whose main goal was to answer the following question: which biotic and abiotic indicators of floral and nesting resources best reflect the diversity of bee species and community composition in the Israeli natural environment? To this end, the research was oriented towards a cost-effectiveness analysis of new aerial geomatics techniques versus classical ground-based methods for collecting the indicators described above, relying only on open-source software for data analysis.
The study involved Israeli and Italian teams and focused on two complementary study systems in central Israel: the Alexander Stream National Park, an area undergoing an ecological restoration project in a sandy ecosystem, and the Judean foothills area, south of Tel Aviv. In each study system, surveys of bees, flowers, nesting substrates and soil were conducted using classical field measurement methods. Simultaneously, an integrated aerophotogrammetric survey was performed, acquiring different spectral responses of the land surface by means of Uncrewed Aerial Vehicle (UAV) imaging systems. The multispectral sensors provided the surface spectral response outside the visible spectrum, while the photogrammetric reconstruction provided three-dimensional information. Thanks to Artificial Intelligence (AI) algorithms and the richness of the acquired data, a methodology for Land Cover Classification has been developed. The results obtained by ground surveys and by the advanced geomatics tools were compared and overlaid. The results are promising, showing a good fit between the two approaches and a high performance of the geomatics tools in providing valuable ecological data.
The acquisition of the indicators identified in the planning phase took place through several measurement campaigns conducted between February 2020 and April 2020 in the two areas of interest in Israel. A total of 934 and 543 wild bees were collected in the two study systems, respectively. From a geomatics point of view, 8 flights were carried out in the Alexander Stream National Park on 24 February 2020, acquiring approximately 65 GB of 8-bit multi-band images in TIFF format. In the Judean foothills area, 11 flights were carried out on 26 February 2020, obtaining approximately 77 GB of TIFF images. In addition, in order to obtain a correctly geo-referenced 3D model, a total of 54 Ground Control Points (GCPs) were acquired, 27 in the Alexander Stream National Park and 27 in the Judean foothills, with a multi-frequency, multi-constellation GNSS geodetic receiver in RTK mode.
On the basis of the technical requirements of this project, very high-resolution digital maps (orthophotos and digital terrain models, DTM) were produced through the application and optimisation of photogrammetric and Structure from Motion (SfM) processes on data from different imaging sensors (RGB, multispectral), using only open-source software. Accordingly, in order to plan the data acquisition, the research group defined the flight parameters and instruments, both in terms of aircraft and of sensors installed onboard, necessary to achieve the project objectives. All the generated digital cartography has been defined in the Israeli reference system, i.e. WGS84 with UTM 36N cartographic projection. The results are shown in Tables 2 and 3 for the Alexander Stream National Park and the Judean foothills, respectively.
The production of very high scale digital cartography allowed the extraction of the data needed to train the proposed Artificial Intelligence models. These data were used in two different approaches for automatic land cover classification. The first approach was based on unsupervised classification at the pixel level, while the second was based on object classification, i.e. on vector polygons describing the boundaries of real objects. The algorithms operate differently on these two types of data: in the pixel-based approach they are applied to single pixels, while in the object-oriented approach they are applied to groups of pixels that are homogeneous for a given feature. All training and validation phases of the proposed models were implemented in Python using open libraries for data management (shapely, rasterio) and learning (scikit-learn). In the object-oriented approach, the segmentation of the input data is fundamental to define the objects to be classified; for this purpose the Orfeo Toolbox library was applied. The object-oriented approach was applied to the Alexander Stream National Park site, while the pixel-based approach was applied to the Judean foothills area.
For the pixel-based classification, the KMeans clustering algorithm was used in an unsupervised manner. KMeans clusters the data by attempting to separate the samples into n groups of equal variance, minimising a criterion known as the within-cluster sum of squares. The algorithm was optimised through a trial-and-error procedure that led to the identification of suitable initialisation parameters. For the object-oriented classification, automatic segmentation algorithms based on the analysis of multi-band spectral variability were applied. In particular, the Large-Scale Mean-Shift segmentation algorithm was used, which produces a clustered image in which the pixels around a target pixel with similar behaviour from both the spatial and the spectral point of view are grouped together. The procedure then vectorizes these clusters, and the operator associates a label with each of them to generate the dataset. After subdividing the data into training and testing sets, the Random Forest algorithm was used for both approaches and proved to be the most effective in performing the assigned task. The classification results were evaluated using different validation metrics, such as Precision, Recall and F1 score, which will be presented.
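As a minimal illustration of two of the ingredients named above, KMeans clustering and a Random Forest evaluated with the F1 score, the following sketch runs both on synthetic two-class "pixel" data; it is not the project's real imagery, features or tuned parameters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a multispectral image: (pixels x 3 bands),
# two well-separated spectral classes (e.g. bare soil vs vegetation).
dark = rng.normal(0.1, 0.02, size=(200, 3))
bright = rng.normal(0.8, 0.02, size=(200, 3))
pixels = np.vstack([dark, bright])

# Pixel-based, unsupervised: cluster the pixels with KMeans.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pixels)

# Supervised step: Random Forest on labelled samples, scored with F1.
labels = np.array([0] * 200 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(pixels, labels, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
score = f1_score(y_te, rf.predict(X_te))
```

On real data the features would come from the segmented orthophotos, and Precision and Recall would be reported alongside F1.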
Large infrastructure projects like the Florence railway station designed for high-speed rail require proper management of the huge quantity of waste originating from excavation activities. Such waste volumes require large disposal areas, making abandoned areas or exhausted quarries and mines ideal sites for hosting the excavated waste. A rectangular area of 500x70 m delimiting the railway station has been excavated in two steps, each removing a 10 m-thick soil layer: according to the management project approved by the public authorities involved in the environmental management plans, the construction waste would be used for the environmental restoration of a 400x350 m area near a former exhausted lignite quarry, in the proximity of the Santa Barbara village near Cavriglia (Arezzo).
The Tuscan Regional Environmental Agency (ARPAT) has been involved in monitoring both the terrain transportation and the disposal operations according to the approved management plan: while the Environmental Evaluation Office (VIA-VAS) was responsible for sampling the waste for chemical analysis, to verify that its chemical composition was acceptable, the Environmental Regional Information System Office (SIRA) was asked to evaluate the volume balance over the whole waste management cycle, which included: (a) waste extraction from the railway station building site, and (b) the final waste disposal destination (the exhausted Santa Barbara lignite quarry).
A phase-difference terrestrial LiDAR has been used to acquire 3D point clouds at the railway site at the following stages: (a) initial stage, before the start of excavation activities; (b) step 1 stage, after the excavation of the first 10 m-thick layer; (c) step 2 stage, after completion of the excavation works. Various tests were performed to assess the optimal number of scans needed to obtain the required precision of the final 3D model, starting from more than 100 scans for the initial-stage survey down to about 50 scans for the (b) and (c) stage surveys. Each survey was referenced using a local coordinate system materialized during the survey; each target was then referred to the main local reference system used in the railway station project by the owner's topographers with a total station.
Scan alignment and 3D cleaning (point clouds and meshes) were performed using proprietary licensed software, while the evaluation of volume differences was carried out in a QGIS 3.x environment; for the scan alignment phase (3D point cloud alignment), available open-source platforms have also been tested and evaluated. Both scan alignment and 3D cleaning, being manually executed, have proven to be time-consuming operations even with proprietary licensed software.
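The volume-difference step itself reduces to integrating the elevation difference between two gridded surfaces over the cell area, comparable to a raster-calculator workflow in QGIS. The sketch below uses synthetic DTMs and an assumed 0.5 m cell size, purely to make the computation concrete:

```python
import numpy as np

cell_size = 0.5  # metres per raster cell side (assumed resolution)

# Synthetic DTMs: a flat 100x100-cell surface before and after
# removing a uniform 10 m-thick layer.
dtm_before = np.full((100, 100), 50.0)   # elevation before excavation [m]
dtm_after = np.full((100, 100), 40.0)    # elevation after excavation [m]

# Excavated volume = sum of per-cell height differences x cell area.
excavated = np.sum(dtm_before - dtm_after) * cell_size ** 2  # cubic metres
```

With real surveys the two rasters come from the aligned point clouds of consecutive stages, and the same subtraction yields the per-stage excavated volume.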
As for the Santa Barbara quarry, an initial RTK RPAS survey was performed before the removal of grass and small vegetation, to evaluate the potential of RPAS for speeding up survey activities in open areas with respect to terrestrial LiDAR. The RPAS survey demonstrated that this technology, compared to terrestrial LiDAR surveys in open areas, is much less time-consuming in both acquisition and processing, making it the best choice for surveys in open areas where extreme (sub-centimetric) precision is not required.
Due to the progress of the filling activities at the Santa Barbara site, i.e. the partial cleaning of one of the defined subareas followed by its filling with excavated waste, the initial stage of waste filling was surveyed five times, once for each of the defined subareas. Each subarea, due to its limited dimensions (120x50 m) and to the instrumentation and personnel available at the time of vegetation cleaning, was surveyed with the terrestrial LiDAR, while the RTK RPAS was used for the final survey over the whole quarry area. The LiDAR surveys were processed according to the methods tested in processing the railway station surveys; the RTK RPAS survey data, too, were processed with the same proprietary software. The terrestrial LiDAR surveys were referenced in a local coordinate system materialized during the survey; each target was then referred to the main local reference system used in the quarry filling project by the owner's topographers. The RPAS models, in geographic coordinates, were then aligned to the terrestrial LiDAR surveys in order to evaluate the global waste volume disposed on site.
The comparison between the volume excavated at the railway station site and that disposed at the exhausted lignite site showed good agreement, once a standard transformation coefficient between compact soil and excavated waste was taken into account. Terrestrial LiDAR scan alignment and point cloud/mesh cleaning activities have been very time-consuming, so the testing of automatic processing pipelines by means of open-source software is in progress: according to the tests carried out so far, environmental monitoring of waste management over large areas would be considerably less time-consuming if (semi-)automatic processing is properly set up. Within national projects on large processing infrastructures ('Mirror Copernicus'), our office would take a leading role in building a fully operational prototype pipeline for scan alignment and point cloud/mesh processing to evaluate waste extraction in large building sites.
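The comparison with a transformation coefficient amounts to a simple consistency check like the following; the swell factor and tolerance values are illustrative, not the ones used in the project:

```python
def volumes_agree(bank_volume_m3, disposed_volume_m3,
                  swell_factor=1.25, tolerance=0.05):
    """Compare the in-place (bank) excavated volume with the loose
    volume measured at the disposal site, allowing for soil swelling.
    swell_factor and tolerance are illustrative assumptions."""
    expected = bank_volume_m3 * swell_factor
    return abs(disposed_volume_m3 - expected) / expected <= tolerance
```

For instance, 100,000 m3 of bank volume is expected to become roughly 125,000 m3 of loose material, so a measured disposal volume near that figure would count as agreement.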
Sea water turbidity is a measure of the amount of light scattered by particles in water. It is due to the presence of suspended particles, operationally defined as the fraction of particles in water smaller than 2 µm in diameter. Plankton can also generate turbidity, but high-turbidity events are dominated by high concentrations of inanimate inorganic particles. High levels of suspended sediments in coastal regions can occur as a consequence of high sediment loads from rivers, of bottom sediment resuspension due to wave action, or of anthropogenic activities such as dredging operations or bottom resuspension by ship propellers. The increase of turbidity can have negative environmental effects on both the biotic and the abiotic marine ecosystem. In highly anthropized coastal marine systems, like harbours, sediments represent a sink for contaminants, and resuspension can contribute to propagating pollution to unpolluted areas (Lisi et al., 2019).
Many marine water quality monitoring programmes measure turbidity. Traditional methods (e.g., in situ monitoring) offer high accuracy but provide sparse information in space and time. Earth Observation (EO) techniques, on the other hand, have the potential to provide a comprehensive, fast and inexpensive monitoring system to observe the biophysical and biochemical conditions of water bodies (Caballero et al., 2018; Saberioon et al., 2020; Sagan et al., 2020). Hence, some of the authors are developing a semi-empirical model for predicting water turbidity by combining Sentinel-2A data and machine learning methods, using samples collected along the North Tyrrhenian Sea (Italy). Field data collected at the study site from April 2015 to December 2020 were made available by ARPAL, even though most of these data refer to low-turbidity events.
In the framework of this research activity, Sentinel-2A multispectral optical images, freely available within the EU Copernicus programme, are processed. Such products are provided at Level-1C (L1C) Top-Of-Atmosphere (TOA) and at Level-2A (L2A) Bottom-Of-Atmosphere (BOA) reflectance. L2A BOA reflectance products are preferred, as they are already corrected for the effects of the atmosphere. However, the official L2A data are available for wider Europe only from March 2018 onwards.
The need to use the complete on-site dataset to calibrate the predictive model, and not only the data acquired after March 2018, required the identification of the most appropriate algorithm for the atmospheric correction of the L1C images of the study area between 2015 and 2018.
Hence, a comparison was performed between the available L2A BOA product and the corresponding L1C image corrected in different open-source environments. In particular, the free and open-source QGIS and GRASS GIS were used, together with the Sentinel Application Platform (SNAP), provided by ESA/ESRIN free of charge to the Earth Observation community, published under the GPL license and with its source code available on GitHub.
Both an image-based method, i.e. the Dark Object Subtraction (DOS) method in QGIS, and physically-based methods, i.e. the Second Simulation of Satellite Signal in the Solar Spectrum (6S) method in the i.atcorr module of GRASS GIS and the Sen2Cor algorithm inside SNAP, were applied (Lantzanakis et al., 2017). The great advantage of the DOS method is that it relies only on the spectral and radiometric characteristics of the processed image, hence it does not require remote or in-situ atmospheric measurements; however, the resulting correction does not appear very accurate. The physically-based approach, instead, requires atmospheric measurements and parameters that are difficult to identify consistently in space and time with the processed image.
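The idea behind DOS can be sketched in a few lines: the darkest pixels in a band are assumed to owe their signal entirely to atmospheric path radiance, so that value is subtracted from the whole band. The sketch below uses synthetic values and shows the general principle, not the exact implementation in QGIS:

```python
import numpy as np

def dos_correct(band, percentile=1):
    """Dark Object Subtraction sketch: estimate the dark-object value
    as a low percentile of the band, subtract it everywhere, and clip
    negatives to zero. The percentile choice is an assumption."""
    dark_value = np.percentile(band, percentile)
    return np.clip(band - dark_value, 0, None)
```

Because the dark-object value is estimated from the image itself, the method needs no atmospheric inputs, which is exactly the trade-off discussed above.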
The most complex physical parameter to set is the Aerosol Optical Depth (AOD), a dimensionless parameter related to the amount of aerosol in the vertical column of the atmosphere over the target station. It usually ranges from 0 to 1, with values below 0.1 corresponding to a clean atmosphere with high visibility, and values above 0.4 corresponding to a hazy atmosphere with very low visibility. AOD is highly variable in space and time. It can be estimated from AERONET (AErosol RObotic NETwork), a federation of ground-based remote sensing aerosol networks with more than 25 years of data. However, a station measuring the AOD at 500 nm at Level 2 (quality-assured) at the same time as the scene acquisition is not always available near the site under study. Hence, the variability of AOD in time and space was analysed for the area and the events of interest, in order to identify proper values. In particular, i.atcorr appears very sensitive to the AOD values set.
Once the proper method for atmospheric correction was identified, it was applied to the L1C images corresponding to the field data collected from April 2015 to March 2018. Then the correlation between the in-situ dataset and the individual bands known to be most sensitive to water turbidity, i.e. the blue (B2), green (B3), red (B4) and near-infrared (B8 and B8A) bands, was analysed, finding good results for the visible bands and a weak correlation for the NIR bands. In addition, indices defined as ratios between the three visible bands were tested, to see which combination could best highlight water turbidity in the Sentinel-2 images. Preliminary results seem to confirm that the identified EO technique could provide a fast and inexpensive monitoring system to observe sea water turbidity along the Northern Tyrrhenian Sea (Italy).
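The band-ratio screening can be illustrated as follows, computing the Pearson correlation between a visible-band ratio and measured turbidity. The reflectance model below is synthetic and purely illustrative, not the ARPAL dataset or a fitted relationship:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic in-situ turbidity [NTU] and reflectances: red (B4) rises
# with turbidity, green (B3) stays nearly flat (illustrative model).
turbidity = rng.uniform(0.5, 20.0, size=50)
red = 0.01 + 0.002 * turbidity + rng.normal(0, 1e-4, 50)
green = np.full(50, 0.03) + rng.normal(0, 1e-4, 50)

# Candidate index: red/green ratio, scored by Pearson correlation.
ratio = red / green
r = np.corrcoef(ratio, turbidity)[0, 1]
```

Repeating this for each band combination identifies the ratio most sensitive to turbidity, which is then used in the semi-empirical predictive model.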
Nowadays there is a felt need to create a sustainable and inclusive urban environment accessible to all, which requires a people-centred urban planning approach. Along with alleviating environmental problems and minimizing traffic congestion, the public transit system serves as a means of providing equal access (Rossetti et al., 2020). This paper attempts to re-evaluate the isochrones prepared for accessing public transport stops, particularly transit nodes, across Noida city using a GIS-based approach and an Open Data Kit (ODK) approach. Isochrones predict the time needed to reach any area from a transport node, such as a transit station, based on a shortest-path model; however, not all roads and streets offer equal access to all (Lei & Church, 2010). In this study, macro built-environment attributes responsible for increasing the pedestrian distance and time to reach metro stations were identified using a GIS-based approach integrating land-use and transportation data. Micro built-environment attributes, such as pedestrian behaviour, preferred travel modes, and the purpose and frequency of transit trips, were gathered by conducting metro station user surveys using the ODK app linked to its ODK Aggregate server. Current transport and urban planning methods give little importance to understanding origins and destinations so that important places can be reached with greater ease and mobility (Bhatt & Minal, 2022). Urban researchers have rarely investigated how to evaluate equal accessibility to public transit services through easy access to transit stations (Yang et al., 2019). The objective of this study is to map pedestrian permeability and impermeability by categorizing roads and streets around identified transit nodes as public, private, or not accessible by all. The prime function of accessibility is to link people with activities through linkages (Lei & Church, 2010).
In this study, the travel modes considered were on foot, non-motorized (cycle and rickshaw), shared e-rickshaw, bus service, and being dropped off by two-wheeler or four-wheeler. A stratified random sampling technique was adopted to calculate the sample size for the 12 existing elevated metro stations in Noida on the Blue transit line of DMRC. A self-administered questionnaire was used to conduct metro station surveys with the ODK mobile app at the 12 identified metro stations in Noida, from Noida Sector 15 to the last station, Noida Electronic City (NEC). A sample size of 1% of the average transit ridership, based on data collected from DMRC for each station, was taken to achieve a 95% confidence level. However, some stations, such as Golf Course and the stations beyond Noida City Centre operational since 2019, have a low ridership of below 5000 persons per day; for these stations, a sample size of 2% of the average ridership was used to achieve a similar confidence level. Next, the existing land use within a radius of 800 metres around each metro station was demarcated from the Noida Master Plan (NMP) 2031. Based on this static master-plan land-use distribution, the stations were categorized as residential, non-residential, mixed-use and transport hubs. The questionnaire contained 31 items covering various aspects, including the usual purpose of metro trips, employment status, availability of a driving license, household size, current city of residence, number of cars available in the household, car availability during the transit trip, building typology of the transit user, and number of floors. Most importantly, it asked for the preferred travel mode to reach the nearest metro station, the frequency of trips made in a week, changes in metro travel patterns over the past six months, particularly due to COVID-19 restrictions, and the reason for opting for transit services.
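The per-station sample-size rule described above can be written as a small helper. This is a sketch of the stated 1%/2% rule; the function name and rounding choice are ours:

```python
def station_sample_size(avg_daily_ridership, low_ridership_threshold=5000):
    """Sample size per station: 1% of average daily ridership,
    raised to 2% for low-ridership stations (below the threshold),
    as described in the survey design."""
    rate = 0.02 if avg_daily_ridership < low_ridership_threshold else 0.01
    return max(1, round(avg_daily_ridership * rate))
```

For example, a station with 20,000 daily riders would receive 200 questionnaires, while a low-ridership station with 4,000 riders would receive 80.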
Other allied questions concerned the distance and time needed to reach the metro station using different travel modes. The survey results were downloaded from the ODK Aggregate server and converted from CSV to an Excel sheet for data analysis. In QGIS, a buffer distance of 800 metres was marked on both sides of the Blue Line transit corridor with its 12 metro stations in Noida. First, the commonly walked pedestrian routes terminating at individual metro stations were identified, together with their trip origin locations in the nearby sectors. Based on the farthest origin found in all directions around each metro station, walksheds were developed for all 12 metro stations in Noida. However, as the stations are quite close together, between 1 and 2 kilometres apart, the walksheds of some consecutive metro stations overlap. Thereafter, using a Garmin eTrex 10, all the common routes most frequently followed by e-rickshaws as an alternative to walking were traced, and photographs were taken of the barriers that typically increase the walking distance and time to reach the metro stations. Then, using My Maps, an app by Google, the Garmin-tracked routes were entered along with photos imported through the Google Photos app. In the case of Noida, there exist both planned and unplanned barriers to equal accessibility for all. The gated communities, large superblocks, and many government housing societies are planned barriers that impede accessibility within TOD station areas. Urban villages with organic street layouts and narrow, incomplete streets are hardly accessible and force potential transit riders to shift away from walking or to rely on e-rickshaws as short- to medium-distance travel modes.
Finally, the study proposes a TOD index based on the space syntax model and on distance measurements to categorize roads within the TOD area as public roads, private roads (accessible by only a few residents), and non-suitable streets (not fit for use).
In the last decades the European mountain landscape, and in particular the Alpine landscape, has dramatically changed due to social and economic factors (Tattoni et al. 2017).
The most visible impact has been the depopulation of mid- and high-altitude villages and the shrinking of part of the land used for agriculture and grazing. The result is a progressive reduction of pastures and meadows and the expansion of forested areas. Forest plots also become more compact, with the loss of ecotones.
The study of this phenomenon is important not only to assess its current impact on the ecological functionality of forest ecosystems, including biodiversity and natural hazards, but also to build future scenarios, also taking climate change into account. The mountain treeline is gradually shifting upwards, and monitoring and modeling these changes will be crucial to plan future interventions and to implement effective mitigation plans.
For these reasons, a dataset describing the forest, meadows and pasture coverage for the Trentino region, in the eastern Italian Alps, has been created.
A set of heterogeneous sources has been selected so that maps and images cover the longest possible time span on the whole Trentino region with the same quality, providing the necessary information to create a LULC (Land Use/Land Cover) map at least for the forest, meadows and pasture classes.
The dataset covers a time span of more than 160 years, with automatic or semi-automatic digitization of historical maps and the LULC classification from aerial images.
The first set includes historical maps from 1859 to 1936, plus a 1992 map which was not available in digital format and has been digitized for this project: the Austrian Cadastre (1859, 13297 sheets, scale 1:1440), Cesare Battisti's map of forest density published in his atlas "Il Trentino. Economic Statistical Illustration" (1915, single sheet, 1:500 000), the Italian Kingdom Forest Map (IKFM) (1936, 47 sheets, 1:100 000) and the Map of the potential forest area and treeline (1992, 98 sheets, 1:50 000). A new procedure has been developed to automatically extract LULC classes from these maps, combining GRASS and R for segmentation, classification and filtering with an Object Based Image Analysis (OBIA) approach. Two new GRASS modules used in this procedure have been created and made available as add-ons in the official repository (Gobbi et al., 2019).
The second set consists of aerial images covering the time span from 1954 to 2015. The four sets differ in mean scale, number of bands, resolution and datum: "Volo GAI" (1954, 130 images, mean scale 1:35 000, B/W, resolution 2 m, Rome40 datum), "Volo Italia" (1994, 230 images, 1:10 000, B/W, 1 m, Rome40), "Volo TerraItaly" (2006, 250 images, 1:5 000, RGB+IR, 0.5 m, Rome40) and "Volo AGEA" (2015, 850 images, 1:5 000, RGB+IR, 0.2 m, ETRS89). The "Volo GAI" imagery has been ortho-rectified using GRASS; the images in the other sets were already orthophotos.
The aerial images were classified with OBIA to create LULC maps, with particular focus on the forest, meadows and pasture classes. The same training segments were used across the four sets and the custom classification procedure has been scripted. The number of training segments ranges from 1831 for the 2015 imagery set to 2572 for the 1954 imagery set.
The evaluation of the classification results for all the maps and images has been carried out with a proportional stratified random sampling approach. A procedure has been scripted in GRASS to select 750 sampling points, distributed in each stratum (LULC class) proportionally to the area of the class. The resulting points have been manually labeled and used to assess the classification and filtering (where present) accuracy.
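The sampling procedure itself was scripted in GRASS; a minimal Python sketch of the proportional-allocation idea, using a toy class map with invented class codes, could look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

def stratified_sample(class_map, n_points=750):
    """Allocate validation points to each LULC class proportionally
    to its area, then draw random pixel positions within each stratum."""
    classes, counts = np.unique(class_map, return_counts=True)
    shares = counts / counts.sum()
    # proportional allocation, with simple rounding and a minimum of 1 point
    alloc = np.maximum(1, np.round(shares * n_points).astype(int))
    flat = class_map.ravel()
    samples = {}
    for cls, n in zip(classes, alloc):
        idx = np.flatnonzero(flat == cls)
        samples[cls] = rng.choice(idx, size=min(n, idx.size), replace=False)
    return samples

lulc = rng.integers(1, 4, size=(100, 100))  # toy map, classes 1-3
pts = stratified_sample(lulc)
```

Each stratum receives a share of the 750 points proportional to its area, so rare classes are not over-represented in the accuracy estimate.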
For the historical maps, the application of the custom filtering procedure increased the accuracy from a minimum value of 67% (for the IKMF map) to 93% (for the same map), with a maximum of 98% for the cadastral map.
For the imagery datasets the accuracy (percentage of points correctly classified) was between 93% and 94%, with the latter value corresponding to the higher-resolution 2015 imagery dataset. Higher accuracy, up to 95%, was obtained for the forest class, which is the main focus of the study.
The analysis of selected landscape metrics provided preliminary results about the forest distribution and pattern of recolonization during the last 180 years.
A comparison of the capabilities of the available FOSS4G systems for landscape metrics was performed to identify the best analysis tools (Zatelli et al. 2019).
Finally, these time series of LULC coverage were used to create future scenarios for the forest evolution in a test area of Trentino in the next 85 years, using both the Markov chain and the Agent Based Modeling approaches with GAMA (Taillandier et al. 2018).
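As a hedged illustration of the Markov chain component only (the transition probabilities below are invented for this sketch; in the study they would be estimated from the observed LULC time series):

```python
import numpy as np

# Hypothetical annual transition probabilities between three classes
# (forest, meadow, pasture); each row sums to 1.
P = np.array([
    [0.98, 0.01, 0.01],   # forest  -> forest / meadow / pasture
    [0.05, 0.90, 0.05],   # meadow
    [0.06, 0.04, 0.90],   # pasture
])

state = np.array([0.50, 0.30, 0.20])  # current class shares

# Project class shares 85 years forward by repeated application
for _ in range(85):
    state = state @ P
```

With transitions biased towards forest, the projected forest share grows over the 85-year horizon, mirroring the recolonization pattern described above.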
Given the large number of maps involved, the great flexibility provided by FOSS for spatial analysis (GRASS, R, QGIS and GAMA) and the possibility of scripting all the operations have played a pivotal role in the success of both the creation of the dataset and the extraction and modeling of land use changes.
The development of new GRASS add-on modules, based on the scripts created during this study, is planned.
Mobility data, based on global positioning system (GPS) tracking, have been widely used in many areas. These include analyzing travel patterns, investigating transport safety and efficiency, and evaluating travel impacts. Transport Mode Detection (TMD) is an essential factor in understanding mobility within the transport system. A TMD model assigns a GPS point or a GPS trajectory to a particular transport mode based on the user's activity and medium of travel. However, the complexity of the prediction procedure increases with the number of modes that need to be predicted. For example, it is comparatively easy to predict whether a user is 'static', 'slow moving' or 'fast moving', but it is hard to predict detailed transport modes such as walk, bike, car, bus, train, boat, etc. Therefore, this study proposes a multi-branch deep learning-based TMD model which can predict multi-class transport modes.
Two major challenges need to be addressed in order to generate a state-of-the-art deep learning model.
The first is to prepare ground-truth data. There are insufficient open-sourced ground-truth data available for transport modes in Japan. Hence, we propose a transport mode label generation approach using Snorkel. Snorkel is a weakly supervised labeling system, a first-of-its-kind system that enables users to train state-of-the-art models without hand-labeling any training data. Instead, experts write labeling functions that express arbitrary heuristics based on the logic that can be drawn from understanding the data and the physical actions they represent. In this study, we used Snorkel to generate the ground-truth data for transport mode. Initially, we considered publicly available road networks, railway networks, bus routes, etc. for creating road, bus and train labels by overlaying GPS points on these transportation networks. However, there are multiple occasions where the road, bus and train classes overlap each other, especially in a city region. Hence, we introduced a boolean (True/False) soft-labeling function, where the same GPS point might have multiple True values for road or railway.
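As a minimal sketch of such a heuristic, written here as plain Python rather than with Snorkel's labeling-function decorator (the thresholds and label codes are illustrative, not the study's actual functions):

```python
# Illustrative label codes; ABSTAIN lets the generative label model decide
ABSTAIN, WALK, CAR_OR_BUS, TRAIN = -1, 0, 1, 2

def lf_speed_and_network(point):
    """Heuristic labeling function: combine speed with boolean
    network-overlay flags. A point may be True for both road and
    rail where networks overlap, hence the soft labels."""
    if point["speed_kmh"] < 7:
        return WALK
    if point["on_road"] and not point["on_rail"]:
        return CAR_OR_BUS
    if point["on_rail"] and not point["on_road"]:
        return TRAIN
    return ABSTAIN  # ambiguous overlap: defer to the label model

labels = [lf_speed_and_network(p) for p in [
    {"speed_kmh": 4.2,  "on_road": True, "on_rail": False},
    {"speed_kmh": 45.0, "on_road": True, "on_rail": False},
    {"speed_kmh": 60.0, "on_road": True, "on_rail": True},
]]
```

Several such functions, together with the network-overlay soft labels, are what the generative label model aggregates into training labels.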
The second is to derive mobility-related features from the raw GPS data. Raw GPS data are typically composed of latitude, longitude and timestamps. The raw GPS data were used to generate point-level features such as speed, speed difference, acceleration, acceleration difference, initial bearing and bearing difference. In addition, we generated trajectory-level features such as average speed and average acceleration.
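A minimal sketch of the point-level feature derivation (the coordinates and timestamps below are invented; bearing features would be computed analogously from consecutive fixes):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def point_features(track):
    """Derive speed (m/s) and acceleration (m/s^2) between
    consecutive (lat, lon, timestamp_s) fixes."""
    feats = []
    prev_speed = None
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(track, track[1:]):
        dt = t2 - t1
        speed = haversine_m(lat1, lon1, lat2, lon2) / dt
        accel = 0.0 if prev_speed is None else (speed - prev_speed) / dt
        feats.append({"speed": speed, "acceleration": accel})
        prev_speed = speed
    return feats

# Toy trajectory: three fixes, 10 s apart, moving north
track = [(35.6812, 139.7671, 0), (35.6813, 139.7671, 10), (35.6815, 139.7671, 20)]
feats = point_features(track)
```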
Transportation-network-based soft labels and other mobility features are used to define labeling functions in Snorkel. These labeling functions are used to create ground-truth labels using a generative machine learning model with a portion of the GPS data. The generated labels (walk, cycle, bus, car, train, boat/ship) were then used to train the proposed deep learning model. To construct the model we opted for two branches: raw GPS latitude and longitude values are used in one and the derived mobility features in the other. We used 3 fully connected hidden layers for the raw GPS data (lat/lon) and 4 fully connected hidden layers for the mobility features. Features derived from the two branches are concatenated, followed by 3 further fully connected hidden layers, with softmax cross-entropy used as the loss function. The proposed deep learning model has 108,614 trainable parameters, and Adam is used as the optimizer. This two-branch model structure achieves better accuracy as it combines the raw data as well as the derived mobility features in the network. One example of the benefit of this approach is the network's ability to relate GPS coordinates with road-driving classes, thus inherently inferring that a location lies on a road. Many of these inferences that improve classification accuracy are possible via dramatically more advanced pre-processing to build out additional features; however, that approach is more time-consuming and could never catch all the potential inferences that an unbiased set of deep learning layers can inherently extract.
We evaluated the trained model's effectiveness in two ways. We compared the results against the popular XGBoost classifier, with our model producing over 5% higher accuracy on the benchmark Geolife dataset. Moreover, we collected smartphone-based GPS trajectories for multiple modes of transportation recorded by testers in Bengaluru, India, and Tokyo, Japan. With this new absolute ground-truth data, we compared the resulting predicted classes between the operating-system-provided activity classifications, the above XGBoost model and our own. Our experiments show promising results, with improved accuracy and increases in the number of labeled data points. Of key note is that the iOS and Android in-built activity recognition tools provide 'automotive' as a single class, while our proposed model efficiently distinguishes automotive classes as car, bus and train with improved accuracy. This work depends entirely upon Free and Open Source Software (FOSS) for data preparation, mobility feature generation, deep learning model training and big-data computing. That includes geospatial libraries such as geopandas, shapely and rtree, the weak-label generation platform Snorkel, the deep learning platforms tensorflow and keras, and big-data computing platforms such as pyspark, hadoop, hive, etc.
AN OPEN-SOURCE MOBILE GEOSPATIAL PLATFORM FOR AGRICULTURAL LANDSCAPE MAPPING: A CASE STUDY OF WALL-TO-WALL FARM SYSTEMS MAPPING IN TONGA
Pacific Island Countries (PICs) such as Tonga rely on landscape services to support communities and livelihoods, in particular smallholder and commercial agriculture. However, PICs are increasingly vulnerable to climatic and environmental shocks and stressors such as increasing cyclone occurrence and landscape conversion. Spatially explicit, timely and accurate datasets on agricultural and other land use at the community scale are an important source of information for land use policy development, landscape management, disaster response and recovery, and climate-smart sustainable development. However, such datasets are not available or readily accessible to stakeholders engaged in landscape management in PICs. Household surveys, participatory GIS (PGIS) and remote sensing are approaches that have previously been used to capture community-scale landscape uses in PICs; however, these approaches are challenged by data collection and management burdens, mismatched scales, timely integration of databases and data streams, aligning system requirements with local needs, and various socio-technical issues associated with developing and deploying applications in new domains. Such data collection approaches also provide only single time-step representations of landscape uses and fail to capture the highly dynamic and spatially diverse nature of PIC landscapes.
We have addressed these challenges by developing, integrating, and deploying a tool for agricultural landscape monitoring at a local scale. This tool is composed of a stack of open-source geospatial applications and was developed through a collaboration between Tonga’s Ministry of Agriculture, Food, and Forests (MAFF) and researchers from Australian and South Pacific universities. We used a formal, iterative ICT for Development (ICT4D) framework to engage and co-develop the tool with MAFF and other landscape stakeholders including community leaders. The ICT4D framework is based on agile methods and is made up of five components: context analysis; needs assessment; use-case and requirements analysis; sustainability assessment; and development, testing and deploying. The five components provide a framework to ensure that project stakeholders (landscape managers, developers, and end-users) consider the range of technical and non-technical factors that will determine successful implementation of an ICT system in a new domain. Here, the goal was to transition from infrequent paper-based and non-spatial surveying of farms to a spatial data infrastructure that supports coordinated large-team farm mapping, data syncing and storage, and geospatial data analysis and reporting that aligns with MAFF's needs, and guides landscape management actions.
Here, we describe our team’s experience in applying the iterative ICT4D framework. We present the development activities associated with successive phases of the project and reflect on the advantages (and constraints) this framework offers for developing open-source geospatial applications for deployment in new domains with a low-resource context. Initially, we introduce the qualitative fact finding, context analysis, and needs assessment to ascertain and distil MAFF’s needs for geospatial data and applications. Then, we present several stages of application design, development, testing, and refinement in various MAFF data collection and reporting campaigns, which enabled analysis and the detailed specification of the requirements for the agricultural landscape monitoring tool. This includes work on developing initial prototype applications, implementing small-scale vanilla and land utilisation surveys, and finally an island-wide wall-to-wall crop survey with a large team of field data collectors.
Finally, we present the system architecture and a case study of the final iteration of the tool deployed for Tonga’s country-wide wall-to-wall farm system survey completed by MAFF in 2021. The final iteration of the tool was composed of a stack of open-source geospatial tools including QField for mobile mapping and data collection, QFieldCloud for user authentication and data syncing, and newly developed, open-source geospatial data visualisation, analysis, and reporting applications. This case study discusses: (1) how a team of over 40 data collectors were able to work collaboratively to build up a database comprising records from over 11,000 farms using QField and QFieldCloud; and (2) how custom applications developed in this project enable visualisation of this data on web maps and automated reporting to inform policy development and landscape decision making by MAFF. We also illustrate the critical role the tool and the crop survey information collected in 2021 played in assisting MAFF’s recovery efforts in the aftermath of the Hunga Tonga–Hunga Ha'apai submarine volcano eruption and subsequent tsunami which impacted heavily on Tonga’s main island of Tongatapu in January 2022. We also discuss the potential challenges in delivering the tool to other low-resource jurisdictions in the South Pacific including issues related to data dissemination, privacy and security; user management; technical and financial sustainability; scalability; training and knowledge transfer; and creating and fostering a community of open-source developers and users in PICs. The success of our case study demonstrates the importance of stakeholder engagement in an iterative ICT4D development framework, and the great potential that open-source geospatial tools such as QGIS, QField, and QFieldSync can play in agricultural landscape management and disaster response in PICs.
The Architecture, Engineering and Construction (AEC) cluster in Portugal is a knowledge and competence aggregator platform in the construction sector, which aims to promote business competitiveness through research activities. The AEC sector is characterized by high levels of competence, being able to respond to the rigor and demands of the international standards imposed by the world's leading industry players. This sector represents 2.2% of the Portuguese GDP (Gross Domestic Product) and aims to increase exports and qualified employment, promote international visibility, strengthen skills, and promote the creation of partnerships and cooperation. It is of central importance for the national economy, being responsible for 350 thousand jobs, 19.9 billion euros in turnover and 4.5 billion euros in exports.
Currently, the AEC sector lacks tools for digitization, being forced to resort to proprietary software and closed file formats with complex and highly expensive licensing models. Thus, new opportunities arise for the creation of tools, resources and knowledge that promote the renewal and impetus of the entire sector. We intend to respond to these needs by developing a 3D Digital Twin platform (a digital replica of a physical entity) for the collaborative representation and editing of Industry Foundation Classes (IFC) documents, a format for the digital description of construction industry assets, applied to the Building Information Modelling (BIM) methodology, which aims to concentrate project information digitally so as to incorporate all its participants.
Presently, the AEC sector in Portugal is in need of products, processes and services that enhance collaboration, training and innovation in the sector to compete in the global market. Dependence on proprietary tools for the design of AEC projects, using closed file formats with complex and highly expensive licensing models, reveals an opportunity for new tools that can interact collaboratively based on standards driven by the AEC community itself. Thus, with the emergence of the IFC interoperable format for the BIM methodology, it is necessary to develop universal-access tools capable of representing and editing 3D BIM models through a collaborative, interactive and real-time platform following international standards and specifications. These market opportunities, associated with the digital transition of the AEC sector, attract multiple stakeholders to this new reality and produce a set of initiatives that foster competitiveness within the sector. However, the approach to these problems remains faithful to a matrix of classic solutions, in which the offer of products is based on highly specialized desktop software whose use of the IFC standards only guarantees interoperability with other systems through the import/export of IFC files, not their direct handling. Hence, opportunities emerge for companies that seek to develop functionalities for handling IFC files through universally accessible Web tools that respond to the requirements and needs of the AEC sector. What currently makes it difficult to create such Web applications, in addition to the various problems already mentioned, is the complexity associated with reading and writing IFC files, and performance issues in Web applications, mainly caused by poor memory management when reading or writing large documents.
This article focuses on the development of an innovative web platform based on the Digital Twin concept: permanently updated 3D digital models, enhanced with information, which provide real-time knowledge of the back-office work reality and facilitate decision-making, including the constant adjustment of resource allocation, the development of production processes under optimal conditions, and the overlap of the virtual domain with the physical domain.
The developed solution is a cross-platform web application that covers all base requirements of a standard BIM platform, plus georeferencing of IFC documents using orthophoto maps and free access to base cartography. Beyond the representation of 3D IFC files, analysis of metadata and measurement of 3D model elements, the following stand out as technological breakthroughs for the sector: handling large files; georeferencing the 3D model with map support; and performing operations on the model, for example adding elements from external catalogues, removing elements, and updating element geometry (rotation, scaling, translation) and metadata (properties and relations).
The present work is part of a mobilizing project called REV@CONSTRUCTION, a project financed by Portugal 2020 that aims to develop solutions for the digital transformation of companies in the AEC sector, promoting its competitiveness and sustainable growth, as well as strategic dissemination within the sector at European level.
This project established as its main goal finding and providing digital solutions to the industry in the AEC sector, involving architects and designers, and construction and project management companies. Its R&D activities are directed to the development of digital tools linked to key aspects of the sector, but also to developing and providing the methodological bases of standardization, organization and management of information necessary for the implementation of the BIM and Digital Twin methodologies in Portugal, applied to the construction and asset management of buildings and infrastructures.
In this context, the focus is on defining bases of standardization across the sector, such as the establishment of construction technical information, information models and BIM object libraries, and a national cost database, which aim to eliminate the barrier between methodology and modelling tools and the production of standard specifications, thus leading to increased effectiveness and efficiency of construction processes and reduction of public context costs, by standardizing practices and procedures throughout the various stages of the construction process.
Through these integrative, cross-cutting and structuring initiatives, the group of 22 entities involved in this project, from companies to universities and reference research institutes in Portugal, wants to mobilize the AEC sector for the Digital Revolution. This project has an investment of 8.2 million euros, financed by Portugal 2020.
Information and communication technology (ICT) has mainly been applied to the finance, telecommunications and public sectors. However, since the early 2010s, there have been efforts to apply ICT to various fields such as aerospace, life science, energy and automobiles. Recently, artificial intelligence and big data technologies have also been applied in the aerospace field, among others, where earth observation attracts the most interest.
Recently, the number of satellites for earth observation has been increasing every year. As small satellites can be manufactured at low cost, the number of small-satellite constellations providing high temporal and spatial resolution is growing. Urban change detection, disaster monitoring and traffic analysis are typical applications, and such applications will continue to increase.
When performing earth observation using satellite images, a large amount of satellite information must be processed in real time and satellite images analyzed with artificial intelligence. Such studies are being conducted in a variety of ways; the area of interest in this study is in particular the processing technology for storing and retrieving massive satellite images.
As a technology for handling massive satellite information, multi-dimensional array databases play an important role. Representative open-source examples are Rasdaman and SciDB.
In 2018, we started developing KIWI-Sat, a system for processing and analyzing massive satellite information, especially for supporting Korean satellite images such as KOMPSAT-2, KOMPSAT-3, KOMPSAT-5, etc.
KIWI-Sat supports GeoTIFF, HDF5 and JP2, the main data types of representative satellite images. It is mainly being developed to support the Korean satellite images KOMPSAT-2, KOMPSAT-3, KOMPSAT-3A, KOMPSAT-5, GOCI, etc.; overseas satellite images such as Sentinel-1A, Sentinel-1B, Sentinel-2B, SPOT and PlanetScope are also supported. Because KIWI-Sat supports the main data types, other satellite images can easily be added.
KIWI-Sat is being developed using a number of open-source software packages such as Rasdaman, PyTorch, Django and Mapbox: Rasdaman to process massive raster-based satellite information, PyTorch for the AI inference module, and Django and Mapbox for visualizing satellite images and overlays. KIWI-Sat is mainly composed of five subsystems: 1) 'K-SDA' (KIWI-Sat Data Access), which processes satellite information based on Rasdaman, an array database; 2) 'K-SAA' (KIWI-Sat AI Analysis), which is in charge of AI-based satellite image analysis; 3) 'K-SVI' (KIWI-Sat Visualization), which visualizes satellite images on a map; 4) 'K-SUT' (KIWI-Sat Utility), which provides utility functions such as system resource monitoring and upload/download of original satellite images; and 5) 'K-SOA' (KIWI-Sat OpenAPI), which is in charge of the OpenAPI for satellite information access.
In this paper, KIWI-Sat will be mainly explained focusing on 'K-SDA', which processes massive satellite images, and 'K-SAA', which analyzes satellite images with artificial intelligence.
First, K-SDA was developed using Rasdaman. In the early development of K-SDA, SciDB was used as the database system, but when SciDB ended its open-source policy we adopted Rasdaman, a representative open-source alternative, as the database for processing satellite images. For this, a comparison of functions and performance was carried out: Rasdaman performed better at uploading original satellite images and storing them in the database, while SciDB's search performance for a selected area was superior. Considering the real-time requirements of recent satellite image analysis, it was confirmed that Rasdaman has the advantage. In processing satellite images, a multi-dimensional array database can handle storage and retrieval of array units with the SQL standard, and has many advantages as it supports massive raster data.
Second is 'K-SAA', which is linked to the satellite-image-based AI module. KIWI-Sat has a structure that can easily be linked with the inference code and parameters of a previously developed artificial intelligence module. It receives the satellite image obtained from 'K-SDA', executes the inference code, and transmits the result to the Django web framework, where it connects with the 'K-SVI' stage.
We installed a demo system at KARI to test and improve KIWI-Sat. KIWI-Sat was installed on Ubuntu 20.04 LTS and stores KOMPSAT-2, KOMPSAT-3, KOMPSAT-3A, KOMPSAT-5 and PlanetScope satellite images; several tests, such as ROI search for optical and SAR images and AI interworking, are being carried out.
In this paper, we demonstrate the results of interworking KIWI-Sat with two AI technologies. First, for the KOMPSAT-3 satellite image (an optical image), object detection results with and without satellite image super-resolution were compared. The results confirm that super-resolution technology can improve the object detection performance on satellite images. The test satellite image is KOMPSAT-3, and the area is Hong Kong.
The second demonstration shows the results of detecting the water system in the KOMPSAT-5 satellite image, which is a SAR image.
So far, we have presented KIWI-Sat, a system for processing and analyzing massive satellite images. KIWI-Sat is currently under development; we plan to develop technologies for searching across multiple satellite images and for scaling to multiple nodes.
Synthetic Aperture Radar (SAR) backscatter is adept at differentiating standing water, due to its low signal compared to most non-water surface cover types. However, the temporal transition from non-water to water is critical to identifying floods; hence objects with permanently or seasonally low backscatter become ambiguous and difficult to classify. TU Wien's flood mapping algorithm utilizes a pixel-wise harmonic model derived from a SAR datacube (DC) (Bauer-Marschallinger et al., in review) to account for these patterns. Designed to be applied globally in near real-time, our method applies Bayes inference on SAR data in VV polarization. In this method, the harmonic model generates the non-flooded reference distribution, which we then compare against the flooded distribution to delineate floods within incoming Sentinel-1 IW GRDH scenes.
In the harmonic modeling, we estimate each location's expected temporal backscatter variation, explained by a set of Fourier coefficients. Following recommendations in the literature, a seven-coefficient formulation was adopted (Schlaffer et al., 2015), hereon referred to as our harmonic parameters (HPARs). The HPARs comprise the backscatter mean and three harmonics, each with a pair of sinusoidal coefficients. This model acts as a smoothed proxy for the measurements in the time series, thus allowing a seasonally varying backscatter reference to be estimated for any given day of year.
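The seven-coefficient model can be written as an intercept (the backscatter mean) plus three sine/cosine harmonics of the day-of-year, fitted per pixel with ordinary least squares; a minimal numpy sketch with synthetic data (not the production implementation):

```python
import numpy as np

def design_matrix(doy, k=3):
    """Columns: intercept plus sin/cos terms for harmonics 1..k of
    the day-of-year cycle, i.e. 1 + 2*k = 7 coefficients for k = 3."""
    t = 2 * np.pi * doy / 365.0
    cols = [np.ones_like(t)]
    for i in range(1, k + 1):
        cols += [np.sin(i * t), np.cos(i * t)]
    return np.column_stack(cols)

def fit_hpars(doy, backscatter):
    """Least-squares estimate of the harmonic parameters (HPARs)
    for a single pixel's backscatter time series."""
    A = design_matrix(doy)
    coeffs, *_ = np.linalg.lstsq(A, backscatter, rcond=None)
    return coeffs

# Synthetic seasonal backscatter: mean -12 dB with one annual cycle
doy = np.arange(1, 366, 6, dtype=float)
sigma0 = -12.0 + 2.0 * np.sin(2 * np.pi * doy / 365.0)
hpars = fit_hpars(doy, sigma0)
```

Evaluating the fitted model at any day-of-year yields the seasonally varying non-flooded reference described above.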
However, generating the harmonic model at a global scale and with high resolution presents significant logistical and technical challenges. Therefore, harmonic modeling of remotely sensed time series is often performed on specialized infrastructures (Liu et al., 2020), such as Google Earth Engine (GEE) (Gorelick et al., 2017) or other highly customized setups (Zhou et al., 2021), where the pixel-wise analysis of multi-year data requires well-defined I/O, data chunking, and parallelization strategies to generate the HPARs in reasonable time and cost. While harmonic analysis is not new, to our knowledge, production and application at a global scale using dense SAR time series have yet to be implemented, let alone operationally utilized.
To prepare for the global near real-time flood mapping effort, HPARs were systematically computed using a global DC organizing the Sentinel-1 IW GRDH datasets. In the DC structure, individual images are stacked, allowing for data abstraction in the spatial and temporal dimensions and making it ideal for time-series analysis. However, for this abstraction to be realized, a rich set of software solutions is needed to implement the 3-dimensional data model.
In this contribution, we present our SAR DC software stack and its utilization to compute the aforementioned global harmonic parameters. We show a set of portable and loosely coupled Python packages developed by the TU Wien GEO Microwave Remote Sensing (MRS) group capable of forming a global data cube with minimal overhead from individual satellite images. The stack includes, among others, open-source packages for:
- high-level data cube abstraction: yeoda
- spatial reference and hierarchical tiling system: Equi7Grid
- lower-level data access and I/O: veranda
- spatial file- and folder-based naming and handling: geopathfinder
- product tagging and metadata management: medali
The detailed description of the preprocessing and storage infrastructure used for this global DC is outlined by Wagner et al., 2021; here, we focus on the software interfaces. Given the preprocessed datasets, the logical entry point is through yeoda, which abstracts well-structured Earth observation data collections as a DC, making high-level operations such as filtering and data loading possible. This level of abstraction is supported by the other components of the software stack, which address the organization and lower-level access to the individual files.
In a nutshell, the DC is simply a collection of raster datasets in GeoTIFF format co-registered on the same reference grid. To deploy for large-scale operations, a well-defined grid system is required to deal with high-resolution raster data. A tiling system fulfilling this requirement is the Equi7Grid, based on seven equidistant continental projections found to minimize raster image oversampling. Interacting with this tiling system on an abstract level is possible via our in-house developed Equi7Grid package. The tiling system follows a hierarchy of directories to manage the datasets on disk. Moreover, for individual files, a predefined naming convention is applied to indicate spatial, temporal and ancillary information from the product metadata, which becomes transparent to yeoda. This setup of customizable file naming schemes is easily managed through the geopathfinder package.
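As a hedged illustration of how a fixed naming convention makes file metadata machine-readable (the field layout and example file name below are invented for this sketch; the project's real scheme is managed by geopathfinder):

```python
from datetime import datetime

# Hypothetical underscore-delimited fields; illustrative only,
# not the actual naming convention used in the project.
FIELDS = ["var_name", "datetime", "band", "tile_name"]

def parse_name(filename):
    """Split a raster file name into the metadata fields a data
    cube layer can be filtered on (variable, time, band, tile)."""
    stem = filename.rsplit(".", 1)[0]
    meta = dict(zip(FIELDS, stem.split("_")))
    meta["datetime"] = datetime.strptime(meta["datetime"], "%Y%m%dT%H%M%S")
    return meta

meta = parse_name("SIG0_20200615T054501_VV_E048N012T6.tif")
```

Because every file name encodes this metadata, a DC layer can be filtered by time or tile without opening any raster.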
The actual HPAR processing task was subdivided into multiple High-Performance Computing (HPC) jobs on the Vienna Scientific Cluster (VSC) based on this tiling hierarchy. For the temporal-domain modeling to work, the data are further split into manageable chunks, so only one tile per HPC node was allocated. Hence, yeoda was used to filter the DC down to the tile level, which was further reduced to a two-year period. From there, a three-dimensional array of backscatter measurements was generated by veranda from the image stack on disk.
Due to the depth of the DC, further segmentation and parallelization were required at this level. Pixel-based parallelization was done using Numba to handle the core least-squares estimation of the measurements against a day-of-year array derived from the image timestamps. Veranda is again used for the output operation to encode and write the HPARs to individual files. Data quality checks and metadata encoding, done via medali, cap the processing. In this manner, the HPAR products themselves can be abstracted as a DC, simplifying subsequent flood mapping computations.
With the HPAR product, the Sentinel-1 time series is seasonally modeled and condensed to a fraction of the size of the original global DC. While for now it is exclusively used to allow our flood monitoring workflow to run globally in near real-time, other potential applications include seasonal water and vegetation analysis. Moreover, with the software stack used to compute and subsequently access it, this product can easily be deployed on different platforms with little to no overhead, allowing reproducible DC analysis.
Moving towards SaaS solutions usually requires a provider that puts software on the cloud and a channel, usually a web portal, for accessing data and tools. The R CRAN programming environment has all the “ingredients” needed to create such a SaaS on a local machine or on a server. We propose and discuss here a solution, called InforSAT, that was created ad hoc for centralizing satellite imagery processing, taking advantage of a remote server with multiple processors and thus also of parallel processing. The R Shiny package was used to connect online widgets for user interaction with R tools for specific imagery processing, which is done via other dedicated packages. To date only Sentinel-2 Level-2A data are considered, but the system is scalable to other sensors and processing levels. The tools currently available focus on multi-temporal analysis, to support the academic community involved in particular in vegetation analysis, where phenology changes notably inter- and intra-annually. The tools are available via a web portal to reach research teams that are less familiar with satellite image analysis, allowing simplified extraction of multi-temporal data from Sentinel-2 images. Figure 1 shows the interface and Figure 2 the result of extracting a boxplot of vegetation index values over a specific time window.
All image data are stored in a user-defined folder on the server, and a script checks weekly (or at other user-defined intervals) for new Sentinel-2 images, automatically downloads them and stores metadata in an R list structure. The metadata include image paths, bands and histograms of values for each band, used for defining color-stretching parameters during image rendering in the browser. Regarding visualization, users can render true-color and false-color composites with their own band combinations, and can also create a raster layer with the values of common vegetation indices or define their own index by providing an equation in the interface (see Figure 1). The images rendered in the user's browser are processed on the fly from the original JPEG2000 format, including the calculation of the index rasters and the color composites. Each index raster is recalculated every time the user actively re-draws it, by sampling the original image with points that correspond to the screen pixels, reprojected from screen coordinates to image coordinates. Depending on the screen size and on the area, this amounts to around one million points, which are then converted to an image and rendered on screen, either with a fixed scale that depends on the expected minimum and maximum values of the index (e.g. between -1 and 1 for the normalized difference vegetation index) or with a scale that automatically stretches between the 10th and 90th percentiles of the frequency distribution of the actual values. The color composites are drawn automatically at any scale using the intrinsic overviews for each Sentinel-2 band present in the JPEG2000 format. Regarding multi-temporal analysis, users can define one or more polygons over the area and, for each polygon, extract single pixel values (digital numbers, DN) and aggregated zonal statistics for each and all available images in a few seconds, with or without parallel processing.
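Although InforSAT itself is implemented in R, the normalized-index calculation and the percentile stretch described above are language-agnostic; the Python sketch below illustrates the same logic, assuming reflectance-valued input bands.

```python
import numpy as np

def normalized_index(band_a, band_b):
    """Normalized difference index, e.g. NDVI = (NIR - red) / (NIR + red)."""
    a = band_a.astype(float)
    b = band_b.astype(float)
    return (a - b) / (a + b)

def percentile_stretch(values, low=10, high=90):
    """Rescale so the 10th/90th percentiles map to 0/1, clipping the rest."""
    lo, hi = np.percentile(values, [low, high])
    return np.clip((values - lo) / (hi - lo), 0.0, 1.0)

nir = np.array([0.5, 0.6, 0.4, 0.3])
red = np.array([0.1, 0.1, 0.2, 0.2])
ndvi = normalized_index(nir, red)
stretched = percentile_stretch(ndvi)   # values ready for screen rendering
```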
Users can download the multi-temporal data, i.e. the DN values, in table format for further analysis. The table is in long format, with a timestamp column, a polygon ID column and one column per band with the values. In both visualization and multi-temporal analysis, users can set a threshold for masking according to cloud and snow probability, which are available products from the sen2cor processing of Sentinel-2 to Level-2A. In the near future this solution will be integrated in an R package, allowing users to easily download, install and replicate their own portal locally or on their own server. Code is available on GitHub at https://github.com/fpirotti/inforsat
Approach and concepts
3D-georeferenced historical pictures have a high potential for the analysis of different landscape features such as melting glaciers, the effects of urbanization or natural hazards. Moreover, historical pictures often have a higher temporal and spatial resolution than satellite imagery and allow for analyses that go farther back in time. A 3D-georeferenced picture can for instance be combined with a digital terrain model (DTM) and other reference data to calculate the exact footprint of the picture and to generate a list of visible toponyms that can be used to find pictures of a specific place or region.
The utilization of historical pictures is unfortunately still difficult: (1) historical pictures need to be digitized; (2) collections are often spread across several places in different archives and collections; (3) metadata is often not available. In the ongoing open-source project Smapshot (Produit et al. (2018), https://smapshot.heig-vd.ch/) over 150’000 digitized historical pictures have been georeferenced in 3D by more than 700 participants. In the Smapshot web platform, a participant can georeference a picture using monoplotting (Bozzini et al., 2012): ground control points (GCP) are digitized both in the historical picture and in a virtual globe that displays recently updated data. These GCP allow for the calculation of the exact position from where the picture was taken (a 3D point) and the three angles that define the direction of view: roll, pitch and yaw. Once the position and the direction of view have been calculated, a footprint of the picture is generated using a DTM.
In order to make the pictures and the metadata from Smapshot available to the public, an open API for 3D-georeferenced historical pictures has been created. The goal was to offer free access to all the data in the Smapshot database and to allow for different types of queries such as retrieving the footprints of the photos, fetching metadata for a picture (e.g. owner, title, date, x/y/z position and roll, pitch, yaw angles) or retrieving photos that are within a certain range from a specific point.
This API was built in NodeJS (https://nodejs.org/) with a PostgreSQL/PostGIS (https://www.postgresql.org/, https://postgis.net/) database and python code for the georeferencing algorithm. The API is a REST API fully documented using the OpenAPI specification. The API project has been open-sourced and specific test-suites have been put in place to ensure quality and to allow community contribution with confidence.
One challenge for the establishment of this API was standardization: today there are several standards for the definition of metadata in pictures, such as IIIF (https://iiif.io/) or Dublin Core (https://www.dublincore.org/). These standards, however, have limited support for geospatial data. On the other hand, spatial standards poorly support pictures that are oriented in 3D. The glTF standard (https://www.khronos.org/gltf/) is one example of a format that handles 3D orientation, and there is also a recent OGC initiative called GeoPose (https://www.ogc.org/projects/groups/geoposeswg), which formalizes a standard to define a 6DoF pose anywhere on Earth, including a position and orientation in 3D.
Reasons why it should be considered
3D-georeferenced images are increasingly used by projects that document change over time; e.g. within the field of digital humanities, even paintings can be considered for 3D georeferencing, and differences between the real world and the painted world give room for analysis and interpretation. Another use case is the creation of geovisualization applications that show the contents of historical pictures in 3D and enable a user to compare their contents to the real world (e.g. augmented or virtual reality applications).
Furthermore, in the context of climate change, pictures and paintings document change and deliver evidence. Image processing techniques can be used to automatically detect features (e.g. via machine learning), and if several pictures are available for one region (but taken from slightly different viewpoints), 3D features can be generated.
The open API for 3D-georeferenced historical pictures makes these types of analyses easier and opens up the data to a larger public. It also becomes possible to implement other solutions that utilize the data directly, e.g. for displaying historical pictures in a third-party web page or for implementing machine-learning processes that automatically download pictures and metadata in order to recognize features and places.
The results of the project are also an important input for standardization activities that aim at establishing standards in the context of georeferenced pictures and their metadata.
An important perspective of the project is the establishment of an infrastructure for 3D-georeferenced pictures that can be deployed on a national or international level and that also offers the possibility to push new data (e.g. pictures) in the database.
This work is of interest for researchers who want to utilize and analyze 3D-georeferenced historical imagery and for people who want to establish open APIs to give access to data that is relevant for research.
Bozzini, C., Conedera, M., Krebs, P., 2012. A new monoplotting tool to extract georeferenced vector data and orthorectified raster data from oblique non-metric photographs. International Journal of Heritage in the Digital Era 1 (3), 499–518.
Produit, T., Blanc, N., Composto, S., Ingensand, J., Furhoff, L., 2018. Crowdsourcing the georeferencing of historical pictures. Proceedings of the Free and Open Source Software for Geospatial (FOSS4G) Conference. Guimarães, Portugal, July 2018.
Source code : https://github.com/MediaComem/smapshot-api
The collection of georeferenced information in the field has become an established and popular practice, allowing professionals, volunteers and citizens to contribute to mapping objects or reporting events. Field data collection is essential to a variety of domains, including many scientific and humanistic disciplines, humanitarian and rescue operations, location reviews and professional engineering surveys, to mention a few.
The spread of mobile devices that can record location coordinates, media and features while on the go (and share them through the web) is largely responsible for this diffusion. As a result, a number of mobile apps and software frameworks (both proprietary as well as free and open-source) have been developed and released to perform field data collection. Most of these frameworks allow developers or data collection promoters to customize collection forms according to the characteristics of each collection task and to manage both users and records through web dashboards or database management systems. From the user perspective, mobile client apps are available to selectively access the collection forms and contribute to field data collection, mainly using smartphones or tablets. Focusing on general-purpose data collection software frameworks, some of the most popular free and open-source solutions are the Open Data Kit (ODK, https://opendatakit.org), KoBoToolbox (https://www.kobotoolbox.org) and Epicollect (https://five.epicollect.net). Other relevant examples of free and open-source frameworks implementing a more technical approach to field data collection are QField (https://docs.qfield.org), Geopaparazzi and SMASH (https://www.geopaparazzi.org). Proprietary or pay-per-use solutions developed by major GIS firms are also available on the market, but they were not considered in the benchmark analysis carried out in this work.
The outlined free and open-source software frameworks provide client and server modules as well as web and mobile apps to support the full development of field data collection projects. From the developer (or data collection promoter) perspective, the adoption of such frameworks is facilitated by the availability of open APIs, interfaces and dashboards to generate, deploy and manage collection forms, users and records. Nevertheless, limitations connected to the final user experience are common to most of them. On the one hand, mobile client apps are not always available or optimized for all mobile OSs, preventing their use in the field by a significant number of potential contributors; this is the case, for instance, of ODK on iOS devices. On the other hand, each of these frameworks requires the installation of a specific mobile app on the user's device. This operation may not represent a significant obstacle for very active or committed users contributing to specific data collection projects. However, it might inhibit the contribution of occasional users, who may not be willing to install additional software on their device for sporadic mapping of objects or event reporting.
In view of the above, this work presents the Geo Collector Bot, an alternative free and open-source software toolkit to empower field data collection projects while avoiding the development and/or installation of a specific mobile app on contributors' devices. The Geo Collector Bot is a configurable Telegram-based chatbot enabling the dispatching of data collection forms that can be activated and filled in through Telegram chats. It consists of a backend application written in TypeScript and running on Node.js. The Telegram app serves as the supporting mobile client, enabling a large number of users to contribute, even sporadically, to data collection projects (potentially every Telegram user; 550 million monthly active users as of July 2021). The Geo Collector Bot is released under the MIT License, and source code, documentation and a demo are available on GitLab (https://gitlab.com/geolab.como/geocollectorbot).
The Geo Collector Bot works as a standard Telegram bot. To collect the data, the Bot asks the user a series of questions, covering location coordinates, media, textual annotations, multiple-choice checkboxes, etc., and persists the answers to a spatial database. The question flow can be customized by editing a JSON configuration file. Local deployments of the system are facilitated by the provision of a Docker container (https://hub.docker.com/r/geolabpolimi/geo-collector-bot).
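A question flow of this kind could look roughly like the hypothetical configuration fragment below. The field names and structure shown here are illustrative assumptions only; the authoritative schema is documented in the project's GitLab repository.

```json
{
  "steps": [
    { "id": "location", "question": "Where are you?", "type": "location" },
    { "id": "photo",    "question": "Attach a photo", "type": "media" },
    {
      "id": "category",
      "question": "What are you reporting?",
      "type": "select",
      "options": ["Damaged trail", "Wildlife sighting", "Other"]
    }
  ]
}
```

Each step maps to one chatbot prompt, and the collected answers are persisted as one record in the spatial database.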
The Geo Collector Bot has been developed in the framework of the INSUBRI.PARKS project (https://insubriparksturismo.eu), funded by the Interreg Co-operation Programme 2014-2020. This project aims at increasing the tourism attractiveness of the Insubria region (between Northern Italy and Southern Switzerland) through the provision of infrastructure as well as integrated marketing and management strategies for the Insubria natural parks. The Bot represents a component of the virtual infrastructure supporting the project. It was originally designed to allow both parks visitors and managers to easily collect and share geolocated records on parks status and feedback on points of interest. However, the ultimate goal of the presented work is to provide an open and general-purpose data collection software framework suitable for multi-purpose applications.
The current version of the Geo Collector Bot does not yet provide dedicated backend modules supporting collection task and record management. To that end, the development of a web control dashboard is planned; it will be included in the stack of the Geo Collector Bot Docker container as an auxiliary component. The Bot has been tested using a PostgreSQL/PostGIS database. Additional configuration options to plug in other spatial database systems are planned as well in the future development of this work.
As geospatial data continuously grows in complexity and size, the application of machine learning and data mining techniques to geospatial analysis is increasingly essential to solve real-world problems. Although research in this field has produced innovative methodologies over the last two decades, they are usually applied to specific situations and not automated for general use. Therefore, both generalization and integration of these methods with Geographic Information Systems (GIS) are necessary to support researchers and organizations in data exploration, pattern recognition, and prediction across the various applications of geospatial data. The lack of machine learning tools in GIS is especially clear for unsupervised learning and clustering: the most used clustering plugins in QGIS contain few functionalities beyond the basic application of a clustering algorithm.
In this work we present Cluster Analysis, a Python plugin that we developed for the open-source software QGIS and that offers functionality for the entire clustering process: from (i) pre-processing, to (ii) feature selection and clustering, and finally (iii) cluster evaluation. Our tool provides several improvements over the current solutions available in QGIS, as well as in other widespread GIS software. The expanded features provided by the plugin allow users to deal with some of the most challenging problems of geospatial data, such as high-dimensional spaces, poor data quality, and large data sizes.
In particular, the plugin is composed of three main sections:
feature cleaning: This part provides options to reduce the dimensionality of the dataset by removing the attributes that are most likely harmful to the clustering process. This is important to achieve better results and faster execution times, avoiding the problems of clustering in high dimensionality. The first filter removes features that are correlated above a user-defined threshold, since highly correlated features usually provide redundant information and can lead to overweighting of some characteristics. The other two filters identify attributes with constant values for all the data points or with only a few outlying values differing from them. These types of features do not provide any valuable information and can worsen clustering performance. To identify quasi-constant features, we use two parameters introduced in the nearZeroVar() function from the caret package developed for R: the ratio between the frequencies of the two most frequent values, and the number of unique values relative to the number of samples.
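For illustration, the two quasi-constant-feature parameters borrowed from caret's nearZeroVar() can be sketched in Python as follows; the default cut-offs (95/5 and 10%) mirror caret's documented defaults, an assumption carried over from the R package, and this is not the plugin's own code.

```python
from collections import Counter

def freq_ratio(values):
    """Ratio of the most frequent value's count to the second most frequent."""
    counts = [c for _, c in Counter(values).most_common(2)]
    if len(counts) < 2:
        return float("inf")  # constant feature: no second value at all
    return counts[0] / counts[1]

def percent_unique(values):
    """Number of distinct values relative to the number of samples (%)."""
    return 100.0 * len(set(values)) / len(values)

def is_near_zero_var(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag quasi-constant features, mirroring caret's nearZeroVar defaults."""
    return freq_ratio(values) > freq_cut and percent_unique(values) < unique_cut
```

A constant column is flagged while a balanced binary column is kept, which is the behavior the filter relies on.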
clustering: This section is used to perform clustering on the chosen vector layer. First of all, the user needs to select the features to use in the process, either manually or automatically. The automatic feature selection uses an entropy-based algorithm, provided in two versions with different computational complexities. The currently available clustering algorithms are K-Means and agglomerative hierarchical clustering, and users can select the one that best suits their needs. Before performing clustering, the plugin offers the possibility to scale the dataset with standardization or normalization, and to plot two different graphs to facilitate the choice of the number of clusters.
evaluation: In this section we show all the experiments carried out in the current session, with a recap of their settings and performance and the possibility to save and load them as text files. To evaluate the quality of the experiments we calculate two indices, and we compare experiments run on the same dataset. The indices are the internal metrics Silhouette coefficient and Davies-Bouldin index. To directly compare the clusters formed by two or more experiments, we compute a score which evaluates how many pairs of data points are grouped together in all of the experiments or in none of them. Every experiment completed in the current session can be stored in a text file, and experiments saved in previous sessions can be loaded into the plugin, where they are shown in the evaluation section along with the others.
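The pair-counting comparison described above is essentially the Rand index, and can be sketched as follows; this is an illustrative re-implementation, not the plugin's own code.

```python
from itertools import combinations

def pair_agreement(labels_a, labels_b):
    """Fraction of point pairs grouped together in both clusterings
    or separated in both (the Rand index)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b   # pair treated consistently by both
        total += 1
    return agree / total

# Two experiments with relabeled but identical groupings agree perfectly
score = pair_agreement([0, 0, 1, 1], [1, 1, 0, 0])   # -> 1.0
```

Because only co-membership of pairs is compared, the score is invariant to the arbitrary cluster labels assigned by each run.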
One of the major challenges during development has been supporting most of the functionalities on large datasets as well, both in terms of the number of samples and the number of dimensions. To achieve this, we implemented algorithm options with good time complexities, as in the case of entropy with sampling and K-Means. Moreover, for all the data storage and manipulation done in the system, we use the data structures and functions provided by the pandas and NumPy libraries to guarantee high performance.
Another important objective of the research is the accessibility and ease of use of the plugin, since the general GIS user often lacks a machine learning and computer science background. To guarantee this, the user interface is simple and self-explanatory, and each section contains a brief guide explaining all the functionalities. Furthermore, some algorithm parameters that cannot be modified via the interface are stored in an external configuration file, where they can be edited. This is done to avoid confusing less experienced users.
Along with the implementation, the research includes a considerable experimental phase, both during and after development. This phase is essential to highlight both the potential of the plugin and its limitations in real-world scenarios. The bulk of the experiments is conducted on data about the city of Milan describing socio-demographic, urban and climatic characteristics at different granularities (ranging from fewer than 100 data points to almost 70,000, and with a large number of numerical attributes, up to 109). Overall, the experimental phase shows good flexibility of the plugin and outlines possibilities for future developments, which can also be provided by the QGIS community given the open-source nature of the project.
The stable version of the plugin is available on the QGIS Python Plugins Repository (https://plugins.qgis.org/plugins/Cluster-Analysis-plugin-main/) while the development version as well as documentation are available on GitHub (https://github.com/folini96/Cluster-Analysis-plugin).
Japan's open infrastructure map development using OpenStreetMap was triggered by the Great East Japan Earthquake in 2011, which led to a widespread understanding of the activity; by the end of September 2019, more than 35,000 unique users had made some kind of contribution, and the data is still being updated daily. In addition, the Mapillary project (Juhász and Hochmair, 2016; Mahabir et al., 2020), which started in April 2014, is a location-based landscape photo-sharing service that, like OSM, is crowdsourced and allows users to post photos of places around the world, not just on roads.
This activity has started to spread in Asia, especially in Japan, where the number of contributors and the number of photos taken are rapidly increasing (Ma et al., 2020). These voluntary crowdsourcing activities are a great incentive to work on the creation of micro-scale road data, especially data that cannot be maintained or updated by public agencies. On the other hand, most of the research on Mapillary to date has been concerned with technical methodologies, such as the study of ground object extraction based on deep learning of images using Mapillary data, and approaches such as the local comparison of data generated by contributors, as is commonly done in OSM research, have not made much progress.
In this study, we obtained about 41.7 million log data records taken in Japan from September 2014 to September 2019 through the Search Images API of Mapillary API ver3. Then, together with the line data of OSM roads at the same point in time, the maintenance status of Mapillary and OSM road data at the municipal level in Japan was spatially analyzed, mainly with QGIS, considering time series and user trends. The data for the entire country of Japan is so huge that it is difficult to perform spatial analysis with the basic database (PostGIS), so we converted the data to the FlatGeobuf format, which has been attracting attention recently, and added various attributes that can be analyzed spatially in QGIS. The added attributes include the administrative name of the local government in Japan and, from a separately obtained OSM dump file, the type, version, last editor, and date of last update of the road nearest to the photo location (maximum search radius set to 50 m).
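The nearest-road attribute join can be illustrated with a simple haversine search. This is a sketch only: the actual processing was done in QGIS, and for brevity this version matches photo points against road vertices rather than full line geometries.

```python
import numpy as np

EARTH_R = 6_371_000.0  # mean Earth radius in metres

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_R * np.arcsin(np.sqrt(a))

def nearest_road(photo, road_points, max_dist=50.0):
    """Index of the nearest road vertex within max_dist metres, else None."""
    lats = np.array([p[0] for p in road_points])
    lons = np.array([p[1] for p in road_points])
    d = haversine(photo[0], photo[1], lats, lons)
    i = int(np.argmin(d))
    return i if d[i] <= max_dist else None
```

Once the nearest road is found, its OSM attributes (type, version, last editor, update date) can be copied onto the photo record.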
Some of the results of the analysis are as follows. The number of unique contributors who participated in the maintenance of Mapillary data over five years across Japan was about 1,500, and the top 20 users generated about 90% of the data. The top three contributors each shared more than 5 million images. As a comparison of user participation, the number of contributors involved in the OSM road data is about 4,800, suggesting that Mapillary data is generated by about one third as many users as OSM.
We extracted the major contributors for each of the 1,700 municipalities in Japan and found that about 50 users were involved. Although the Mapillary data in Japan is supported by a smaller number of contributors than the OSM data, we succeeded in bringing to light the image of contributors in each region by analyzing the data on a micro-regional basis.
In terms of the number of Mapillary images taken and their spatial characteristics, the number of images taken on major roads (equivalent to OSM's highway=trunk or primary) in non-urban areas in the Tohoku region (especially Fukushima Prefecture: approximately 6 million images; Iwate Prefecture: about 4 million images) and the Kansai region (Kyoto Prefecture: about 3 million images) is outstanding, while the number of images taken on sidewalks (highway=sidewalk) in the metropolitan areas of Tokyo and Osaka is low. In these metropolitan areas, the data was developed to supplement the OSM coverage of sidewalks (highway=path, footway, unclassified) and other road types that exist in reality but are not well covered in OSM. In the paper, we plan to describe the local activities in Fukushima Prefecture and Kyoto City, where Mapillary activities are particularly active, in addition to comparisons at the national and municipal levels. We also focus on the temporal transition of data maintenance: specifically, we analyzed the relationship between OSM data and the points where Mapillary images were taken using time-series clustering.
This study is a multifaceted spatial analysis of long-term photography logs from Mapillary and the first study to reveal macro trends across Japan as well as more local trends in combination with attributes of road data from municipalities and OSM. In addition, by using distributed processing methods such as tiling technology and FlatGeobuf to obtain a large dataset of more than 41 million POIs (Points of Interest) from APIs and analyze the data spatially in QGIS, we were able to process the data without requiring a large-scale server, which is also a significant achievement. Finally, since the Mapillary log data used for the analysis is large-scale, we are planning to provide both archived data and spatially aggregated GIS data.
From the Mapillary POI data for all of Japan used in the analysis (41,765,634 points), we provide spatially aggregated FlatGeobuf data both at the municipal level (232.2 MB; 32 attribute values) and at the 1-km grid level (126.5 MB; 20 attribute values), via a GitHub repository:
Soil erosion is a major global land degradation threat. Improving knowledge of the probable future rates of soil erosion, accelerated by human activity and climate change, is one of the most decisive factors when it comes to making decisions about conservation policies, and for earth-system modelers seeking to reduce uncertainty in global predictions.
In this context, the use of remote-sensing-based methods for soil erosion assessment has been increasing in recent years thanks to the availability of free-access satellite data, and it has repeatedly proven successful [2, 3]. Accurate information about soil erosion is, however, usually known only at the local scale and based on limited field campaigns. Applying these methods to the Arctic presents a number of challenges, due to peculiar soils with short growing periods, winter storms, wind, and frequent cloud and snow cover. However, the benefits of applying these techniques would be especially valuable in Arctic areas, where local ground information can be hard to obtain due to poorly accessible roads and terrain.
Here we propose a hybrid solution, which uses ground truth samples to calibrate the processed remote images over a specific area, to then automate the analysis for larger, less accessible areas. This solution is being developed for soil erosion studies of Iceland specifically, using Sentinel 2 satellite data combined with local assessment data from Iceland’s Soil Conservation Services department, Landgræðslan. Their historical data is more extensive than usual, since they are the oldest soil erosion department in the world.
Available data includes parameters of bare ground cover, which can be calculated from satellite images alone after calibration using verified vegetation-free areas; Icelandic soil profiles, to be analyzed to find how the profile relates to soil erosion intensity; as well as parameters of agricultural use and arable land data, including plant species in cultivated lands.
For the training phase we employ a dataset composed of 550 cropped, georeferenced and atmospherically corrected Sentinel-2A images, combined with a Digital Elevation Model (DEM) of Iceland that allows us to detect slopes which can produce landslides or promote erosion. The dataset is labelled with six degrees of erosion severity, using measurement points furnished by Landgræðslan. We split it into 2/3 for model training and 1/3 for model testing.
These images come in tiles of 10980x10980 pixels (about 600 MB) covering an area of approximately 100x100 km, and we can crop them down to a preferred size. They contain multispectral data divided into 12 bands of varying wavelengths, with resolutions from 10 to 20 m; some of the 60 m bands could be added if necessary. Different bands are combined to create indices which represent or highlight certain features, such as vegetation, soil crusting, bare soil, and red-edge indices.
Elevation data for the Arctic (north of 60°N, including Iceland) have been openly available since 2015 through the ArcticDEM project. The DEMs are derived from sub-meter satellite stereo imagery, particularly from WorldView-1 to -3 and GeoEye-1. This information can be used to detect to what extent plant growth is reduced at higher elevations because of longer snow cover, a shorter growing period and stronger winds. By deriving a slope map from the DEM, we can also account for the fact that soil erodes more readily on steep slopes, making erosion more likely the steeper they are.
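A slope map of the kind described can be derived from a DEM with finite differences; the sketch below assumes a regular grid with a uniform cell size, and is illustrative rather than the project's own code.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from a DEM raster via finite differences."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)    # per-axis elevation gradients
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Tilted plane rising 1 m per 10 m cell in x -> uniform slope of atan(0.1)
dem = np.tile(np.arange(5, dtype=float), (5, 1))
slope = slope_degrees(dem, 10.0)
```

The resulting slope raster can then be stacked with the spectral indices as an additional feature for the classifier.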
The tools for geometric and topographic correction include SNAP (Sentinel Application Platform), Sen2Cor, FLAASH (Fast Line-of-sight Atmospheric Analysis of Hypercubes), DOS (Dark Object Subtraction) and the ATCOR software. This correction reduces effects due to shadows and surface irregularities and corrects the single-date Sentinel-2 Level-1C Top-Of-Atmosphere (TOA) products for atmospheric effects in order to deliver a Level-2A Bottom-Of-Atmosphere (BOA) reflectance product.
After a preprocessing step based on dimensionality reduction, applied in order to avoid feeding too much noise to the algorithm, the labelled data is used to train a Support Vector Machine (SVM) model that classifies each coordinate. We choose the SVM algorithm as a starting point because it is a fast and reliable algorithm that performs well for classification problems with high-dimensional feature spaces such as ours, and does not require large training sets to achieve high accuracy, as other algorithms do (e.g. deep neural networks). The output of the model is a set of coordinates, each with a numeric classification representing soil erosion severity, which is used to create a map of soil erosion severity in a selected area.
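The classification protocol (scaling, a 2/3 train / 1/3 test split, an SVM) can be sketched with scikit-learn on synthetic stand-in data; the feature values and class structure below are fabricated purely for illustration, with the real labels coming from Landgræðslan's measurement points.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for per-pixel features (band indices, slope, ...)
# labelled with six erosion severity classes 0..5.
n_per_class, n_features, n_classes = 60, 8, 6
X = np.vstack([
    rng.normal(loc=3.0 * c, scale=0.5, size=(n_per_class, n_features))
    for c in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

# 2/3 training, 1/3 testing, mirroring the study design
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1 / 3, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_tr)
model = SVC(kernel="rbf", C=1.0).fit(scaler.transform(X_tr), y_tr)
accuracy = model.score(scaler.transform(X_te), y_te)
```

Applying the fitted model to every pixel's feature vector yields the per-coordinate severity classes used to draw the erosion map.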
This methodology has been shown to provide good results, achieving an overall land cover classification accuracy of 94%, a performance that can be attributed to the spectral richness of Sentinel-2 data, particularly the red-edge bands, which improve the separability of erosion classes. Low separability is a common limitation of classification methods; we address it by additionally using ISODATA and minimum distance methods. Two factors that could affect the accuracy of the delineation of eroded soils from spectral images are the intensity of the soil erosion processes and changes in the spectral characteristics of disturbed soils.
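The minimum distance rule mentioned above assigns each pixel to the spectrally nearest class mean; a minimal NumPy sketch (the two-band signatures are made-up examples):

```python
import numpy as np

def minimum_distance_classify(pixels, class_means):
    """Assign each pixel (rows of band values) to the class whose mean
    spectral signature is nearest in Euclidean distance."""
    pixels = np.asarray(pixels, dtype=float)
    means = np.asarray(class_means, dtype=float)
    # Distance from every pixel to every class mean, then argmin per pixel
    d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return d.argmin(axis=1)

means = [[0.1, 0.4], [0.3, 0.2]]  # e.g. vegetated vs. eroded signatures
print(minimum_distance_classify([[0.12, 0.38], [0.28, 0.22]], means))  # [0 1]
```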
The research described here aims at producing a reliable, widely applicable and cost-effective method to classify Icelandic soils into different categories of erosion risk, a proof of concept which, once engineered, could be straightforwardly expanded and applied to other Arctic areas, such as Greenland and Canada.
Motivation & Contribution
Part of developing an analysis pipeline for mobility studies using GPS data is benchmarking both the accuracy of the raw data and the performance of the pipeline itself. When we started to develop our algorithm for stop and trip classification, it became clear that we needed a precisely annotated dataset containing accurate stop and trip labels as ground truth. Apart from validating our development, we wanted a reference point for comparing our analysis methods with existing libraries.
For the study, we planned to equip participants with a smartphone to collect movement data in the form of GPS and acceleration data for several days in a row. To prolong battery life, we chose a low sampling frequency. Our particular focus was to create ground truth for stop and trip detection algorithms, so the annotation concentrated on this.
Through this manuscript, we contribute a comprehensive dataset providing accurate start and end timestamps for stops over 126 days. The STAGA dataset is an unprocessed table of GPS coordinates, annotated with a timestamp, altitude, GPS accuracy, and class label ("stop" or "trip"). Each sample labeled as a "stop" additionally contains the GPS coordinates of the location it is attributed to. The acceleration data is provided as a separate file covering the same time frame; it contains a triple (x, y, z) of acceleration sensor readings for each timestamp, sampled at 1 Hz. The STAGA dataset is provided publicly and is free to use. We further provide the iOS app used to create the diary data for simple stop/trip annotation on the go. All of this is made available under CC BY 4.0.
To create the dataset, we first tried a traditional diary approach: four researchers took notes, writing down addresses and times whenever they stopped. While this provided some first samples, it was a tedious and error-prone process, since taking notes is impractical in everyday life. Furthermore, it required looking up the coordinates belonging to each noted address, which works for clearly defined urban spaces but becomes problematic elsewhere, e.g. in a park or a rural outdoor environment, where addresses are not precise enough. Because of that, we developed a simple iOS app that helped us annotate our movements. The app contains a map to validate the identified position, a single button to start or end a stop, and a list overview of previously recorded stops. It captures the GPS position whenever a new stop is started and stores the current time as the start timestamp. When the button is pressed again, the stop is completed and the current time is stored as the end timestamp. Trips are derived from the intervals between two stops. In addition, the app allows exporting the captured annotations as a CSV file which can be used directly for benchmarking. This way, we were able to create a GPS dataset containing precise stop/trip annotations, together with a reference position of the actual stop location. The diary was recorded using an Apple iPhone XR.
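Deriving trips from the gaps between consecutive annotated stops can be sketched in a few lines of Python; this is an illustration of the logic described above, not the app's actual source code:

```python
def trips_from_stops(stops):
    """Derive trip intervals as the gaps between consecutive stops.
    Each stop is a (start_ts, end_ts) pair in seconds."""
    trips = []
    for (_, prev_end), (next_start, _) in zip(stops, stops[1:]):
        if next_start > prev_end:
            trips.append((prev_end, next_start))
    return trips

stops = [(0, 600), (900, 1500), (2100, 2400)]
print(trips_from_stops(stops))  # [(600, 900), (1500, 2100)]
```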
The device we used for the recordings was a ZTE Blade A5 (2019). It was configured to record GPS samples at a minimum accuracy of 25 m, so if the device was unable to obtain a position reading within this radius, the data point was omitted. We sampled data at a frequency of 0.1 Hz and used both the network and GPS as sources for determining the position (the smartphone supports A-GPS and GLONASS). It runs Android 9 and is equipped with a 2,600 mAh battery; during the recording of the dataset, the battery was always recharged before the phone shut down.
While the dataset mostly covers everyday life, it also holds small periods of vacation, travel, and hiking. Most trips were carried out by bike; however, the dataset contains long periods of walking, car traffic, and train rides as well. While the data was recorded in two different European countries (mostly urban environments), everything was rotated and projected into the North Atlantic for privacy protection. In the same vein, all timestamps have been shifted to start on January 1st in the year 2000. None of these changes should affect the performance of stop and trip detection algorithms, as the relative temporal and spatial accumulation of GPS records is not changed.
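The key property of this anonymization is that a rigid rotation and translation of projected coordinates (plus a constant timestamp shift) preserves all pairwise distances and durations; a sketch with arbitrary example parameters, not the actual transformation applied to the dataset:

```python
import numpy as np

def anonymize(points_xy, angle_deg, offset_xy, t, t_offset):
    """Rotate and translate projected coordinates and shift timestamps.
    Rigid motions preserve pairwise distances and durations, so the
    stop/trip structure is untouched."""
    a = np.radians(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    pts = np.asarray(points_xy, float) @ rot.T + np.asarray(offset_xy, float)
    return pts, np.asarray(t, float) + t_offset

pts = np.array([[0.0, 0.0], [3.0, 4.0]])
new_pts, new_t = anonymize(pts, 137.0, [5e6, -2e6], [0.0, 50.0], 1e9)
# The pairwise distance is still 5.0 after the transform
print(np.linalg.norm(new_pts[1] - new_pts[0]))
```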
The dataset contains 122,808 GPS and 7,813,740 accelerometer records. The recording time spans 126.65 days.
The diary contains 692 stops and 691 trips.
The average (mean) duration of a stop is 240.8 min; the average trip duration is 22.7 min.
On average, a stop contains 114.0 GPS samples; a trip contains 63.5 GPS samples (mean).
Discussion & Use-Cases
This dataset enables researchers to validate the performance of algorithms that predict stops and trips from GPS data. It provides a ground truth through careful annotations over a long period. In particular, the development of algorithms for stop and trip classification should profit from this dataset, as it enables accuracy tests in both the temporal and spatial domain. Thanks to its free access, researchers can use it in various projects, enabling data-driven decisions in the development of mobility research frameworks.
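One simple temporal accuracy measure such a ground truth enables is the intersection-over-union between predicted and annotated stop intervals; the metric choice here is ours for illustration, not prescribed by the dataset:

```python
def interval_iou(pred, truth):
    """Temporal intersection-over-union between a predicted and an
    annotated stop interval, each given as (start, end) in seconds."""
    (a0, a1), (b0, b1) = pred, truth
    inter = max(0.0, min(a1, b1) - max(a0, b0))
    union = (a1 - a0) + (b1 - b0) - inter
    return inter / union if union > 0 else 0.0

print(interval_iou((100, 200), (100, 300)))  # 0.5: 100 s overlap / 200 s union
```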
The described dataset, containing GPS & acceleration records and stop/trip annotations, is publicly available at the Open Science Framework under a CC BY 4.0 (Attribution 4.0 International) license: https://osf.io/34sft/
The annotation companion app we used to annotate the dataset is free software under a BSD 3-Clause license: https://github.com/RGreinacher/GPS-Diary
Digitalization is being adopted in many public services to increase the efficiency of the required operations. In this regard, there is considerable interest in digitalizing current building permit procedures, since most buildings are already designed digitally and in three dimensions (3D). In addition, several countries are making an effort to realize the transition from a two-dimensional (2D) cadastre to a 3D cadastre, because a 2D delineation of legal rights may fail to reflect the reality of property ownership in multipartite buildings. 3D city models should also be kept up to date to effectively manage events (e.g., natural disasters) and services (e.g., waterworks) in living areas. Open data standards have a vital role in enabling interoperability between different domains such as AEC and Land Administration. This paper therefore first aims to show the current situation and the opportunities for effectively benefiting from open data standards for three significant issues: 1) digitalizing the building permit procedures, 2) registering condominium rights in 3D, and 3) updating the 3D city models. It then presents an approach for integrating open standards for the 3D registration of condominium rights in the Turkish context. The integration of GIS and BIM, GeoBIM, has gained importance for digital building permitting because some rules have to be checked against the built environment, for example the availability of bicycle parking. Moreover, zoning plans that are essential for building permitting are generally provided as GIS-based data. There are studies in the literature that aim to carry out building permitting by benefiting from such an integrated GIS and BIM approach.
This approach is also connected with updating the 3D city model database, because the as-built models of buildings can be integrated into this database after the necessary conversions (Guler & Yomralioglu, 2021). The 3D registration of condominium rights, which is part of the 3D cadastre, has been widely researched, since a 2D-based delineation of ownership rights might be insufficient for determining who owns or is responsible for which parts of multipartite buildings. The availability of a 3D representation of ownership rights will benefit various land administration applications, for example property valuation. Open standards are, of course, pivotal for realizing the 3D registration of condominium rights, as they not only provide the integration between different organizations but also enable interoperability for other processes that need the same data. In this connection, the Land Administration Domain Model (LADM) is the first standard that comes to mind, because it aims to provide a common language for land administration systems and supports 3D representation through boundary faces and boundary face strings. Since standards like CityGML focus more deeply on the 3D modeling of buildings, there are attempts to integrate CityGML and LADM by exploiting the advantageous features of each standard for a better 3D depiction of ownership rights (Li et al., 2016). The “Building” and “Cadastre” themes produced within the context of the Turkey National GIS (TNGIS) describe the relationship between the related features, namely parcels, buildings, building blocks, and condominiums. These features are modeled such that they allow integration with other standards such as CityGML and IndoorGML, so as to enable the efficient reuse of spatial data in different applications. To enable better interoperability and a 3D depiction of condominium rights, an integrated model is developed.
The proposed features adapted from LADM permit the 3D representation of ownership rights. They are linked with the IFC entities “IfcZone”, “IfcRelAssignsToGroup”, and “IfcSpace” to provide integration with IFC. “BuildingUnit” is linked with the “BuildingCondominium3D” feature, and hence the integration with CityGML data is provided; “BuildingCondominium3D” corresponds to the “LegalSpaceBuildingUnit” feature of LADM. The proposed model also incorporates integration with IndoorGML by means of the link between “BuildingCondominium3D” and “CellSpace”. Given the inevitable proliferation of digitalization, the processes related to land and city management need to be accomplished more digitally and faster. Building permit issuing, as one of the important public services, has significant potential for improvement and automation (Noardo et al., 2022), and open data standards have a crucial role in realizing this potential, because they enable the standardization of the information flow between designers and the organizations responsible for compliance checking. In other words, applicants can prepare their submissions according to the information required for building permit issuing. There is a strong interrelation between digital building permitting and the updating of 3D city models: if the as-built IFC data of buildings are available, these data can be converted to CityGML, and thus the 3D city model database can be kept up to date. In addition, an up-to-date 3D city model database can itself be used for digital building permitting, as built-environment data are needed for integrated and comprehensive compliance checking; for example, rules with respect to infrastructure facilities can be checked using 3D city models. Open standards are likewise essential for successfully practicing the 3D cadastre.
With the increasing adoption of BIM, approaches have been proposed that use the IFC schema for the 3D delineation of apartment rights in multipartite buildings. In parallel, this paper concentrates on a model that provides integration with IFC data for the 3D representation of condominium rights in Turkey. Misinterpretations regarding who owns or is responsible for which parts of a building can be prevented using an IFC-based depiction of ownership rights (Shin et al., 2020). The semantic information pertaining to independent sections can be queried using the produced IFC-based models. These models will also be helpful for property valuation applications in Turkey that exploit 3D variables such as size, volume, position, and material quantities, since they provide detailed information on the indoor parts of buildings (El Yamani et al., 2021).
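The proposed linkages can be pictured as a small data model; the sketch below is purely illustrative Python — the feature and entity names follow the abstract, while the structure, attributes and identifier values are our assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class IfcSpace:
    """Minimal stand-in for an IFC indoor-space entity."""
    global_id: str
    name: str

@dataclass
class BuildingCondominium3D:
    """Proposed TNGIS feature for a 3D condominium right; it corresponds
    to LADM's LegalSpaceBuildingUnit and links out to the IFC, CityGML
    and IndoorGML representations of the same unit."""
    condominium_id: str
    ifc_spaces: list = field(default_factory=list)  # IfcSpace entities (via IfcZone grouping)
    citygml_building_unit: str = ""                 # CityGML BuildingUnit id (hypothetical)
    indoorgml_cell_space: str = ""                  # IndoorGML CellSpace id (hypothetical)

flat = BuildingCondominium3D(
    condominium_id="TR-34-001-7",
    ifc_spaces=[IfcSpace("2O2Fr$t4X7Zf8NOew3FLOH", "Apartment 7")],
    citygml_building_unit="bu_7",
    indoorgml_cell_space="cs_7",
)
print(flat.ifc_spaces[0].name)  # Apartment 7
```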
Decentralized applications are a fundamental element of internet development, not only because they are safer but also because they make data accessible to more people than centralized applications. One of the most important architectures for decentralized applications is blockchain, a computing infrastructure capable of sharing data by consensus and in an immutable way. The most popular blockchain applications belong to the financial sector; developments are still missing in other areas that could take advantage of this technology. An area that can benefit from blockchain characteristics is citizen science, which, as its name specifies, is research activity performed by a community of citizens. Given the requirements in this area, this work studies the feasibility of using a blockchain architecture in citizen science, specifically for ecosystem monitoring. In addition, this work helped us understand the advantages and disadvantages of using this technology in this field.
Current state-of-the-art applications that offer a partial solution for citizen science are FOAM and CryptoSpatial Coordinates. FOAM is a geospatial web application that builds a consensus-driven world map using the Ethereum blockchain protocol. To achieve network verification, it employs a cryptographic software utility token, whereby cartographers verify whether points added to the network are false or correct. This removes the need for a central authority to regulate and verify the points. The voting mechanism uses FOAM tokens to avoid spamming by participants. The system works by mapping a blockchain address to a physical location, which can be registered with a spatial resolution of 1 m by 1 m. CryptoSpatial Coordinates (CSC) is an Ethereum smart-contract library that can be used for developing geospatially enabled decentralized apps. It uses blockchain technology to store, retrieve, and process vector geographic data.
Our approach was only inspired by these solutions; we decided to develop something new and original. The system is developed in the Solidity programming language, which allows usage on every blockchain that supports the Ethereum Virtual Machine and guarantees great flexibility. Moreover, this choice is justified by the extensive ecosystem that Ethereum offers. The Smart Contract architecture is completely open source and developed with a focus on the reusability of the components for other applications in the same field. The two main parts of the architecture are the Cell Smart Contracts and the Registry Smart Contracts; the design is based on mapping a Discrete Global Grid System (DGGS) onto Smart Contracts. As DGGS we chose S2, an open-source library developed by Google that offers good processing functionality and a grid with a fine-grained resolution. Each Smart Contract representing a Cell keeps track of the hashes of the observations collected in the application. The hashes are used to locate and retrieve the stored files in the decentralized storage InterPlanetary File System (IPFS). This structure also allows storing metadata about the observations, for example their quality, decided through a peer voting mechanism or some other system. The Registry Contracts are linked to one resolution of the DGGS and keep track of the mapping between the DGGS cells of that resolution and their respective Smart Contracts.
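The Cell/Registry split can be sketched in plain Python (the actual implementation is in Solidity); the classes, the cell token format and the IPFS hash below are simplified stand-ins for illustration only:

```python
class CellContract:
    """Python stand-in for a Cell Smart Contract: stores the IPFS hashes
    of observations falling in one DGGS cell, with per-hash metadata."""
    def __init__(self, cell_token):
        self.cell_token = cell_token   # e.g. an S2 cell token (hypothetical value)
        self.observations = {}         # IPFS hash -> metadata (votes, quality, ...)

    def add_observation(self, ipfs_hash, metadata=None):
        self.observations[ipfs_hash] = metadata or {}

class RegistryContract:
    """Stand-in for a Registry Smart Contract at one fixed DGGS
    resolution: maps cell tokens to their Cell contract, lazily created."""
    def __init__(self, resolution):
        self.resolution = resolution
        self.cells = {}

    def cell_for(self, cell_token):
        if cell_token not in self.cells:
            self.cells[cell_token] = CellContract(cell_token)
        return self.cells[cell_token]

registry = RegistryContract(resolution=14)
registry.cell_for("89c25a31").add_observation("QmExampleHash", {"votes": 3})
print(len(registry.cells))  # 1
```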
The prototype platform is deployed on Velas, a blockchain architecture with a strong focus on fast transactions and low fees compared to other blockchains (e.g. Ethereum, Cardano, Solana). The use case for this work was the Informative System for the Integrated Monitoring of Insubric Lakes and their Ecosystems (SIMILE) project. SIMILE is a cross-border Italian-Swiss project that aims to improve the collaboration between public administrations and stakeholders for the management of the Insubric lakes (Lugano, Como and Maggiore) and their ecosystems, as well as to monitor water resource quality. One of the main sources of data in SIMILE is a citizen science approach, meaning that data is collected by ordinary citizens through their smartphones. Observations of this type include data about water quality and climatic parameters, and multimedia files such as images can be attached. The collected data can currently be validated by the public authorities managing the platform, but this requires time that is not always available to technicians. In our system, the observations are instead validated through a mixed rating system that allows both users and admins to evaluate each entry. Furthermore, the proposed blockchain architecture allows access to the collected data without relying on the currently existing web application.
The practical importance of this work is to fill the gaps currently present in citizen science applications by proposing an innovative system built on a blockchain infrastructure. The result of this work, and the technological development performed, demonstrate that citizen science applications can indeed be developed on a decentralized infrastructure. The main advantages with respect to other systems are data immutability, security and the absence of a single point of failure. Future work may include a system to further incentivize data collection through a reward system in the form of a utility token. This token could be accepted by the public administrations benefitting from the data in exchange for some form of compensation, such as discounts on public services.
Developing a privacy-aware map-based cross-platform social media dashboard for municipal decision-making
Users of location-based social media networks (LBSN), such as Instagram,
Flickr, or Twitter, have produced an unprecedented base of data over the
past decade. According to ILIEVA & MCPHEARSON (2018: 553), "the
enormous scale and timely observation are unique advantages of [social
media data]" and therefore hold enormous potential for various
application purposes such as urban planning, among others.
Instagram in particular, as one of the largest LBSN and one that
encourages sharing locations when creating content, offers completely
new and promising application purposes through the combination of the
spatial component with timestamps and the actual content (image & text).
Public social media (SM) data have shown their potential in examining the
increasingly relevant social problems of spatial (in-)justice, spatial
(in-)equality and spatial (in-)equity (Cf. SOJA 2013: 47). However,
few research attempts have been made to make these results more broadly
available in practice and accessible to laypersons in an understandable way.
LBSN data could contribute significantly to creating a better
information base for municipal decision-making processes, reaching
especially younger target groups. Until now, these groups in particular
have been difficult to reach in common participation processes (Cf. SELLE
2004), while bearing the consequences of municipal policies for the
longest period of time.
Our stated research goal is therefore to provide citizens, laypersons
and municipal decision-makers with an unprecedented LBSN Dashboard, as a
simple open-source platform for spatial multi-purpose LBSN analysis.
Such an undertaking raises certain ethical and legal questions, since
the user data belong to the users themselves, including the right to
self-determination over their data, on the one hand, and the right to
privacy on the other. The far too short-sighted (but frequently used)
argument that posts have been deliberately published with all the
consequences of their public nature in mind (e.g., BURTON et al. 2012:
2) is simply not sufficient for an in-depth discussion of privacy and
further violates its most important aspects (Cf. BOYD & CRAWFORD 2012:
672). In fact, most users are not, or only partially, aware of what can
actually be inferred from what they share or disclose about themselves
(KESSLER & MCKENZIE 2018: 6f).
Yet, privacy is rarely addressed in LBSN research and, worse, often
negligently ignored. In this context, many negative examples can be
found where data was analyzed and high-resolution results were
published, clearly violating users' privacy, for example, in scientific
publications (Cf. KOUNADI & LEITNER 2014: 140).
Given the increasing socio-spatial inequality, the rapid growth of SM,
and the growing interest of municipalities in SM knowledge, we see a
significant need for such a privacy-aware LBSN dashboard, which is
entirely new to the geospatial community.
We develop a privacy-aware LBSN dashboard prototype and propose a data
processing pipeline based on the HyperLogLog (HLL) algorithm by FLAJOLET
et al. (2007). The dashboard is geared towards easy information
retrieval and making use of the data richness of LBSN -- without
compromising user privacy or requiring extensive data retention.
Instead, we provide a unique, customizable, GDPR-compliant privacy
approach. The combination of different open-source tools for structuring
multi-platform LBSN data, leveraging the capabilities of HyperLogLog and
simple Python integration ensures easy reproducibility and active
community development (Cf. DUNKEL et al. 2021; DUNKEL & LÖCHNER 2021a & 2021b).
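The privacy property of HyperLogLog is that it estimates distinct counts (e.g. of users or posts) without retaining any identifier. A toy implementation illustrates the idea; the production pipeline uses dedicated HLL libraries, and register count and bias constant here are simplified:

```python
import hashlib

class HyperLogLog:
    """Minimal HyperLogLog sketch (toy, p=6 -> m=64 registers):
    estimates cardinality from the maximum 'rank' (position of the first
    1-bit) seen per register, never storing the items themselves."""
    def __init__(self, p=6):
        self.p, self.m = p, 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.709                            # bias constant for m = 64
        z = 1.0 / sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m * z

hll = HyperLogLog()
for i in range(1000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # roughly 1000, within HLL's ~13% error for m = 64
```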
The dashboard prototype is tailored for use by municipalities and their
citizens, but offers high scalability for other purposes or other
spatial levels. A limited interactive demo and its GitHub repository are
permanently publicly available as a result of a Master's thesis and an
IoT Design Thinking Workshop (Cf. WECKMÜLLER 2021; BUNDESSTADT BONN 2022).
We plan to finish and automate the data processing pipeline,
enabling more sophisticated queries and adding further visualization
methods. In the long run, the dashboard is intended to serve as a
participation and open data hub for all citizens and for any city in the
world. So far, the cities of Bonn and Chemnitz (Germany) are pilot
partners of this research project.
BOYD, D., & CRAWFORD, K. (2012). Critical questions for big data.
Information, Communication & Society, 15(5), 662-679.
BURTON, S. H., TANNER, K. W., GIRAUD-CARRIER, C. G., WEST, J. H., &
BARNES, M. D. (2012). "Right Time, Right Place" Health Communication
on Twitter: Value and Accuracy of Location Information. Journal of
Medical Internet Research, 14(6), e156.
FISCHER, F. (2008). Location Based Social Media -- Considering the
Impact of Sharing Geographic Information on Individual Spatial
Experience. In A. Car, G. Griesebner, & J. Strobl (Eds.) Geospatial
Crossroads @ GI_Forum '08. Proceedings of the Geoinformatics Forum
Salzburg (pp. 1-7). Wichmann.
FLAJOLET, P., FUSY, É., GANDOUET, O., & MEUNIER, F. (2007). HyperLogLog:
the analysis of a near-optimal cardinality estimation algorithm. Analysis
of Algorithms 2007 (AofA07), 127-146.
ILIEVA, R. T., & MCPHEARSON, T. (2018). Social-media data for urban
sustainability. Nature Sustainability, 1(10), 553-565.
KESSLER, C., & MCKENZIE, G. (2018). A geoprivacy manifesto. Transactions
in GIS, 22(1), 3-19.
KOUNADI, O., & LEITNER, M. (2014). Why does geoprivacy matter? The
scientific publication of confidential data presented on maps. Journal
of Empirical Research on Human Research Ethics, 9(4), 34-45.
SELLE, K. (2004). Kommunikation in der Kritik? In: Müller, B., Löb, S., &
Zimmermann, K. (Eds.), Steuerung und Planung im Wandel. VS Verlag für
Sozialwissenschaften.
SOJA, E. W. (2013). Seeking spatial justice (Vol. 16). University of
Minnesota Press.
List of Web References
All links last accessed on February 20, 2022.
BUNDESSTADT BONN (2022). Studierende entwickeln neue Ideen für die
digitale Stadt von morgen.
DUNKEL, A., LÖCHNER, M., KRUMPE, F. & Contributors (2021). LBSN
DUNKEL, A. & LÖCHNER M. (2021a). LBSN HLL Database - Docker Container.
DUNKEL, A. & LÖCHNER M. (2021b). Lbsntransform.
WECKMÜLLER, D. (2021). LBSN-Dashboard Prototype for Bonn.
Complex quarry districts like the Apuan Alps’ marble quarries require remotely sensed high-resolution data for monitoring soil consumption over the years: extractive activities lead to environmental challenges that require accurate environmental controls issued by the Tuscan Regional Environmental Agency (ARPAT). Over the last 5 to 10 years, the Regional Environmental Information System office (SIRA) has developed methods and techniques suitable for both 2D and 3D soil consumption monitoring, using free aerial and satellite images and open-source geospatial software for data processing and dissemination, useful in planning and managing controls. Aerial image and LiDAR acquisitions, satellite data and RPAS acquisitions have been tested to evaluate their suitability for deriving both 2D and 3D indicators with the resolution needed to address the required spatial-temporal constraints, i.e. yearly monitoring of high-resolution changes (spatial resolution between 50 cm and 1 m).
Given the size of the Area of Interest (AOI) of the Carrara basins, up to 2.5 km × 2.5 km, stereo satellite and aerial images can be used to obtain precise terrain models by photogrammetric reconstruction, useful for 3D soil consumption monitoring, while middle-resolution (10 m) multispectral satellite images and high-resolution aerial images (50 cm-1 m) can be used for 2D soil consumption monitoring and for the regulation of quarry areas by public bodies (natural soil loss, restoration of exhausted areas, debris removals and new disposals).
Open-access Sentinel-2 multispectral satellite images with 10 m spatial resolution have been used to assess coverage changes over 5 years (2016-2021); the results have subsequently been refined by manual interpretation. Both semi-automatic methods based on spectral distances and machine learning techniques have been used in the QGIS 3.x environment to identify areas affected by extraction activities in the Sentinel-2 images. Free OGC Web Map Services (WMS) made available by the Tuscan Regional Information System have been used to verify the changes highlighted by the semi-automatic methods: aerial high-resolution images between 2010 and 2019 have been evaluated by visual photointerpretation, allowing the 2D soil consumption assessment to be extended to 10 years over the whole area.
Comparison of the highlighted 2D changes with regulated areas, such as mapped debris disposals and quarry property limits, has been used to check the proper development of extraction activities and proper environmental debris management.
In turn, 3D changes have been tracked by comparing the 2009 and 2017 free aerial LiDAR data made available for download by the Tuscan Regional Information System, integrated with two stereo models obtained from 2020 and 2022 Pléiades high-resolution satellite images (new acquisitions) freely granted by ESA following Project Proposal id 61779 (“Quarry activity monitoring in Apuan Alps”). Stereo satellite B/W images with 50 cm spatial resolution have been processed using open-source stereo processing pipelines in Docker virtual environments, obtaining high-precision digital surface models (height precision around 1 m) after vegetation filtering. The 3D changes detected over the years by algebraic comparison of elevations, performed in the QGIS 3.x environment, highlight quarries characterized by intense extraction activities (extracted marble blocks, characterized by positive quota differences) and quarry area management (debris disposal and service infrastructure building, characterized by negative quota differences).
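The algebraic comparison of elevations can be sketched with NumPy; following the sign convention above (positive old-minus-new differences mark extraction, negative ones disposal or building), with the ~1 m threshold reflecting the stated DSM height precision — the grids below are toy values:

```python
import numpy as np

def quota_difference(dem_old, dem_new, threshold=1.0):
    """Algebraic comparison of elevations between two epochs.
    Positive differences (old minus new) flag extraction, negative ones
    flag debris disposal or infrastructure building; changes within the
    height-precision threshold are left unclassified (0)."""
    diff = np.asarray(dem_old, float) - np.asarray(dem_new, float)
    classes = np.zeros_like(diff, dtype=int)
    classes[diff > threshold] = 1     # extraction (material removed)
    classes[diff < -threshold] = -1   # disposal / building (material added)
    return diff, classes

old = np.array([[100.0, 100.0], [100.0, 100.0]])
new = np.array([[ 95.0, 100.5], [103.0, 100.0]])
diff, classes = quota_difference(old, new)
print(classes)
```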
The combined usage of 2D and 3D change indicators is challenging in terms of a proper representation of soil consumption dynamics over the years: while decision makers need quick and easy access to both 2D and 3D data, the web technologies suitable for such a representation have been developed in very different contexts, making their integration quite complex. While a ‘classical’ OpenLayers- or Leaflet-based 2D web GIS client can be enough to highlight 2D changes and – with some limitations – 3D changes as elevation differences, a ‘true’ 3D visualization environment must be set up to track ongoing extraction activities, aiming to assess both (a) compliance with extraction plans authorized by public bodies and (b) proper debris management in quarry areas. In addition, 3D web viewers are mainly targeted at representing point clouds or CAD drawings, making the integration of 2D, 2.5D (terrain models) and 3D (extracted volumes) data very difficult.
A dual 2D/3D web GIS client has been developed for the proper representation of 2D/3D spatial indicators of ongoing extraction activities in the Carrara marble basin: high-resolution images are served as tiled data, while the 2D/3D spatial indicators are served as static and/or tiled vector data. Open-source libraries have been used for data processing, serving and representation inside a map interface.
For each quarry included in the Carrara basin, both area limitations and areas authorized for extraction activities have been superimposed over the spatial indicator layers, allowing users to easily locate areas subject to intense extraction activities and to evaluate compliance with sustainability plans and environmental management prescriptions issued by public bodies.
Work is in progress to use the 2D and 3D indicators in prioritizing the planning of environmental controls: this novel application will require a proper scoring system based on the degree of compliance with environmental management prescriptions and on performance, mainly in the field of quarry and marble slurry waste management.
NoiseCapture is an Android application developed by the Gustave Eiffel University
and the CNRS as part of a participatory approach to environmental noise mapping.
The application is open source and all its data are freely available.
The study presented here is a first analysis of the first three years of data
collection, through the prism of noise sources. The analysis only focused on the
labels filled in by the users and not on the sound spectrum of the measurement,
which will be studied later.
The aim was to determine whether known dynamics in environmental acoustics could
be recovered using collaborative data.
Since this preparatory work will have to be consolidated and extended later,
and with the intention of placing this study within the framework of Open
Science, particular attention was paid to the reproducibility of the analysis,
which was carried out entirely with free software and literate programming techniques.
The context of the study, the tools and techniques used and the first results
obtained will be presented as well as the benefits of using literate programming
in this type of preparatory work.
An article presenting this dataset was published in 2021 (Picaut et al. 2021).
It details the structure of the database and the data, the profile of the
contributors and the contributions but does not analyze the content of the data.
The present article proposes to begin that analysis.
The data used in this study correspond to contributions made between August 29, 2017
and August 28, 2020. During this period, nearly 70,000 unique contributors allowed
the collection of more than 260,000 tracks for a total of about 60 million seconds
of measurement. A track is a collected recording: it contains the one-second,
third-octave sound spectrum recorded by the phone, coupled with its one-second
GPS positioning. This information can be enriched by the contributor with labels.
There are 18 labels, and the user can select one or more of them for each of the
tracks made. They are detailed in (Picaut et al. 2021).
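The structure of a track described above can be summarized as a small data model. This is only an illustration: the field names and the number of spectral bands are assumptions, not the actual NoiseCapture database schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackSecond:
    """One second of a track: a third-octave spectrum plus a GPS fix.
    Field names are illustrative, not the actual schema."""
    timestamp: float          # UNIX time of the one-second window
    spectrum_db: List[float]  # third-octave band levels, in dB
    lat: float
    lon: float

@dataclass
class Track:
    """A collected recording, optionally enriched with contributor labels."""
    track_id: str
    seconds: List[TrackSecond] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)  # chosen among 18 predefined labels

    def duration_s(self) -> int:
        return len(self.seconds)

# a two-second track tagged by its contributor
t = Track("demo",
          [TrackSecond(0.0, [45.0] * 29, 47.2, -1.55),
           TrackSecond(1.0, [46.5] * 29, 47.2, -1.55)],
          labels=["road", "chatting"])
print(t.duration_s())  # 2
```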
The preliminary work presented here focuses on the analysis of the proportion of
certain labels in the global sample over certain time periods.
In addition to data from the collaborative collection, some additional data were
used to limit the study area. We chose to limit the geographical scope of this
preliminary study to metropolitan France because this area contains the largest
number of recordings.
The climate and sound dynamics are known and documented there.
To facilitate the reproducibility of spatial filtering, it was decided to use
open data sets from recognized sources: the Natural Earth database
(Patterson and Kelso 2021) and the Admin Express database from the
National Institute of Geographic and Forest Information (Institut Géographique National 2021).
The data are provided as a dump from a PostgreSQL/PostGIS database (Ramsey and Blasby 2001).
Several scripts perform much of the attribute and spatial filtering. The
filtered results are saved in a materialized view whose data are then analyzed
with the R language.
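The filtering step can be sketched as a composed SQL statement creating the materialized view; all table and column names below are hypothetical stand-ins for the actual NoiseCapture schema, and the Python wrapper only builds the statement.

```python
def materialized_view_sql(view_name, min_duration_s, max_duration_s, country_table):
    """Compose a CREATE MATERIALIZED VIEW statement combining attribute filters
    (tagged tracks, duration bounds) with a spatial filter (tracks inside
    metropolitan France). Table and column names are hypothetical."""
    return (
        f"CREATE MATERIALIZED VIEW {view_name} AS\n"
        "SELECT t.*\n"
        "FROM tracks t\n"
        f"JOIN {country_table} c ON ST_Within(t.geom, c.geom)\n"
        "WHERE array_length(t.tags, 1) > 0\n"
        f"  AND t.duration_s BETWEEN {min_duration_s} AND {max_duration_s};"
    )

sql = materialized_view_sql("france_tagged_tracks", 10, 3600, "admin_express_fr")
print(sql)
```

A statement like this would then be executed once on the PostGIS server, giving the R analysis a stable, pre-filtered table to connect to.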
The R language (R Core Team 2021)
is a programming language for data processing and statistics with many libraries
dedicated to geospatial data.
Rmarkdown makes it possible to mix code and text in Markdown for the dynamic
production of graphs, tables and documents.
It is one of the recommended means for literate programming.
Git is a Distributed Version Control System (DVCS) (Chacon and Straub 2014).
It enables collaborative and decentralized work.
The choice of Git was natural as different collaborators are present on several
sites (Nantes, Lyon, Paris) and Git is already used within the UMRAE laboratory.
The data are provided in the form of a PostGreSQL/PostGIS dump.
A server has been set up and the data loaded.
A materialized view was created in order to provide a stable access to the data
corresponding to the defined criteria.
These criteria are both attributive (filtering of certain tags, minimum and maximum
durations, etc.) and spatial (located in France, reduced trace area, etc.).
A Rmarkdown document establishes the connection with the view and then performs
the operations needed to analyze the data.
A document mixing narrative, figures and code allowed the resumption and
continuation of the analyses shown here.
The study concerns tracks bearing a tag, registered in metropolitan France.
It focuses on the proportion of a certain tag in relation to all the tags for a
given period (time of day, season, etc.).
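The proportion computation described above can be sketched as follows; the authors perform the analysis in R, so this Python version is only an illustration of the idea, with an invented record format and tag names.

```python
from collections import Counter

def tag_proportion(records, tag):
    """records: iterable of (period, tags) pairs, e.g. (hour_of_day, labels on
    one track). Returns, per period, the share of `tag` among all reported tags."""
    totals, hits = Counter(), Counter()
    for period, tags in records:
        totals[period] += len(tags)
        hits[period] += tags.count(tag)
    return {p: hits[p] / totals[p] for p in totals if totals[p]}

# toy sample: (hour of day, labels reported on one track)
sample = [(5, ["animals", "wind"]), (5, ["animals"]), (17, ["road", "chatting"])]
result = tag_proportion(sample, "animals")
print(result)
```

In the toy sample, the "animals" tag accounts for two of the three tags reported at 5 a.m. and none of those reported at 5 p.m., mirroring the kind of morning peak discussed below.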
In the sample studied, it is possible to note a prevalence of the tags roads,
chatting, animals and wind. The tags air_traffic and works are also well represented.
A first axis of analysis concerns the time distribution of the tags.
Animal noises (tag animals) are more frequent in the morning and especially
one hour before sunrise.
This is a common dynamic for bird song.
We also observed peaks in human activity, especially commuting.
The next temporal axis was seasonality, in particular that of animal noises,
with more intense activity in European spring and summer.
This phenomenon could also be observed in the recordings.
We also noticed that music was less present in autumn than in other seasons and
that it was mostly present at late hours.
The first results are encouraging because road dynamics related to commuting or
animal activity can be observed.
The main question was to determine if these known dynamics in environmental acoustics
can be observed in a crowdsourced dataset.
The first elements seem to answer this question positively.
Some questions still need to be explored, notably those concerning the
representativeness of samples that are sometimes weak for certain time periods.
The systematic use of open source software, the provision of documented code files
and a document mixing narrative, figures and code have allowed the resumption and
continuation of the analyses shown here.
This work in progress will complete the final article.
Mt. Ushba is situated in the Greater Caucasus in Georgia, next to the Russian border. With its nearly symmetrical double-peak appearance, it is iconic and a symbol of the historic Svaneti region in Georgia, famous for its mountains, botany, and century-old defense towers. Svaneti is becoming an increasingly popular tourist destination in summer and winter. Therefore, the German Alpine Club is interested in providing a new map for this region, which will be produced by the Institute of Cartography of the TU Dresden. In the age of open data, it follows naturally that OpenStreetMap will be an essential source for the new map. It should make the project more sustainable and inspire people to use free and open-source software for map production.
One basis of each topographic or touristic map is fieldwork, which here means organized mapping and editing with OpenStreetMap aiming to verify and complement map content and coverage, carried out by the Institute of Cartography in Mestia (Georgia) in the summer of 2021. In preparation for this work, a comparison with older maps was conducted to identify possible shortcomings and errors in the data. A draft was created using OpenStreetMap and the SRTM elevation model in preparation for the fieldwork. It helped to evaluate the current state of the data, gave a first impression of the mapping area, and was a tangible basis for data capture in the field. A field book was produced for each participant, containing the map draft as an atlas and information on which data should be collected and which specific attributes were required. Finally, the data was contributed to OpenStreetMap, and from there, the draft was updated again.
In the case of land cover, creating our own classification seemed beneficial for distinguishing between typical vegetation classes in a high mountain area. Showing the vegetation in detail is a feature of Alpine Club maps, but OpenStreetMap data alone would not be detailed enough. In addition, a land cover classification based on remote sensing data is more reliable and ensures more consistent results compared to individual contributions from users with different previous knowledge. Open remote sensing data from the Landsat and Sentinel programs offer good sources for such a task and are also used to monitor the glaciers in this area[ii]. R is used as the analysis platform. It is possible to classify rock, glaciers, and specific vegetation types such as alpine rose or open birch stands. For identifying the vegetation, representative examples were collected during the fieldwork by entering them in the atlas and taking sample photographs.
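The authors carry out the classification in R; as a language-agnostic illustration of how field samples can drive a supervised classification, here is a minimal nearest-centroid sketch in Python. The band values and class names are invented for illustration and do not come from the project.

```python
import numpy as np

# Training samples: mean band values collected in the field (values invented).
# Classes follow the kind of custom legend described in the text.
training = {
    "rock":        np.array([0.30, 0.28, 0.26, 0.25]),
    "glacier":     np.array([0.80, 0.82, 0.85, 0.60]),
    "alpine_rose": np.array([0.05, 0.09, 0.06, 0.45]),
}

def classify(pixel):
    """Assign the class whose spectral centroid is nearest (Euclidean distance)."""
    return min(training, key=lambda c: np.linalg.norm(training[c] - pixel))

label = classify(np.array([0.78, 0.80, 0.84, 0.58]))
print(label)  # glacier
```

In practice one would use a proper classifier (e.g. random forest) over many samples per class, but the workflow is the same: field-verified examples anchor the spectral classes.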
Another essential part of a topographic map for a high mountain area is a good terrain visualization. The SRTM model is useful but not detailed enough to create rock depictions, which will be automatically derived with the Piotr tool[iii]. Planet Labs Inc. provided high-resolution imagery from its RapidEye and Dove satellites, suitable for creating a digital elevation model with a spatial resolution of approximately ten meters by applying stereo photogrammetry methods using the Ames Stereo Pipeline[iv]. The result enables a much more precise and understandable representation of the terrain. The terrain points were recorded with standard handheld GNSS devices, the Garmin GPSMAP 66sr, which stores raw observations for two frequencies. Accuracies in the range of around 0.1 meters[v] can be achieved using professional GNSS software.
In order to produce the final topographic map, it is necessary to combine all data components representing the area around Mt. Ushba. In a first step, the updated OpenStreetMap data is imported into a PostgreSQL database with the PostGIS extension. In a second step, an automated generalization is carried out for the selected target scale of 1:33,000, in particular schema transformation, aggregation, and simplification. For the visualization, QGIS is utilized: one project containing all layers with their visualizations is served as a WMS. It enables team members to view the current map and access all the data without storing it locally on their own computers. Additional web mapping services were set up to provide georeferenced scans of other available maps of the region to enable a comparison and evaluation of the newly derived topographic map product.
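A core operation in the automated generalization step is line simplification. The following minimal Python sketch of the Ramer-Douglas-Peucker algorithm illustrates the idea; it is a common choice for such simplification, though the project's actual toolchain for this step is not specified in the text.

```python
def simplify(points, tol):
    """Ramer-Douglas-Peucker: drop vertices closer than `tol` to the chord
    between the endpoints, recursing on the farthest vertex otherwise."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):  # perpendicular distance from p to the endpoint chord
        px, py = p
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            return ((px - x1) ** 2 + (py - y1) ** 2) ** 0.5
        return abs(dy * px - dx * py + x2 * y1 - y2 * x1) / (dx * dx + dy * dy) ** 0.5

    idx = max(range(1, len(points) - 1), key=lambda i: dist(points[i]))
    if dist(points[idx]) > tol:
        return simplify(points[:idx + 1], tol)[:-1] + simplify(points[idx:], tol)
    return [points[0], points[-1]]

line = [(0, 0), (1, 0.05), (2, -0.04), (3, 2.0), (4, 0)]
out = simplify(line, 0.1)
print(out)
```

At 1:33,000, a tolerance expressed in map units (here arbitrary) removes vertices that would not be visible at the target scale while keeping significant bends such as the (3, 2.0) spike.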
Because of the wide range of tasks, the work is split into several work packages and ongoing subprojects. Students' master theses within the International Cartography Master program – a cooperative offer of TU Dresden, TU München, TU Wien, and the University of Twente – contributed significantly to the project by implementing and evaluating selected methods required for the map derivation.
[i] Grinberger, A. Yair, Moritz Schott, Martin Raifer, and Alexander Zipf. “An Analysis of the Spatial and Temporal Distribution of Large‐scale Data Production Events in OpenStreetMap.” Transactions in GIS 25, no. 2 (April 2021): 622–41. https://doi.org/10.1111/tgis.12746.
[ii] Holobâcă, Iulian-Horia, Levan G. Tielidze, Kinga Ivan, Mariam Elizbarashvili, Mircea Alexe, Daniel Germain, Sorin Hadrian Petrescu, Olimpiu Traian Pop, and George Gaprindashvili. “Multi-Sensor Remote Sensing to Map Glacier Debris Cover in the Greater Caucasus, Georgia.” Journal of Glaciology 67, no. 264 (August 2021): 685–96. https://doi.org/10.1017/jog.2021.47.
[iii] Geisthövel, Roman, and Lorenz Hurni. “Automated Swiss-Style Relief Shading and Rock Hachuring.” The Cartographic Journal 55, no. 4 (October 2, 2018): 341–61. https://doi.org/10.1080/00087041.2018.1551955.
[iv] Shean, David E., Oleg Alexandrov, Zachary M. Moratto, Benjamin E. Smith, Ian R. Joughin, Claire Porter, and Paul Morin. “An Automated, Open-Source Pipeline for Mass Production of Digital Elevation Models (DEMs) from Very-High-Resolution Commercial Stereo Satellite Imagery.” ISPRS Journal of Photogrammetry and Remote Sensing 116 (June 2016): 101–17. https://doi.org/10.1016/j.isprsjprs.2016.03.012.
[v] Lachapelle, Gérard, Paul Gratton, Jamie Horrelt, Erica Lemieux, and Ali Broumandan. “Evaluation of a Low Cost Hand Held Unit with GNSS Raw Data Capability and Comparison with an Android Smartphone.” Sensors 18, no. 12 (November 29, 2018): 4185. https://doi.org/10.3390/s18124185.
In northern Italian mountainous regions, forests are invading pastures and abandoned cultivated surfaces, leading to an important land-use change phenomenon and reducing those open areas that are fundamental for ecological purposes.
The research presented here focuses on a multiobjective assessment methodology that combines two or more multicriteria analyses to identify the areas most suitable for agricultural purposes among the surfaces that have been invaded by forests, carried out using Free and Open Source Software for Geospatial (FOSS4G). The analysis of the areas took into account their intrinsic characteristics and their spatial location in relation to the territory, and started from previous studies on land use in the Autonomous Province of Trento (Italy). The pilot areas are three: the municipality of Trento (the Province's capital), the municipality of Pergine Valsugana, and seven municipalities that are part of the Piana Rotaliana region. Almost 88% of the municipalities are located at an altitude of more than 600 m above sea level, reflecting the peculiar topography of the province, made up of valleys and high mountains with high percentages of steep slopes. In Trento, the overall density is 742 inhabitants per square kilometre and the pressure on urban and peri-urban areas is nine times higher than in the rest of the province. 20% of Trento's territory is classified as agricultural and 50% as forest or pasture land. About 70% of the territory is covered by silvopastoral-agricultural areas; the remaining 30% is categorized as urban. The repartition of the province's surface is similar to that of the city of Trento: 61% of the territory is covered by forests, 33.6% by agricultural areas, and only 5% by other types of land use. Collective bodies and public actors manage most of these silvopastoral and agro-forestal areas, whose ownership is collective and is managed following the "uso civico" rights, a customary right embedded within the properties of communities and villages. Therefore, profit is not their main aim.
This study has been part of the SATURN European project funded by EIT Climate-KIC (November 2018 – December 2021). Three city-regions have been involved: the Trentino region in Italy, Birmingham in the United Kingdom, and Gothenburg in Sweden. The project aimed to reintegrate natural resources into cities' climate change adaptation strategies and to expand and nurture its model by creating a broader initiative involving an increasing number of stakeholders. The geospatial data sets were georeferenced and managed with GRASS GIS and QGIS, and the files were assembled by combining data freely available from the Autonomous Province of Trento with data produced during the project.
The comparative analysis and methodology were carried out by means of the QGIS 3.8 Geographic Information System, with the aim of developing a methodology that can be widely used by territorial operators and Public Administrations.
Through a series of multi-criteria analyses of the agricultural and ecological vocation of a given region, and more specifically of abandoned agricultural areas, it was possible to create initial maps assigning values according to the specific aspects considered. To carry out these analyses, it was necessary to collect and select a significant amount of georeferenced data and then standardise them. Synthesis analyses were useful to compare the ecological and agricultural aspects and to integrate them into synthesis maps, which can be used in the future for land management and planning.
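The standardise-then-combine workflow can be illustrated with a small Python/NumPy sketch. The criteria, values, and weights below are invented for illustration; in the study, the relevant criteria and their importance derive from the stakeholder questionnaires.

```python
import numpy as np

# Two hypothetical criterion rasters: slope in degrees, distance to roads in m.
slope = np.array([[5.0, 25.0], [40.0, 10.0]])
dist_roads = np.array([[100.0, 800.0], [300.0, 50.0]])

def standardise(x, benefit=True):
    """Min-max standardisation to [0, 1]; inverted for cost criteria
    (here, steeper and more remote plots are less suitable)."""
    s = (x - x.min()) / (x.max() - x.min())
    return s if benefit else 1 - s

# Weighted linear combination of the standardised criteria.
weights = {"slope": 0.6, "dist_roads": 0.4}
suitability = (weights["slope"] * standardise(slope, benefit=False)
               + weights["dist_roads"] * standardise(dist_roads, benefit=False))
print(suitability)
```

Each criterion map is first brought to a common 0-1 scale, so the weighted sum yields a comparable suitability score per cell; ecological and agricultural analyses can then be run separately and merged into a final synthesis map the same way.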
In order to validate the model and to verify the results, on-site inspections were carried out both in Valsugana and in Val d'Adige.
Technicians and experts have been involved in the research through focus groups, organised within the SATURN project, which allowed some general criticalities of the territory to emerge, and through the completion of a questionnaire proposed within the thesis work. Through these questionnaires, it has been possible to identify the most important criteria for assessing a plot of land from an ecological and agricultural point of view.
The obtained results showed how the classical approach, based on single-criterion analysis, differs from the multicriteria approach, whose potential lies in producing a more precise and clearer classification of the aspects considered, combining the two multi-criteria analyses into a single final map. Significant advantages came from the use of this method in terms of data and information exchange between the stakeholders and in terms of a deeper understanding of the characteristics of the areas analysed.
The proposed methodology and the script that has been developed can be used in order to better plan forest management and as a basis for future territorial plans.
Moreover, the multicriteria approach, which initially provides for a separate analysis of the research layers and then integrates them into a single final output, may represent a starting point for ecosystem evaluations. Preserving the ecosystem of an area, or rather the mosaic of ecosystems that make it up, is in fact of fundamental importance, as is succeeding in creating an eco-sustainable environment. In order to achieve this, it is necessary to have a spatial planning process that is as accurate as possible and that evaluates all the ecological criteria in a diversified manner with respect to the criteria of the object of research, so as to be able to identify key elements.
The model presented can be replicated by replacing the current research object, i.e. the agricultural assessment, while keeping the ecological assessment.
Future developments foresee the transformation of the Python script into a plug-in for QGIS, guaranteeing greater functionality for those who wish to use it.
Manual digitization of 3D information from aerial stereo images has been one of the major tasks in national mapping agencies. However, it is labor-intensive, and there is an enormous need for an automatic method to extract 3D information from stereo images. Recent advances in hardware and software make full automation of stereo-image tasks possible. Such tasks require substantial computational power, and the emergence of the GPU gave great support to this technique development. With recent advances in AI, machines are gaining the ability to learn, improve, and execute repetitive tasks precisely, especially with deep learning techniques and their capacity to combine and adjust millions or even billions of parameters in a neural network. It therefore becomes possible to automate many complex tasks.
OpenCV (Open Source Computer Vision Library) is an open-source library, first released in the early 2000s, that includes several hundred computer vision algorithms. OpenCV supports epipolar geometry estimation and constraint enforcement as well as depth calculation from stereo images. Before 2016, many researchers employed OpenCV for depth estimation from stereo images. In recent years, using deep learning methods to obtain depth maps from stereo images has been highlighted. In deep learning applications, the left and right images usually need to be rectified before they can be fed to the network. GC-Net, HRS Net, MVSNet, PSMNet, and PLUMENet are examples of convolutional neural networks (CNNs) that can be used for disparity estimation. GC-Net was introduced in 2017 by Kendall et al., PSMNet in 2018 by Chang et al., MVSNet in 2018 by Yao et al., HRS Net in 2019 by Yang et al., and PLUMENet in 2021 by Wang et al. Disparity images are typically used as labels, but some networks work with unsupervised learning, meaning no labels are used for training them. Some experiments were based on open datasets, KITTI stereo and Middlebury stereo being good examples. Ready-made remote sensing stereo image datasets still seem to be quite scarce, but some can be found, for example the stereo image dataset of Vaihingen: the Aerial Stereo Dense Matching Benchmark introduced in 2021.
Our experiment focused on obtaining disparity maps i) from aerial stereo images with known orientation parameters using OpenCV; ii) from rectified aerial stereo images with deep neural networks: GC-Net, MVSNet, and PSMNet. The results based on OpenCV and on the neural networks were compared and evaluated.
Two datasets were used in the experiment. One dataset consisted of aerial stereo images with known orientation parameters from the National Land Survey of Finland. The aerial images were acquired in 2020 using the UltraCam Eagle Mark 3 (Vexcel, Austria), with a forward overlap of 80% and a side overlap of 30% between flight strips. The flight height was 7657.9 m and the images have a spatial resolution of 30 cm. The other set came from the ISPRS Aerial Stereo Dense Matching Benchmark 2021: the Vaihingen dataset. The Vaihingen dataset from the ISPRS 3D reconstruction benchmark provides a good registration of oriented images and LiDAR point clouds. It is composed of 20 images with a depth of 11 bits and a ground sample distance (GSD) of 8 cm. The reference depth maps used for evaluation were produced from the LiDAR point clouds.
In the experiment using the OpenCV library, the known orientation parameters were used for image rectification. ORB (Oriented FAST and Rotated BRIEF) features were used to find image matching points. ORB is an open-source, efficient alternative to SIFT or SURF. The algorithm uses FAST in pyramids to detect stable keypoints, selects the strongest features using the FAST or Harris response, finds their orientation using first-order moments, and computes the descriptors using BRIEF (where the coordinates of random point pairs (or k-tuples) are rotated according to the measured orientation).
In the experiment using deep learning methods, GC-Net, MVSNet, and PSMNet were tested. GC-Net is an end-to-end deep stereo regression architecture. It estimates per-pixel disparity from a single rectified image pair by employing a cost volume to reason about the geometry and a deep convolutional network formulation to reason about the semantics. MVSNet is an end-to-end deep learning architecture for depth map inference from multi-view images. It computes one depth map at a time by extracting deep visual image features, building the 3D cost volume upon the reference camera frustum, and applying 3D convolutions to regularize and regress the initial depth map to generate the final output. PSMNet is a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and a 3D CNN. It exploits global context information in stereo matching. PSMNet extends pixel-level features to region-level features with different scales of receptive fields by means of a pyramid pooling module. The cost volume is formed by combining global and local feature clues, and a stacked hourglass 3D CNN repeatedly processes the context information in a top-down/bottom-up manner to improve the utilization of global context for cost-volume estimation.
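The cost-volume idea shared by these networks can be illustrated outside any deep learning framework. The sketch below builds a volume from a simple absolute-difference cost over raw intensities; the networks instead stack or correlate learned feature maps and regress disparity with 3D convolutions, but the underlying geometry is the same.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """For each candidate disparity d, compare left pixel x with right pixel
    x - d and record a per-pixel matching cost (absolute difference here)."""
    h, w = left.shape
    vol = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return vol

left = np.tile(np.arange(16.0), (4, 1))  # toy 4x16 image with a gradient
right = np.roll(left, -3, axis=1)        # true disparity of 3 pixels
# winner-takes-all: pick the disparity with the lowest cost at each pixel
disp = cost_volume(left, right, 8).argmin(axis=0)
print(disp[:, 4:12])
```

Away from the image borders, the winner-takes-all readout recovers the true disparity of 3; the learned regularization in GC-Net/PSMNet replaces this naive argmin with a smooth, context-aware regression.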
The results from the two datasets with the three networks and OpenCV were presented. The experiments showed that selecting a proper loss function and learning rate is important when using neural networks, as it affects the performance and results of the different networks. The results were evaluated by comparison with the reference depth maps, and the advantages and disadvantages of using the networks and the OpenCV library were analyzed and discussed.
In line with the goals of the European Commission communications "2030 Digital Compass: the European way for the Digital Decade" and "Open Source Software Strategy 2020 – 2023" regarding digitalization and the use of open-source solutions inside Public Administrations, this paper presents the approach followed for the realization of an open-source Web-GIS to transfer all the information assets related to the public works that must be judged by the Regional Technical Administrative Committee (C.R.T.A.). The developed Web-GIS consists of a platform to support the "Civil Engineering" authority of the Abruzzo Region in the management of public works during their whole administrative process.
In particular, the main aims of the Web-GIS are:
- to manage in a unique shared geospatial database the public works, that must be judged by the C.R.T.A. of the Abruzzo Region;
- to monitor the activities and the life-cycle of the public works;
- to share information related to the public works both with other regional authority offices and with citizens.
In general, the creation of a WebGIS starts from a project created on the client side which, in a subsequent phase, is loaded onto a server to allow the visualization, interaction and distribution of the information among multiple users at the same time.
In this case, the creation of the GIS project for the management of geo-referenced territorial information, and of the alphanumeric information describing it, required a careful study of the needs of the "Civil Engineering" authority of the Abruzzo Region and a definition of the contents of the GIS platform, passing through the documentation and the archives to be consulted and implemented in the GIS. Finally, the choice of the output to be presented was made, also in relation to the types of end-users that will have to manage (regional authority employees) and view (citizens) the published information.
In order to properly design the requested Web-GIS application, as a first step the structure of the geodatabase was designed locally in QGIS, one of the best-known open-source GIS software packages. Among the main geodatabase formats, GeoPackage, an open, OGC (Open Geospatial Consortium) standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information, was selected. The GeoPackage standard describes a set of conventions for storing such information within an SQLite database. The GeoPackage format was selected considering the geometric entities of the public works that must be stored within the database, together with their attributes, which consist of point, multi-line, and multi-polygon elements. In fact, the public works that must be judged by the C.R.T.A. of the Abruzzo Region can be buildings (strategic or school buildings or healthcare constructions), road works, hydraulic works, or land defense works. These public works, as required by the Abruzzo Region, do not have an exact type of geometry but, depending on their type and the type of project, can be represented in the way most appropriate to understanding the intervention itself. The GeoPackage format allows storing all the information related to a public work in a single file, simplifying its management.
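Because a GeoPackage is a set of conventions over SQLite, its layer catalogue can be inspected with nothing more than Python's standard library. The sketch below fabricates a minimal `gpkg_contents` table in memory (a real GeoPackage has further mandatory tables such as `gpkg_spatial_ref_sys`); the layer names are illustrative, inspired by the public work types above.

```python
import sqlite3

# gpkg_contents is the OGC-mandated table listing every layer in the package.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE gpkg_contents (
    table_name TEXT PRIMARY KEY, data_type TEXT, identifier TEXT, srs_id INTEGER)""")
con.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?, ?, ?)",
    [("buildings", "features", "Strategic buildings", 4326),
     ("road_works", "features", "Road works", 4326),
     ("hydraulic_works", "features", "Hydraulic works", 4326)])

# The same query works against any real .gpkg file opened with sqlite3.connect().
layers = [row[0] for row in con.execute("SELECT table_name FROM gpkg_contents")]
print(layers)
```

This single-file structure is what lets geometries and attributes for all public work types travel together between QGIS Desktop and the server.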
QGIS was chosen with the idea of using the Lizmap software to publish the contents of the locally designed geodatabase directly and in a simple way. Lizmap is an open-source software designed by 3Liz, a service company revolving around the QGIS software, which facilitates the publishing of web mapping applications from QGIS Desktop using QGIS Server as map server. Another important aspect of this choice is that the QGIS environment is well known among public authority employees, which simplified the interaction during the design phase of the database and made it possible to verify whether the structure of the designed database satisfies all the requirements of the "Civil Engineering" authority of the Abruzzo Region. In addition, in the future, the "Civil Engineering" authority of the Abruzzo Region will be able to autonomously modify or update the public works that will be subject to the judgment of the Regional Technical Administrative Committee (C.R.T.A.), directly in QGIS Desktop.
After the realization of the project in QGIS Desktop, in order to share the map online, the Lizmap plugin inside QGIS Desktop was used to configure the publishing options, i.e. scales, base layers, metadata, etc. Once the configuration file is compiled, it is possible to synchronize the working folder with QGIS Server. When synchronisation is complete, the QGIS project can also be accessed on the Internet through the Lizmap Web Client application using a web browser. Lizmap Web Client is installed on QGIS Server in order to insert projects, and it allows configuring the project and the displayed web page. All these steps can be performed locally (on an intranet network); finally, the project and the created settings files have to be transferred to the geoportal of the Abruzzo Region. In conclusion, the use of Lizmap to transfer QGIS Desktop projects to the web represents a good solution to move Public Administrations towards the use of open-source solutions and towards the digitalization procedures required by the European Commission. In addition, this tool has the purpose of ensuring maximum transparency to citizens who, although not insiders, can access the geoportal to see how the funds allocated by the Region, the Italian State, and the European Community are distributed and spent.
This presentation will discuss the ongoing effort to map, in unprecedented detail, a forested area in Central Bali, Indonesia, the use and ownership of which is currently a contested question. The presentation will outline the historical and political reasons for the contested nature of the land area under investigation, and then discuss participatory field mapping methods and a collaborative analysis pipeline developed to represent via formal GIS methodologies the land and its use with the needs of different and differing stakeholders in mind.
Our research approach is informed by current approaches to community mapping in general [Cochrane2020] and specific to emerging economies, with a particular focus on the conditions in Indonesia [Sulistyawan2018]. In particular, we are studying an area in Central Bali in the vicinity of the Taman Wisata Alam (TWA) Buyan-Tamblingan comprising 1,491 hectares of forest area including the Alas Merta Jati [Suryawan2021], part of the Batukaru nature reserve, which is estimated to contain sufficient springs to meet Bali's water needs [Zen, 2019] (Fig. 1). The Alas Merta Jati is contested as it is currently claimed as ancestral lands (or "customary forest") by the Tamblingan people and at the same time claimed as a state forest by the Indonesian government. While both entities claim to want to protect the forest along fashionable "sustainable" principles [Strauss2015], each interprets the responsibilities and benefits of sustainable actions in different ways. Subjecting the area to GIS-compliant analysis approaches is one way by which differences and commonalities across stakeholders can become tractable.
Collaboration framework:
Our work is coordinated and overseen by a local NGO, the WISNU foundation (https://www.wisnu.or.id/) with which we have a memorandum of understanding outlining work methods, data collection and data ownership as well as ownership of intellectual property, creating formal boundary conditions for an equitable long-term outcome of the project. Moreover, our research team includes GIS professionals from the Indonesia National Research and Innovation Agency with expertise in remote sensing of tropical forests.
Data sources and field work:
Our data collection relies on a combination of high-resolution satellite imagery from PlanetScope (PS) provided by Planet Labs (integration of Sentinel-2 data is in progress as well) and field-level data collection by inhabitants of the area. PS imagery has a resolution of 3.7 m/pixel and four channels: Blue (455 - 515 nm), Green (500 - 590 nm), Red (590 - 670 nm), and Near-Infrared (780 - 860 nm) [Raza et al. 2020]. Our first step follows standard practices: we study composite PS satellite data in comparison with Google Earth (GE) images to identify a first round of land cover features. However, we then also check questionable areas with local informants, who collect short video recordings of the actual situation on the ground (Fig. 2) and upload these verification datasets to a shared server. Moreover, our system is set up to support low-tech input data collected with old-fashioned paper and pencil. A handwritten set of longitude, latitude and identified land cover class is sent (via email) to the evaluation team, where custom Python scripts convert the information into an entry in a vector data set suitable for classification purposes.
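The conversion of the handwritten notes can be sketched as follows. The line format and attribute names are illustrative assumptions, not the project's actual scripts; the idea is simply turning transcribed coordinate/class triples into vector features.

```python
import json

def parse_field_note(text):
    """Convert lines of 'longitude, latitude, land-cover class' (as transcribed
    from paper field notes) into GeoJSON point features."""
    features = []
    for line in text.strip().splitlines():
        lon, lat, label = (part.strip() for part in line.split(",", 2))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {"landcover": label},
        })
    return {"type": "FeatureCollection", "features": features}

note = """115.0919, -8.2466, coffee_under_clove
115.1034, -8.2511, secondary_forest"""
fc = parse_field_note(note)
print(json.dumps(fc, indent=2))
```

The resulting GeoJSON can be loaded directly into QGIS or used as labeled training samples for the classification step.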
Complex land cover classes:
The single most significant issue we encounter in this project is the fact that local knowledge and local interests are not represented in GIS maps nor in the land cover categories that routinely constitute formal categories in GIS representation. The existing GIS knowledge production pipeline, with its reliance on visual evidence, is not sufficient to address these needs.
For example, how might one monitor and detect the outcome of efforts of the "jaga teleng" (traditional forest guards) as opposed to modern forest regrowth approaches? Even some quotidian and concrete “use” classes in the study area are resistant to visual-only inspection. Coffee plant farms typically grow together with and often under clove tree gardens and cannot be distinguished even with high-resolution (3.0m/pixel) satellite imagery without additional field level data collection. In general, the land use conditions in Bali are characterized by a variety of mixed uses and mixed conditions, with untouched areas mingling with secondary forests and overgrown light use agricultural areas creating a complex assemblage of “quasi-natural” conditions. And the tropical conditions on the island ensure that an agricultural area that has been harvested or abandoned, regrows to a semi-wild area in months. While this project contains many elements, the image interpretation and metadata creation that can be ingested into a GIS framework to represent some of the convention challenging categories listed above, is by far the most challenging aspect of the effort.
A GIS analysis framework for experimentation and collaboration:
In order to support this challenging data interpretation work and enable a collaborative testing environment, we have developed a cloud-based GIS environment (COCKTAIL) that combines elements of the established QGIS, GDAL, OTB and SAGA environments, such that we can create processing pipelines across these widely used GIS systems and run this software cocktail remotely in the cloud. This allows our research partners to work in their respective time zones and explore different approaches to data analysis and classification within a shared framework. Importantly, our pipeline records the large collection of local settings and internal evaluation parameters to a file, so that each member can easily recreate the output of the other team members' experiments. Results are transferred to a shared remote server so that they can easily be inspected visually together during remote meetings.
At the time of this writing, Cocktail is used in our research group to combine satellite imagery with texture maps, to create change maps (from the start of the datasets to the present year) and to perform land cover classification (Fig. 3). Cocktail includes Support Vector Machine, Random Forest and Neural Network classifiers, whose suitability we are now analyzing in an iterative manner, collecting more data as the need arises (see resources).
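The parameter-recording step that makes runs reproducible can be sketched as follows. The file name, field names and parameter values here are illustrative assumptions, not Cocktail's actual configuration schema.

```python
import json
import time

def save_run_settings(path, settings):
    """Write the full parameter set of a classification run to a
    timestamped JSON file, so that another team member can recreate
    the experiment from the recorded settings alone."""
    record = {"timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"), **settings}
    with open(path, "w") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record

# hypothetical settings for one Random Forest run
settings = {
    "classifier": "random_forest",        # or "svm", "neural_network"
    "n_estimators": 200,
    "training_vector": "bali_samples.geojson",
    "raster_stack": "ps_composite_2022.tif",
}
record = save_run_settings("run_settings.json", settings)
```

Reloading the JSON file and feeding it back into the pipeline reproduces the run; sorted keys keep diffs between two experiments readable.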
In recent years, point-cloud technologies such as Unmanned Aerial Vehicles (UAVs), Terrestrial Laser Scanners (TLS), Aerial Laser Scanners (ALS) and Mobile Mapping Systems (MMS) have come into the focus of attention in mapping. Thanks to these new techniques, experts can survey large areas with sufficient and homogeneous accuracy at high resolution.
Consequently, there are several areas where point clouds can be used. One application is updating land registry maps. Many countries all over the world face the issue that a significant part of their large-scale land registry maps is based on old analogue maps from the late 19th or early 20th century. One of these countries is Hungary, where more than eighty percent of digital cadastre maps were digitised from analogue maps at scales of 1:1000–1:4000, not to mention maps with the fathom as base unit at a scale of 1:2880. It is quite common to find offsets of a few metres in the features depicted in the land registry maps, which causes a wide variety of problems in applying the maps, for example in public utility registration and engineering practice. The final solution would be to carry out new surveys for the critical areas, but this has often been postponed due to lack of time and excessive costs.
Thanks to the new technologies, updating these outdated maps is feasible, and there are several examples where point clouds were used to update old land registry maps with manual processing. As many researchers have found, an optimal solution is to generate point clouds from a combination of nadir and oblique images taken by UAVs, typically with a Ground Sample Distance (GSD) of 1–3 cm. Our aim is to derive the building footprints from the point clouds with an accuracy of no worse than 10 cm. Oblique images play an important role in obtaining a sufficient number of points on the walls of the buildings, so that we can find not only the outlines of the roofs but also the walls of the buildings.
Another crucial factor when processing point clouds is automation, which definitely improves the efficiency of the whole procedure. A wide range of open-source software is already available, such as OpenDroneMap (ODM)/WebODM, CloudCompare and QGIS, as well as many open-source libraries, like Open3D, PDAL, the Point Cloud Library (PCL), SciPy and Scikit-learn, to support automatic processing of point clouds. During our research, different combinations of these libraries were investigated, paying attention to keeping the solution accessible and freely extensible for everyone. Therefore, the source code of our programs (mostly written in Python), created in the frame of this project, is also open source and available on our Geo4All Lab's GitHub page.
In addition, our study focuses on segmenting point clouds in an almost fully automated way. The processing starts from a coloured point cloud generated by ODM from nadir and oblique images. Then a Normalized Digital Surface Model (nDSM) is generated: the Cloth Simulation Filter (CSF) algorithm is used to separate the points on the ground, and a Digital Elevation Model is generated from them. The ground and near-ground points are removed from the nDSM; this way the low vegetation is also filtered out.
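The nDSM step reduces to a raster difference plus a height mask. The sketch below, with an assumed 0.5 m near-ground threshold, illustrates the idea on tiny toy grids; the real pipeline operates on rasters derived from the CSF-filtered cloud.

```python
import numpy as np

def normalized_dsm(dsm, dem, near_ground=0.5):
    """nDSM = DSM - DEM. Cells within `near_ground` metres of the
    terrain (ground and low vegetation) are masked out as NaN so that
    only above-ground structures remain."""
    ndsm = dsm - dem
    ndsm[ndsm < near_ground] = np.nan
    return ndsm

dsm = np.array([[100.3, 104.2],
                [100.9, 107.8]])  # surface heights (m)
dem = np.array([[100.2, 100.1],
                [100.3, 100.3]])  # terrain heights (m)
ndsm = normalized_dsm(dsm, dem)
```

Cells left in the nDSM belong to buildings or tall vegetation; the latter is removed later by the per-voxel plane fitting.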
Subsequently, the filtered point cloud is voxelized. Voxels are essential to divide the complex task into small processes that can be parallelized. With the help of a sequential RANdom SAmple Consensus (RANSAC) method, one or more significant planes are detected in each voxel. The points in a voxel that fit a detected plane are substituted by a single point on that plane together with the plane's normal, so that a sparse point cloud can be used later. This way, noise and vegetation are filtered out in a robust and efficient manner.
In the next step of processing, the sparse point cloud is segmented by the normal directions into three categories: walls, roofs and others. The wall and roof points are then segmented separately by a region growing method. Finally, the continuous wall and roof segments are combined to define the footprints of the buildings.
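The normal-direction split can be sketched with a simple threshold on the vertical component of each plane normal; the two thresholds below are illustrative, not values from the study.

```python
import numpy as np

def classify_by_normal(normals, wall_max_nz=0.2, roof_min_nz=0.6):
    """Classify sparse points by the vertical component n_z of their
    plane normals: near-horizontal normals (|n_z| small) indicate
    walls, near-vertical normals indicate roofs, the rest 'other'."""
    nz = np.abs(normals[:, 2])
    labels = np.full(len(normals), "other", dtype=object)
    labels[nz <= wall_max_nz] = "wall"
    labels[nz >= roof_min_nz] = "roof"
    return labels

normals = np.array([
    [1.0, 0.0, 0.0],        # vertical facade
    [0.0, 0.0, 1.0],        # flat roof
    [0.0, 0.7071, 0.7071],  # 45-degree pitched roof
    [0.6, 0.6, 0.53],       # steep/noisy surface -> other
])
labels = classify_by_normal(normals)
```

Region growing then merges neighbouring points with the same label into continuous wall and roof segments.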
Test areas and traditional land surveying methods were used to validate the aforementioned algorithms. As our intention is to apply the technology mainly to smaller settlements, we focus on detecting detached houses. According to our preliminary results, land registry maps with homogeneous accuracy are achievable, with errors below 10 cm, which generally meets the requirements. Built on open-source solutions, the technology offers an economical way of updating old and heterogeneous land registry maps.
Context and purpose
OGC standards form a backbone within the OSGeo community, defining a pathway from software implementation toward the standardization of geospatial information and related services, and ensuring interoperability between FOSS4G software. Since 2016, the OGC has initiated the specification of a new generation of standards based on OpenAPI, so as to facilitate integration into modern web applications and systems.
Underpinning the OGC API roadmap, the development of all these standards represents a significant amount of activity carried out by various OGC working groups, testbeds and pilots from the OGC Innovation Program. Some standards have been approved, many are still under development, and it is therefore not always easy to follow the progress. Indeed, while some geodata infrastructures involving national entities are already deploying this new generation (e.g. Canada's MSC GeoMet), other initiatives are running a phase of experimentation (e.g. the Geonovum testbed platform for the Dutch geoportal).
From a practical perspective, how can organizations and institutions prepare to leverage this new generation of standards to deploy a geospatial data infrastructure? This is the issue this article addresses, introducing a project that runs an OGC API testbed platform with a special focus on the Swiss context. The project is embedded in the Resources for the NSDI Program (related to the Swiss Geoinformation Strategy), with the purpose of contributing to the upcoming revision of e-government standards regarding geoinformation (e.g. the eCH-0056 Geoservices application profile). It is a study jointly carried out by swisstopo and complementary academic partners (HEIG-VD, SUPSI, UNIGE).
As a result of the above-mentioned complexity and overlap of existing standards, the project team has applied a benchmark study approach, in which different standards are tested in experimental cases and evaluated in comparison with other existing solutions. The outcomes include both quantitative and qualitative results that will be condensed into practical recommendations for the implementation and adoption of the OGC API family.
This research aims at evaluating a selection of different OGC specifications as well as different server and client implementations in order to define e-government recommendations to promote collaboration between authorities, companies and individuals.
The selected mainstream topic for the experimental cases is climate change. While not yet connected in a complex pilot study, each case represents one of the required components, from sensing (remote/in-situ) through data offering and elaboration to data visualization and exploration. The study is organized in three parts:
The hydro-meteorological monitoring network of the Canton of Ticino, currently managed using the SOS standard, has been selected as representative of a practical implementation of the basic data required for the climate change impact assessment pipeline. The network, with a 40-year-long time series, is currently composed of 60 stations and 140 sensors observing precipitation, air temperature and humidity, water temperature and river height. The collected information is used operationally by the local administration to design and implement water resource protection and allocation, guaranteeing sustainable management of the resource and the natural environment while protecting against the impacts of extreme events like floods and droughts. The operational applicability of the SensorThings API is evaluated by testing whether this standard can fulfil all the major daily operations in place, for example data quality management, data sharing with third parties, data collection from vendor-specific sensors, and data analysis and visualization.
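A typical SensorThings API data-sharing request can be built from the standard OData-style query options, as sketched below. The endpoint URL and datastream id are hypothetical; the `$filter`/`$orderby`/`$top` options are part of the SensorThings API standard.

```python
from urllib.parse import urlencode

BASE = "https://example.org/FROST-Server/v1.1"  # hypothetical endpoint

def observations_query(datastream_id, start, end, top=1000):
    """Build a SensorThings API request for the observations of one
    datastream (e.g. a precipitation sensor) in a time window."""
    params = {
        "$filter": (f"phenomenonTime ge {start} and "
                    f"phenomenonTime le {end}"),
        "$orderby": "phenomenonTime asc",
        "$top": top,
    }
    return (f"{BASE}/Datastreams({datastream_id})/Observations"
            f"?{urlencode(params)}")

url = observations_query(42, "2019-05-01T00:00:00Z",
                         "2019-06-20T23:59:59Z")
```

Because the response is plain JSON over HTTP, the same query serves quality-control scripts, third-party sharing and visualization clients alike.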
Switzerland was the second country in the world, after Australia, to have an operational satellite Earth Observation (EO) data cube. The Swiss Data Cube (SDC) is a tera-scale analytical cloud-computing platform allowing users to access, analyse and visualize up to 38 years (1984-2022) of consistent, calibrated and spatially co-registered optical and radar Analysis Ready Data. The SDC leverages the information power of Big Earth Data for monitoring the environment by minimizing the time and knowledge required for analysing large volumes of raster data. The derived analytical products provide an effective means to build socially robust, replicable and reusable knowledge, and to generate ready-to-use products supporting evidence-based decisions. Currently, all the data products and their related descriptions (i.e. metadata) are accessible through "traditional" OGC services such as WMS, WCS and CSW. For example, the Normalized Difference Water Index (NDWI) time series can be used to estimate and monitor the evolution of vegetation water content over the entire country. The aim of this experimental case is to use a set of new OGC APIs implemented on top of the Swiss Data Cube to track the evolution of NDWI. To reach this objective we will implement the OGC API Coverages, Processes, EDR and Records specifications and the STAC API to access the NDWI raster time series and compute zonal statistics over different administrative units/levels (e.g. national, canton).
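The zonal-statistics step at the end of that pipeline reduces to aggregating raster values per administrative-unit label, as in this toy sketch (the NDWI values and zone codes are invented for illustration):

```python
import numpy as np

def zonal_mean(raster, zones):
    """Mean raster value (e.g. NDWI) per zone label (e.g. canton id),
    ignoring NaN cells such as cloud-masked pixels."""
    stats = {}
    for z in np.unique(zones):
        stats[int(z)] = float(np.nanmean(raster[zones == z]))
    return stats

ndwi = np.array([[0.2, 0.4],
                 [0.6, np.nan]])   # one masked cell
cantons = np.array([[1, 1],
                    [2, 2]])       # rasterized administrative units
stats = zonal_mean(ndwi, cantons)
```

In the experimental case, the same aggregation would run server-side behind an OGC API - Processes endpoint, with the zones coming from the national/cantonal boundary layers.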
Given the project team's past activities related to portrayal interoperability with OGC standards like WMS, WMTS and SLD/SE, this part aims to challenge a set of OGC API specifications, especially Features, Tiles, Maps and Styles, and to provide insights about OGC SymCore. On the one hand, the experimental case will consider outputs from running Geoclimate, an open-source geospatial toolbox that computes a set of urban-climate-related parameters describing a given study area using OpenStreetMap data as a base. The intent is to make these indicators discoverable and to serve them as data and maps through the OGC API. On the other hand, the aim is to address national needs for geodata visualization using the Minimum Geodata Models (MGDM) in conjunction with their styling models, testing how symbol descriptions may be encoded in a standard way with modern formats and techniques for building styles and symbology (i.e. from SLD/SE to GeoCSS, with or without cascading, etc.).
For all these experimental cases, FOSS4G software is deployed, especially at the server level with FROST, pygeoapi, GeoServer and QGIS Server. The results are useful for developers, government agencies and organizations that want to implement and use the new family of OGC standards.
Urban planning and design play an important role in amplifying or diminishing built environmental threats to health promotion and disease prevention (Keedwell 2017; Hackman et al. 2019). However, there is still a lack of good evidence and objective measures on how environmental aspects impact individual behavior. The eMOTIONAL Cities project (eMOTIONAL Cities - Mapping the cities through the senses of those who make them 2021) sets out to understand how the natural and built environment can shape the feelings and emotions of those who experience it. It does so with a cross-disciplinary approach which includes urban planners, doctors, psychologists, neuroscientists and engineers.
At the core of this research project, lies a Spatial Data Infrastructure (SDI) which assembles disparate datasets that characterise the emotional landscape and built environment, in different cities across Europe and the US. The SDI is a key tool, not only to make the research data available within the project consortium, but also to allow cross-fertilisation with other ongoing projects from the Urban Health Cluster and later on, to reach a wider public audience.
The notion of SDIs emerged more than 20 years ago and has been constantly evolving in response to both technological and organisational developments. Traditionally, SDIs adopt the OGC W*S service interfaces (e.g. WMS, WFS, WCS), which follow a remote-procedure-call style with XML-encoded requests and responses. However, in recent times we have seen the rise of new architectural approaches, which can be characterised by their data-centrism (Simoes and Cerciello 2021). Web-based APIs have numerous advantages, which speak for their efficiency and simplicity. They provide a simple approach to data processing and management functionalities, offer different encodings of the payload (e.g. JSON, HTML, JSON-LD), can easily be integrated into different tools, and can facilitate the discovery of data through mainstream search engines such as Google and Bing (Kotsev et al. 2020). These APIs often follow a RESTful architecture, which simplifies their usage while minimising bandwidth usage. Moreover, the OpenAPI specification (OpenAPI Initiative 2011) allows APIs to be documented in a vendor-independent, portable and open manner, and provides an interactive testing client within the API documentation.
OGC has embraced this new approach in its new family of standards, called OGC APIs (OGC 2020a). Although still under active development, it has already produced several approved standards: 'OGC API - Features' (OGC 2022b), 'OGC API - EDR' (OGC 2022c), 'OGC API - Common' (OGC 2022d) and 'OGC API - Processes' (OGC 2022e), which provide standardised APIs ensuring modern access to spatial data and to processes using those data.
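One reason these APIs integrate so easily is that an 'OGC API - Features' items response is plain GeoJSON, consumable with nothing but the standard library. The collection name, feature properties and link below are invented for illustration:

```python
import json

# a minimal, hypothetical /collections/sites/items response body
response_body = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "id": "site-001",
        "geometry": {"type": "Point", "coordinates": [-9.14, 38.72]},
        "properties": {"city": "Lisbon", "valence": 0.61},
    }],
    "links": [{"rel": "next",
               "href": "/collections/sites/items?offset=1"}],
})

doc = json.loads(response_body)
cities = [f["properties"]["city"] for f in doc["features"]]
# hypermedia paging: follow the link with rel="next" for more items
next_links = [l["href"] for l in doc["links"] if l["rel"] == "next"]
```

Contrast this with legacy WFS, where the client must parse GML and construct capability-specific XML requests.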
There are many similarities between the processes of designing and implementing open source software and open standards. OSGeo encourages the use of open standards, like those from the OGC, and there is even a Memorandum of Understanding between the two organisations (OSGeo 2012). In practice, many long-standing OSGeo projects implement OGC standards and often contribute to the standards' development (e.g. GDAL, GeoServer, QGIS, OpenLayers, Leaflet). However, in the majority of cases they still implement the legacy W*S standards rather than the new OGC APIs.
In the eMOTIONAL Cities project we set out to create an SDI based on OGC APIs, but realised that we needed to support some legacy standards, because an OGC API equivalent was not yet widely supported. This led us to create two stacks: one using OGC APIs (i.e. modern) and another using W*S services (i.e. legacy). Both stacks rely on FOSS/OSGeo software, and whenever relevant we have contributed to some of those projects. The modern stack includes Elasticsearch and Kibana (Elastic), which add extra capabilities in terms of search, analytics and visualisation.
For the sake of reproducibility, all software components were containerised with Docker (Wikipedia 2022) and are orchestrated using docker-compose. The results are published in the eMOTIONAL Cities public GitHub repository (eMOTIONAL Cities H2020 Project 2021).
Despite their numerous advantages, we still see a lack of adoption of the OGC APIs within most SDIs. In part this could be due to the standards not being well known, but it could also be due to a lack of knowledge about which implementations are available, especially as FOSS. In this paper we share our modern SDI architecture and the reasons for choosing pygeoapi (Kralidis 2019) for publishing data as OGC API Features, Vector Tiles and Records. Although the standards we selected target the urban health use case, we believe they are generic enough to be useful for sharing data in other contexts (e.g. climate change, cross-border datasets).
We are confident about a transition to OGC APIs, but we are also conscious that it may take time, and that for some time many solutions will have to offer both modern and legacy standards.
Please find the complete list of references on this page: https://github.com/emotional-cities/foss4g_ref/blob/master/references.md
It is said that data visualization is as important as the data itself. As the amount of data generated by Earth observation (EO) satellites – e.g. the Copernicus programme (Jutz and Milagro-Pérez, 2020) – gets bigger and bigger, we need more efficient tools to deal with this onslaught of data. To help data scientists better extract relevant information from datacubes, we noticed that under-exploited computer graphics tools could bring new perspectives to specialists. Datacubes are known as the reference format for handling EO data, and several tools, such as Web WorldWind developed by NASA, exist to process and interact with them. Recent work has shown that the preparation of large-scale geospatial data (Mazroob Semnani et al., 2020), a highly technical subject, could benefit from optimizations. QGIS is another tool frequently used in the field, which can be enhanced by plugins and can retrieve data from web platforms. A modern approach to processing efficiency is the use of GPUs. Still, when reviewing the use of GPUs for geospatial data, the emphasis is often put on the parallel processing of geospatial datasets rather than on their visualization (Saupi Teri et al., 2022).
One of the main contributions of this paper is to consider geospatial data using GPU resources for both intermediate computation and visualization. Considering the increasing interest in interacting with this data directly from web pages or notebooks, this article presents tools allowing a program to run on the GPU and display the desired datacubes using the WebGL API. This can result in high performance thanks to WebGL's low-level control and the possibility to use GPGPU algorithms. As WebGL runs natively on most web browsers, another benefit is the end-user's ease of use. The end goal is to display even large (i.e. 1024^3) datacubes rendered on the fly in real time on a PC, albeit a well-equipped one.
To keep our applied research efforts focused, we have set up an independent international expert advisory group; above all, we wanted to provide something useful and concrete for the actors in the field. The represented institutes are ESA (EC), EURAC (Italy), GISAT (Czech Republic), Terrasigna (Romania), TU Wien (Austria), VITO (Belgium), and even a former NASA (USA) analyst. They have been regularly interviewed to get constant feedback on the suitability of our developed application, the final goal of our project being to build a toolbox of models to efficiently visualize different EO datacube formats.
This paper presents three main models applicable to datacubes in an EO context, some relatively standard and others innovative, all revisited via the GPGPU architecture.
Implicit curves model – This model has two main approaches: discrete and math-based sub-models. Especially suited to processing (x, y) or (x, y, t) datacubes in a 2D or 3D visualization, we developed and compared both sub-models with their dependencies. Sets of given iso and δ values are extracted from the data and stored as sets of curves. They can be displayed in a 2D environment, or in 3D with additional information such as: (1) the simulation of the data as a 3D surface; (2) different colormaps for the surface representing yet other external data; (3) surface rendering in steps to emphasize the given iso and δ values; (4) user-customizable colormaps; and (5) a water-level simulation rendering.
Derivative 3D rendering model – This model is specialized in analyzing (x, y, t) datacubes as a volume where the time t is part of the visualization. The aim is to visualize the evolution in time of a geographical area by highlighting the temporal differences within a volume. After selecting the (x, y) region of interest, the user selects a reference layer representing the state of the area at a defined time t, and a time interval Δt. The cumulated differences between the two are visible in a colored sub-volume defined by the time interval. To add more contextual information to the visualized geographical area, we have added the possibility to display an additional map (such as topographic data) at the reference layer level within the volume.
Jupyter Notebook massive rendering model – To make the toolset even easier to use, we have developed a visualization model deployable in Jupyter. This model allows rendering of (x, y, z) and (x, y, z, t) data volumes. Two rendering algorithms are already available: (1) an implicit surface simulation for any iso intensity – though only via the discrete approach – and (2) an X-ray-cast simulation.
Results show our models can process large amounts of data and render them in real time. Where large 3D datasets would normally become problematic for any GPU to handle, we developed specialized tools to overcome software and hardware limitations. For instance, a 3D datacube can be sorted into a 2D texture and loaded directly into GPU memory, improving performance. When the textures become too big for WebGL, their information can be split across the RGBA channels of standard 2D textures for a four-fold decrease in memory use. Furthermore, when displaying our rendering models on machines without sufficiently powerful graphics cards, we propose to display only the fraction of the data that interests the user. All of these highly efficient rendering models are assembled in a toolbox dedicated to datacube visualization.
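The cube-to-texture packing can be illustrated with NumPy, laying z-slices side by side in one 2D "atlas" (the layout below is one common convention, not necessarily the exact layout used by the toolbox):

```python
import numpy as np

def cube_to_atlas(cube):
    """Lay the z-slices of a (z, y, x) datacube side by side in one
    2D texture atlas, so the whole volume can be uploaded to the GPU
    as a single 2D texture."""
    z, y, x = cube.shape
    return cube.transpose(1, 0, 2).reshape(y, z * x)

def atlas_to_cube(atlas, z):
    """Inverse mapping, showing the packing is lossless."""
    y, zx = atlas.shape
    return atlas.reshape(y, z, zx // z).transpose(1, 0, 2)

cube = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # tiny (z=2, y=3, x=4) cube
atlas = cube_to_atlas(cube)
```

In the shader, a (u, v, w) volume coordinate is then remapped to the corresponding slice offset inside the atlas; splitting intensities across RGBA channels extends the same idea to four values per texel.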
In this paper we demonstrate an example application retrieving raw data from a server, formatting it for local use with GPGPU, and rendering it with several innovative models. We developed these tools for a web application and Jupyter notebooks to better fit the needs of data scientists. To better convey the scope of this work, several illustrations are available here: https://bit.ly/3th8JBF. Finally, we would like to point out that this work has been funded by national funds; therefore our development is open and does not have any external software dependencies.
Lake Maggiore and the Ticino River are water bodies shared by Italy and Switzerland: they are important resources for drinking water, irrigation and hydroelectricity generation, as well as for tourism and biodiversity. The cross-border character and the often conflicting needs of the different users make the shared management of this resource complex, but of great importance. The "Parchi Verbano Ticino" project, funded by Regione Lombardia / EU – INTERREG Italia Svizzera 2014/2020, aims to study the effects of the water levels of the lake on various environmental components, with a particular focus on protected natural areas. The level of the lake is regulated by a dam located at its southern shore. In this framework, this study aims to analyse the effect of water level on bird migration by: 1) calculating the inundated bird habitat using a simulation based on the measured water level; 2) calculating the inundated habitat from Sentinel-1 (S1) remote sensing imagery; and 3) using the flooded area derived from S1 as ground truth to validate the simulation.
The study area is centred around Bolle di Magadino (Switzerland, 8°51'56.90"E, 46°9'42.17"N), a protected wetland located on the northern shore of Lake Maggiore at the confluence with the Ticino River. The area is a recognized nesting and stopover site for birds, listed as a Ramsar Wetland of International Importance and as an Important Bird and Biodiversity Area (IBA). We defined the habitats of interest using a vegetation map provided by Fondazione Bolle di Magadino. The vegetation types collected in a phytosociological field study were aggregated into ten land cover classes describing the habitat types and land use. The final habitat map covers an extent of 6.7 km², including the 1500 ha of wetland called Bolle di Magadino. Daily passage of migrant birds has been recorded at the Magadino ringing station and, since 2019, traditional net captures have been coupled with an avian vertical-looking radar. In this study we focus on the following periods, during which both bird monitoring systems were deployed: P1: 2019-05-01–2019-06-20; P2: 2019-10-01–2020-02-20; and P3: 2021-02-01–.
The lake level measured at the hydrological station of Locarno (CH) was used to determine the inundated area in GRASS GIS, using the r.fill module (GRASS Development Team, 2022) and a DTM that included the lake bathymetry (cell size 0.5 m). The lake level fluctuated between 192.3 and 194.9 m a.s.l. over the study period, with a minimum in April and May, when the waters are used to irrigate the rice fields downstream, and a maximum in late autumn.
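The core of the simulation is a simple comparison of terrain heights with the measured lake level. The toy sketch below illustrates the idea on a 2×2 grid; unlike the GRASS module used in the study, it ignores hydrological connectivity, and the elevation values are invented.

```python
import numpy as np

def inundated_area_ha(dtm, lake_level, cellsize=0.5):
    """Flooded mask and flooded area (ha) from a DTM (bathymetry
    included) and a measured lake level: cells below the level are
    considered under water."""
    flooded = dtm < lake_level
    area_ha = flooded.sum() * cellsize**2 / 10_000.0  # m^2 -> ha
    return flooded, area_ha

dtm = np.array([[192.0, 193.1],
                [194.5, 195.2]])  # elevations in m a.s.l.
flooded, area = inundated_area_ha(dtm, lake_level=193.5)
```

Repeating this for each daily level measurement yields the time series of simulated inundation that is later compared with the S1-derived maps.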
We used the Google Earth Engine (GEE) platform (Gorelick et al., 2017) to extract Sentinel-1 Synthetic Aperture Radar (SAR) images (ESA, 2021), as they are well suited for surface water mapping and are not affected by cloud coverage (Ovakoglou et al., 2021). We used the Edge Otsu algorithm with terrain correction (Markert et al., 2020) to estimate the inundated areas in a collection of 236 images covering the three time periods when bird migration was also monitored. The calculation was implemented in GEE following the approach described by Gorelick et al. (2017), calibrating the threshold for our study area and adapting the code provided by Open Geo Blog (2021). The inundated areas were then overlaid with the land use map in order to estimate the extent of the submerged vegetation over the three time periods defined. The resolution of all maps was 10 m, except the DTM, which has 0.5 m; the CRS used were WGS84 in GEE and the local CRS CH1903 for all other analyses.
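The overlay of the flood mask with the habitat map amounts to computing, per land cover class, the fraction of cells that are under water. A minimal sketch (habitat codes and values invented for illustration):

```python
import numpy as np

def submerged_fraction(flooded, habitat):
    """Fraction of each habitat class that is under water, from a
    boolean flood mask and a categorical land cover raster."""
    return {int(h): float(flooded[habitat == h].mean())
            for h in np.unique(habitat)}

flooded = np.array([[True, True],
                    [True, False]])
habitat = np.array([[1, 1],
                    [2, 2]])  # 1 = reed bed, 2 = cropland (codes illustrative)
fractions = submerged_fraction(flooded, habitat)
```

Applied to each of the 236 S1-derived masks, this yields the per-habitat submersion time series used in the migration analysis.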
The area covered with water according to S1 varied between 103.9 and 471.7 ha (210 ± 83.6); part of the surface is a permanent wetland, so it is never completely dry. Each habitat was affected differently by the flooding: when the water was at its highest, croplands were completely inundated, grasslands and reeds were submerged over 80% of their extent, whereas urban areas and infrastructure were not affected (less than 1% under water). The flooded area calculated by filling the terrain model at the lake level ranged from 130.4 to 248.9 ha (208 ± 48.6). The correlation between the inundated areas obtained using r.fill in GRASS and from S1 on the same dates was fair for P1 and P2, but not for P3, when the waters of the lake were taken for irrigation but the habitats were flooded by rainfall. An interpolation of the flooded area derived from S1 is a more efficient way to obtain a daily estimate of the flooded habitat, which is necessary to study its effect on migratory birds.
The results presented here will contribute to the definition of sustainable water management tools for Lake Maggiore, taking into account the effect of lake level on biodiversity in general and on bird habitat in particular.
Land surface temperature (LST) in urban areas is an important environmental variable considered a reliable indicator of the urban heat island (UHI) phenomenon. LST is affected by various factors such as solar irradiance, cloudiness, wind and urban morphology. Traditionally, LST is observed and recorded by thermal remote sensors. For example, thermal satellite sensors such as MODIS, Sentinel-3, ASTER, Landsat 7 ETM+ and Landsat 8 TIRS are very popular for assessing the UHI effect on a global scale. However, these sensors provide rather low spatial (60 m to 1000 m) and temporal (several hours to days) resolutions, which limit the accurate estimation of LST in urban areas for local studies and specific time periods (Mushore et al., 2017; Hu and Wendel, 2019). Airborne or terrestrial remote sensing can capture thermal data at higher spatial resolution, but it is not feasible for large urban areas with increased periodicity. However, the increasing availability of high-resolution geospatial data and adequate modeling techniques provides an alternative approach to high-resolution estimation of LST in urban areas.
Several studies showed the potential of geographic information system (GIS) tools, digital surface models (DSM) and 3-D city models for the estimation of solar radiation in urban areas (e.g., Hofierka and Kaňuk, 2009; Hofierka and Zlocha, 2012; Freitas et al., 2015; Biljecki et al., 2015). Solar irradiance is a key factor affecting LST during daylight periods, especially under clear-sky situations. Nevertheless, LST assessment requires a physical model combining surface-atmosphere interactions and energy fluxes between the atmosphere and the ground. Properties of urban materials, in particular solar reflectance, thermal emissivity and heat capacity, influence the LST and subsequently the development of UHI, as they determine how the Sun's radiation energy is reflected, emitted and absorbed (Hofierka et al., 2020b; Kolečanský et al., 2021). It is clear that the problem's complexity requires a comprehensive GIS-based approach.
Our solution is based on the open-source solar radiation tools available in GRASS GIS, 3D city modeling, and spatially distributed data representing the thermal properties of urban surfaces and meteorological conditions (Hofierka et al., 2020a, 2020b; Kolečanský et al., 2021). The proposed LST model is implemented as a GRASS GIS module written as scripts (shell scripts, Python). In these scripts, the r.sun and v.sun solar radiation models in GRASS GIS were used to calculate the effective solar irradiance for selected time horizons during the day. The solar irradiance calculation accounts for the attenuation of beam solar irradiance by clouds, estimated from field measurements. The proposed LST model also accounts for heat storage in urban structures depending on their thermal properties and geometric configuration. The 2D LST model uses the output of the r.sun solar radiation model and a DSM representing urban surfaces, while the 3D LST model uses the output of the v.sun solar radiation model and a vector-based 3D city model. Computed LST values for selected urban surfaces were validated with acceptable accuracy against field measurements of LST at 10 locations within the study area. The proposed approach has the advantage of providing high spatial detail coupled with the flexibility of GIS to evaluate various geometrical and land surface properties for any daytime horizon. The methodology can be used to evaluate proposed UHI mitigation measures such as increasing the albedo of urban surfaces or expanding green areas, including green roofs and trees.
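How reflectance and emissivity enter an LST estimate can be illustrated with a toy radiative-equilibrium balance, in which absorbed shortwave radiation equals emitted longwave radiation. This is only a didactic sketch with invented material values; the published model is far richer (heat storage, cloud attenuation, 3D geometry) and is not reproduced here.

```python
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def toy_lst(irradiance, albedo, emissivity):
    """Toy radiative-equilibrium surface temperature (K):
    (1 - albedo) * G = emissivity * SIGMA * T^4, solved for T.
    Illustrative only, not the model proposed in this study."""
    return ((1.0 - albedo) * irradiance / (emissivity * SIGMA)) ** 0.25

# asphalt absorbs more than a white roof under the same 800 W/m^2 sun,
# which is why increasing surface albedo is a UHI mitigation measure
t_asphalt = toy_lst(800.0, albedo=0.08, emissivity=0.95)
t_white = toy_lst(800.0, albedo=0.70, emissivity=0.90)
```

Even this crude balance reproduces the qualitative effect the methodology quantifies: a high-albedo surface stays markedly cooler under the same irradiance.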
Biljecki, F., Stoter, J., Ledoux, H., Zlatanova, S., Çöltekin A., 2015. Applications of 3-D city models: State of the art review. ISPRS International Journal of Geo-Information, 4, 2842–2889. https://doi.org/10.3390/ijgi4042842.
Freitas, S., Catita, C., Redweik, P., Brito, M. C., 2015. Modelling solar potential in the urban environment: State-of-the-art review. Renewable and Sustainable Energy Reviews, 41, 915–931. http://dx.doi.org/10.1016/j.rser.2014.08.060.
Hofierka, J., Bogľarský, J., Kolečanský, Š., Enderova, A., 2020a. Modeling Diurnal Changes in Land Surface Temperature in Urban Areas under Cloudy Conditions. ISPRS Int. J. Geo-Inf., 9, 534.
Hofierka, J., Gallay, M., Onačillová, K., Hofierka, J. Jr., 2020b. Physically-based land surface temperature modeling in urban areas using a 3-D city model and multispectral satellite data. Urban Climate, 31, 100566.
Hofierka, J., Kaňuk, J., 2009. Assessment of photovoltaic potential in urban areas using open-source solar radiation tools. Renewable Energy, 34, 2206–2214. https://doi.org/10.1016/j.renene.2009.02.021.
Hofierka, J., Zlocha, M., 2012. A New 3-D Solar Radiation Model for 3-D City Models. Transactions in GIS, 16, 681–690. https://doi.org/10.1111/j.1467-9671.2012.01337.x.
Hu, L., Wendel, J., 2019. Analysis of urban surface morphologic effects on diurnal thermal directional anisotropy. ISPRS Journal of Photogrammetry and Remote Sensing, 148, 1–12. https://doi.org/10.1016/j.isprsjprs.2018.12.004.
Kolečanský, Š., Hofierka, J., Bogľarský, J., Šupinský, J., 2021. Comparing 2D and 3D Solar Radiation Modeling in Urban Areas. Energies, 14, 8364.
Mushore, T.D., Odindi, J., Dube, T., Matongera, T.N., Mutanga, O., 2017. Remote sensing applications in monitoring urban growth impacts on in-and-out door thermal conditions: A review. Remote Sensing Applications: Society and Environment, 8, 83–93. https://doi.org/10.1016/j.rsase.2017.08.001.
Introduction: Legally defined appellation areas are used by governments throughout the world to demarcate geographic areas that produce agricultural products, such as wine, cheese, or preserved meats, with a specific quality or set of characteristics. In the United States, American Viticultural Areas (AVAs) define wine-growing areas that are distinctly different from others. These boundaries are created by the US Alcohol and Tobacco Tax and Trade Bureau (TTB) through a legal process, and the definitions are published in the United States Federal Register in narrative form, using United States Geological Survey (USGS) topographic maps for their landmarks. Despite their geographic definition, a full spatial dataset of these boundaries following the legal definitions did not exist until it was created by a team of researchers led by the University of California, Davis (UC Davis) library. The purpose of the dataset is to provide open data suitable for use in research and cartography, following a well-documented set of methods, that represents the official boundary descriptions with as high fidelity as possible. Using the UC Davis AVA dataset alongside datasets describing environmental characteristics such as soils, climate, and elevation, we seek to understand how similar the characteristics present within the AVA boundaries are to each other using a hierarchical clustering process. Through this case study, we describe the UC Davis AVA boundary dataset and demonstrate a use case for the data.
Data: The UC Davis AVA dataset was created by digitizing the boundary narrative onto the USGS topographic maps described in the legal documents (officially known as the "approved maps") for each AVA by a team of collaborators at UC Davis, UC Santa Barbara, and Virginia Tech University, as well as community volunteers. For each boundary, we recorded attributes including an identifier, the official name of the AVA, any synonyms for the name, the date the AVA was officially recognized, the start and end date for the given polygon, who petitioned to define the AVA, which TTB staff member wrote the official documents, the list of approved maps, the list of maps used to digitize the boundary (to record any necessary substitutions), and the official boundary description. In addition to the currently defined boundaries, we also created a boundary polygon for the previous iterations of any boundaries that have undergone revisions. The dataset is stored in GeoJSON format in a publicly available GitHub repository and updated as AVAs are created or amended.
For each AVA, we summarized the environmental data over the area of the polygon. The PRISM dataset (from Oregon State University) provided the climate data (30-year climate normals for precipitation and temperature) and elevation data in raster format with an 800 m cell size. For each variable, we calculated the mean and the range within the AVA boundaries.
We also plan to expand this analysis over the coming weeks to include additional environmental characteristics available from PRISM, such as vapor pressure and solar radiation, which would be important considerations for grape growth, as well as soil data from the United States Department of Agriculture's (USDA) SSURGO (Soil Survey Geographic) dataset. SSURGO is a spatially enabled dataset of soil characteristics for the United States. It includes geologic soil series names as well as the soils' chemical attributes.
Analysis: For each attribute, the value at each AVA was assigned a z-score, calculated as the value minus the mean of the attribute field, divided by the field's standard deviation. This was done to normalize the data and reduce the effect of differing scales of measurement (for example, depth of precipitation compared with temperature in degrees Celsius). To assess how similar any given AVA is to other AVAs, we performed a hierarchical clustering analysis using R's hclust() hierarchical clustering function. This function uses a dissimilarity matrix to assign each polygon to a hierarchical series of groups based on how similar (or dissimilar) each polygon is to the others. The results can be displayed in a dendrogram to visualize the structure of the classes. The classes can also be used to create a map of the AVAs to help interpret the groups.
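The z-score normalization step can be sketched as follows; the attribute values below are hypothetical, not taken from the PRISM summaries:

```python
# Z-score normalization of per-AVA attribute values, as described above:
# (value - field mean) / field standard deviation. Toy values only.
from statistics import mean, stdev

def z_scores(values):
    """Standardize one attribute field to zero mean and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

precip_mm = [300.0, 800.0, 1300.0]  # hypothetical mean precipitation per AVA
temp_c = [18.0, 14.0, 10.0]         # hypothetical mean temperature per AVA

z_precip = z_scores(precip_mm)
z_temp = z_scores(temp_c)

# After standardization both attributes share a comparable, unitless scale,
# so millimetres of rain no longer dominate degrees of temperature:
assert z_precip == [-1.0, 0.0, 1.0]
assert round(sum(z_temp), 9) == 0.0
```

The standardized columns would then feed the dissimilarity matrix used by the clustering step.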
Results: Preliminary results group AVAs into clusters that appear to be partly, but not entirely, based on geographic regions. When the dendrogram is cut into 6 groups, the AVAs in the eastern half of the country primarily fall into one group, while the western AVAs comprise the remaining 5 groups. This could be driven by the higher degree of variation in elevation, precipitation, and temperature in the west. In the southwest, the AVAs appear to correspond to one group, whereas the west coast states have many groups, including some AVAs that cluster with the eastern group. Expanding the analysis to include additional environmental factors will likely clarify some of these groups, perhaps revealing more variation in the east. This paper will include maps and diagrams that clearly show the relationships between the groups.
Discussion: Investigating the relationships between the AVA boundaries is an important exercise. With the availability of the AVA boundaries as a geographic dataset, we are now able to combine these data with other existing open datasets to better understand the relationships and differences between these areas. All of the datasets used in this analysis are freely available, which demonstrates not only the usefulness of the UC Davis AVA dataset but also the depth of the work possible with open data. This particular exploration builds on work I have published with colleagues investigating the Sierra Foothills AVAs in the state of California and the emerging wine-growing region in the state of Arizona.
In recent years we have witnessed a huge increase in the availability of free and open multispectral, multitemporal, global-coverage satellite imagery. At the same time, new open software tools for exploiting these images have arisen. Given the availability of short-revisit-time open satellite images, this study focuses on the analysis of satellite imagery using free and open source GIS software to identify displacements of single landslides.
In particular, the Ruinon landslide was selected as the subject for this analysis. It is situated in northern Lombardy, Italy, and is one of the most active landslides in the Alps. The landslide lies at the base of a Deep-seated Gravitational Slope Deformation, which affects the entire slope up to the summit at 3000 m a.s.l. Two major scarps can be identified: the upper one is a sub-vertical rock cliff about 30 m in height, while the lower one is characterized by a more widespread debris cover.
The general strategy employed in this work for obtaining landslide displacements in terms of direction and magnitude is to apply a local maximum cross-correlation on a multitemporal images stack. This was achieved using GRASS GIS and custom Python scripts.
The images were selected from both the Sentinel-2 catalogue, which is free, and the Planet catalogue, available for free for research purposes.
The main preprocessing steps are: creation of a suitable multi-temporal stack, by clipping the satellite images to the selected AOI and applying cloud masking and atmospheric correction; image co-registration, to ensure that the images are spatially aligned so that any feature in one image overlaps its footprint in all other images in the stack as well as possible; and histogram matching, to transform one image so that the cumulative distribution function (CDF) of values in each band matches the CDF of the corresponding bands in another image.
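The histogram matching step described above can be sketched for a single band as a CDF-to-CDF value mapping. This is a minimal illustration on flat value lists rather than rasters, not the implementation used in the study:

```python
# Minimal histogram matching sketch for one band: map each value of the
# secondary image to the value at the same quantile in the reference image,
# so the two empirical CDFs coincide. Toy lists stand in for raster bands.
from bisect import bisect_left

def histogram_match(secondary, reference):
    src = sorted(secondary)
    ref = sorted(reference)
    n = len(src)
    out = []
    for v in secondary:
        # empirical CDF position of v within the secondary image
        q = bisect_left(src, v) / (n - 1) if n > 1 else 0.0
        # value at the same quantile in the reference image
        out.append(ref[min(int(q * (len(ref) - 1)), len(ref) - 1)])
    return out

matched = histogram_match([10, 20, 30, 40], [100, 110, 120, 130])
assert matched == [100, 110, 120, 130]
```

Making the value distributions agree in this way is what allows the later cross-correlation step to compare windows from different acquisitions meaningfully.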
The main processing is based on the Maximum Cross-Correlation method applied to pairs of images. The first image of the pair will be referred to as the reference image, and the second one as the secondary image. This algorithm was previously applied to land cover changes (You et al., 2017) and to the movement of desert sand dunes (Oxoli et al., 2020). In the developed procedure, the processing phase starts by placing a window in the same position in both images. The window on the secondary image is then shifted in all directions, and a cross-correlation coefficient is computed for each of the shifts. The shifted window with the highest cross-correlation coefficient is selected, and a displacement vector is computed between the center pixel of the reference image window and the center pixel of the new shifted window of the secondary image.
The outputs are shifts (in pixels) in X and Y directions which are actually the distances required to register the window of the secondary image with the one of the reference image.
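The window-shifting search can be sketched as follows on plain Python lists; the window size, shift range and synthetic image below are illustrative, not the parameters used in the study:

```python
# Sketch of the Maximum Cross-Correlation step for a single window: shift
# the secondary window within +/- max_shift and keep the shift with the
# highest normalized cross-correlation coefficient.
from statistics import mean

def ncc(a, b):
    """Normalized cross-correlation between two equal-length value lists."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def window(img, r, c, size):
    """Flatten the size x size window with top-left corner (r, c)."""
    return [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]

def best_shift(reference, secondary, r, c, size, max_shift):
    ref_win = window(reference, r, c, size)
    best = (0, 0, -2.0)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            cc = ncc(ref_win, window(secondary, r + dr, c + dc, size))
            if cc > best[2]:
                best = (dr, dc, cc)
    return best  # (row shift, column shift, correlation)

# Synthetic example: the secondary image is the reference shifted one column.
ref = [[0, 0, 0, 0, 0, 0],
       [0, 9, 1, 0, 0, 0],
       [0, 2, 8, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]]
sec = [[row[-1]] + row[:-1] for row in ref]  # shift every row right by one
dr, dc, cc = best_shift(ref, sec, 1, 1, 2, 1)
assert (dr, dc) == (0, 1) and cc > 0.99
```

The recovered (0, 1) shift is exactly the per-window displacement the procedure reports, which also makes the one-pixel resolution limit discussed below evident: no sub-pixel shift can be distinguished.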
It is important to note that the smallest displacement that can be identified by this procedure is a displacement of 1 pixel, i.e. a displacement of 10 m when considering Sentinel-2 data. Therefore, smaller movements cannot be sensed because of the native resolution of the input satellite data. Secondly, errors can arise from differences between the images in terms of co-registration and histogram distribution, since this process relies heavily on the images being as aligned and similar as possible.
For monitoring the activity of the Ruinon landslide, two different sets of images were considered. The first one consists of one image per year in the period 2015-2020, with the aim of tracking the evolution of the landslide over the last few years. Since the landslide is situated in a mountainous region, it is often covered by clouds and, in the winter months, by snow. Because of this, only the best image for each year was selected for the analysis. The other set is composed of three images, one per month, in the period July 2019 - September 2019, aiming at highlighting a large movement that took place in the summer of 2019.
To compare and evaluate the performance of the cross-correlation approach, data from UAV surveys of the landslide (provided by the local environmental agency ARPA Lombardia) were used. At first, the results obtained with the procedure were compared with the output of the same procedure applied to RGB images obtained from the surveys, which have a resolution of 1 m. The two outputs were found to be very similar, both in displacement magnitude and direction. Secondly, photogrammetric point cloud comparisons created from the UAV observations in periods close to those considered for satellite monitoring were investigated. In particular, the displacement along the vertical axis was inspected, and accumulation zones were found in correspondence with the largest movements of the landslide detected by the algorithm. Because of this, the results were considered consistent with the survey data.
The increased availability of high-resolution multitemporal satellite imagery promotes the use of these images for monitoring purposes. While in-field monitoring can produce very accurate results, a procedure like the one applied in this work has the advantage of being more flexible, scalable and cost-effective than field analysis. The experimental procedure developed in this work led to promising results, despite being a first-stage approach to landslide monitoring with the maximum cross-correlation method. Many approaches were considered, varying the main parameters of the procedure (adding or removing a classification phase, considering different intervals between satellite images, modifying the size of the moving window, and others), and the whole process was progressively improved and refined until satisfactory results were achieved.
This article is a work-in-progress report on the introduction and exploitation of persistent identifiers (PIDs) within the OSGeo Foundation and its software project communities. Following an introduction to the topic of persistent identifiers, an overview of the currently achieved state is given, together with emerging new opportunities but also new challenges. These developments enable the OSGeo project communities to actively participate in the further development of data-driven open science and in the evolution of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship from their original focus on data to research software and community software projects. With the rise of the Internet and the World Wide Web, Uniform Resource Locators (URLs) have become common practice for referencing web resources. A URL specifies a resource's location on a computer network and a mechanism for retrieving it. However, URLs are not a sustainable practice for scientific citation because they break once the referenced resource is transferred to another web address; i.e., the original URL can no longer be resolved and an error message is returned instead (e.g., HTTP error 404). To counter this, persistent identifiers have been introduced as long-lasting references to web resources, including research data, source code, audiovisual content, and also human individuals or communities. Persistence is achieved by infrastructure services which resolve the references to their target objects. This requires open standards, the operation of infrastructure services and best practices for sustainable long-term use. The adoption of PIDs in the OSGeo Foundation continues across different application areas, with increasing synergy effects forming the foundation of a greater whole.
The introduction of PID in OSGeo started in 2014 for a newly discovered version of the historical GRASS GIS informational video from 1987, which is preserved in the AV Portal of TIB Hannover (https://av.tib.eu/) and can be accessed through a permanent Digital Object Identifier (DOI) (https://doi.org/10.5446/12963, https://doi.org/10.5446/31768). Since 2016, OSGeo conference videos have been collected as a permanent service in the AV Portal, with the collection growing by approximately 100 hours of video recordings annually (pre-Covid). In 2017, the rasdaman software project registered a DOI for the first time for release version 9.4.2 in the Zenodo data repository (https://doi.org/10.5281/zenodo.1040170). Zenodo is a general-purpose open-access repository operated by the European Organization for Nuclear Research (CERN) since 2015. In 2019, the next DOI registration followed for the GMT software project for release version 6.0.0 (https://doi.org/10.5281/zenodo.3407865). Further improvements of the technical integration of project software repositories hosted on the GitHub platform and Zenodo have enabled a simplified handling of software versioning: When registering a DOI as a PID for a software project, at least two references are created, which are linked to each other: The Concept DOI, which represents the software project as a higher-level intellectual construct, and an initial Version DOI, which references a specific software release. With the integration now available between GitHub and Zenodo, the successive creation of additional Version DOI for upcoming new software releases can be done automatically. Since 2021, the number of DOI registrations by OSGeo software projects has increased significantly. Currently, DOIs are already available for 19 software repositories related to OSGeo projects (https://wiki.osgeo.org/wiki/DOI). More than half of the official OSGeo software projects can already be referenced by means of DOI. 
All projects that have registered a DOI have chosen an official scheduled release to initiate DOI versioning. Equipping OSGeo projects and content with PIDs results in significant added value for scientific users, but also for the respective project communities. Well-formatted citations for software project DOIs can be conveniently generated in thousands of different citation styles by online citation services (e.g. https://citation.crosscite.org/). Citation of OSGeo projects is already actively used in scientific publications (e.g. Springer Handbook of Geo Information, 2nd Ed. https://doi.org/10.1007/978-3-030-53124-9, in print). The metadata of a PID for data and software can also reference PIDs for the authors and others involved. As a result, once the Version DOI of a software release is cited, the persons involved can also be referenced using an individual PID, such as the Open Researcher and Contributor ID (ORCID), and receive measurable scientific credit for their effort. This allows collaboration efforts in FOSS projects to become a measurable and rewarding part of the scientific track record. Furthermore, PIDs of software, data and other information sources can be related to each other by specifying related persistent identifiers in the metadata. This field is currently undergoing rapid development. A further step will be the linking of the now available Concept and Version DOIs of the OSGeo projects with the PIDs of the OSGeo conference videos, which will improve the discoverability and re-use of the conference contributions. The OSGeo Foundation can be understood as a growing continuum of software projects, functionalities, groups of people, but also knowledge and information.
Providing an up-to-date mapping of the internal linkages and dependencies of the OSGeo continuum has not yet been satisfactorily solved. In the past, there have been several approaches (e.g. http://pathfinder.terrasigna.com/oss/index2.html or https://doi.org/10.5446/14652), which have remained snapshots due to the lack of persistent references to the described objects and the manual maintenance required for regular updates. The availability of PIDs for software and persons creates a stable base for this for the first time, supported by the conceptual approach of an integrated PID-based graph developed in the FREYA project (https://www.project-freya.eu/). This approach models resources identified by PIDs (software projects, data, publications, persons) and the connections between them in a graph of the network of interconnected PID systems, based on their PID metadata.
The need to make electricity production increasingly sustainable requires careful planning of production plants, mainly for wind and photovoltaic energy conversion. Planning areas correctly, while respecting existing environmental constraints, is not an easy task and requires the collaboration of a panel of experts with different skills.
The need to search for new sites for renewable energy generation plants is dictated by the most pressing current events: the search for low-impact energy sources, to whose research and development specific points of the National Recovery and Resilience Plan are dedicated, and the consequences of the recent conflict in Ukraine, which has definitively exposed the problematic dependence of Italy and Europe on energy supplies from non-European countries. Both issues are pushing the country towards a rapid search for new energy strategies, both for environmental reasons and to make up for natural shortages that require massive imports of gas and other resources from abroad.
In particular, the National Recovery and Resilience Plan (PNRR), part of the European Next Generation EU (NGEU) programme, a 750 billion euro package allocated by the European Union to counteract the economic damage caused by the Covid-19 global pandemic, is an economic plan worth 248 billion euro that Italy can use in the five-year period from 2021 to 2026 to implement various reforms and repair the damage created by the pandemic crisis.
The plan, presented to the EU under the name 'Italia Domani', envisages investments along three main axes: digitalisation and innovation, ecological transition and social inclusion. These economic interventions are intended to address the crisis caused by the SARS-CoV-2 pandemic and to help solve structural problems in the Italian economy, accompanying the country along a path of ecological and environmental transition. The plan also aims to address important issues such as territorial, generational and gender gaps.
It is in this context that national legislation is undergoing a revision which has entrusted the regional administrations with the task of identifying the territorial criteria that favour or prevent the installation of certain plants in the various areas of the territory. Each regional administration may grade the criteria according to the specific geomorphological characteristics of its own territory; therefore, the most efficient procedure is to verify, with simulations in a GIS environment, the effect of defining certain criteria on the territory, in order to assess in advance which and how many areas could have greater or lesser suitability. On the basis of this consideration, we proceeded to test the effects of the most common constraints by developing a full simulation on the territory of the Lazio Region.
The experimentation used the well-known open-source environment QGIS 3.22, which made it possible to exploit the possibilities offered by the open territorial databases of the Lazio Region.
It should be noted that the Lazio Region (like most Italian regions) has made many spatial datasets available in open format in recent years. The European INSPIRE directive gave a boost to the use, standardisation and free dissemination of spatial data. It provides for the creation of a Community data infrastructure that simplifies sharing between public administrations and user access to spatial information. In Italy, the directive was transposed into national law by Legislative Decree no. 32 of 27 January 2010, which established the National Infrastructure for Spatial Information and Environmental Monitoring as a node of the Community infrastructure. As a result of this implementation, the National Geoportal was created, followed by the various Regional Geoportals, such as that of the Lazio Region.
Loading the open data into QGIS 3.12 made it possible to identify topological inaccuracies in the files provided and shared on the Lazio Region site, which led to necessary decisions such as the correction of some polygons that presented errors, such as overlaps or imperfect closure (the correction of the latter was suggested by QGIS itself, through the "reopen geometries" function).
A possible inaccuracy was also found in the "lowland species" and "mountain species" files of the Regional Ecological Network, which appear to contain a transcription error in the geodetic datum reported on the website: it is given as WGS84 / UTM 33N but, based on verification and overlay checks, is more likely ED50. Uncategorised areas also emerged in the file "PTPR Regione Lazio (Tav. A - Tav. B)"; these were nevertheless counted among the unsuitable areas, since a more detailed analysis revealed that they include the Parco della Caffarella in Rome, hardly conceivable as the site of a wind farm or large-scale photovoltaic plant. Extraterritorial areas within the Region's territory belonging to the Vatican City State were also marked as "unsuitable".
The first results show that the areas remaining after eliminating all those that are certainly unsuitable are a limited part of the Region. It should be noted that these are not definitely suitable areas, but rather areas that are not unsuitable, i.e. potentially suitable; further investigation is required to ascertain their actual suitability.
The limited extent of the areas remaining after the exclusion of the unsuitable ones suggested making an initial estimate of the sustainability of a total conversion to these energy sources for the whole region, in order to assess its potential energy autonomy.
The analysis was extended to individual municipalities by computing the average yield per conceivable plant area and comparing it with the number of inhabitants, to obtain an initial estimate of energy needs, at least for domestic use.
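An estimate of this kind can be sketched as follows; all figures (specific PV yield, per-capita domestic demand, municipal area) are illustrative assumptions, not results of the study:

```python
# Back-of-the-envelope sketch of the municipal comparison: yearly PV yield
# of the not-unsuitable area versus domestic electricity demand. All
# numbers below are illustrative assumptions, not results of the study.

def pv_yield_gwh(area_km2, specific_yield_kwh_per_m2):
    # 1 km^2 = 1e6 m^2 and 1 GWh = 1e6 kWh, so the conversion factors cancel
    return area_km2 * specific_yield_kwh_per_m2

def self_sufficiency(area_km2, inhabitants,
                     specific_yield_kwh_per_m2=150.0,   # assumed PV yield
                     domestic_kwh_per_capita=1200.0):   # assumed demand
    """Ratio of yearly PV output to domestic demand (>1 means covered)."""
    demand_gwh = inhabitants * domestic_kwh_per_capita / 1e6
    return pv_yield_gwh(area_km2, specific_yield_kwh_per_m2) / demand_gwh

# Hypothetical municipality: 2 km^2 of not-unsuitable area, 50,000 people.
ratio = self_sufficiency(2.0, 50_000)
assert ratio > 1.0  # domestic demand would be covered, under these assumptions
```

Repeating such a calculation per municipality, with real area and population figures, gives the first-order self-sufficiency comparison described above.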
In this section, we describe the main routines of the MSPA code with reference to the morphological image analysis operations they rely on, with links to their implementation in the open-source Morphological Image Analysis Library (MIAL) recently released on GitHub at github.com/ec-jrc/jeolib-miallib by the first author. All morphological image analysis operators at the basis of MSPA are described in [Soille, 2004]. We briefly present the main MSPA foreground classes with reference to the source code of the main morphological functions used to compute them: core, boundaries, islets, connectors, and branches. The actual pseudo-code will be added in the final version of this paper and will include details on the computation of all MSPA feature classes, including those of connected components of background pixels. The underlying C code is available on GitHub at github.com/ec-jrc/jeolib-miallib/blob/master/core/c/mspa.c
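As a rough illustration of the logic behind two of these classes, the toy sketch below (not the miallib implementation) marks as core the foreground pixels that survive an erosion with a 3x3 window, and as boundary the foreground pixels adjacent to the background; a connected component containing no core pixel would then be an islet:

```python
# Toy sketch of two MSPA foreground classes on a binary grid.
# Hypothetical illustration only: the actual mspa.c code relies on
# efficient morphological operators from miallib.

def neighbors8(r, c):
    """Coordinates of the 8 neighbours of pixel (r, c)."""
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def classify(grid):
    rows, cols = len(grid), len(grid[0])

    def fg(r, c):  # foreground test; outside the grid counts as background
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 1

    labels = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            if all(fg(nr, nc) for nr, nc in neighbors8(r, c)):
                labels[r][c] = "core"      # survives a 3x3 erosion
            else:
                labels[r][c] = "boundary"  # foreground touching background
    # (islets would be connected components containing no core pixel)
    return labels

grid = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
labels = classify(grid)
assert labels[2][2] == "core"
assert labels[1][1] == "boundary"
```

This single pass over the pixels also hints at why a carefully engineered implementation can stay linear in the number of pixels, as evaluated below.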
The performance of the algorithm is evaluated on images of increasing size as well as for on-the-fly computation for interactive analysis and visualisation. We demonstrate experimentally that the complexity of the proposed implementation is linear; that is, the computational time increases linearly with the number of pixels. We also show that the algorithm can handle images of up to 2^64 pixels. For example, a global MSPA map of forest cover in an equal-area projection with a pixel resolution of 100 m (400,748 x 147,306 pixels) was processed on the JRC Big Data Analytics Platform [Soille et al. 2018] in 12 hours. Processing large images is very much needed to mitigate dependencies with regard to the image definition domain, because pixel classes may depend on the observation domain.
As for the on-the-fly computation for interactive analysis and exploratory visualisation based on Jupyter notebooks [De Marchi and Soille, 2019], we show that the proposed implementation is fast enough for integration in JupyterLab with on-the-fly computation in an area corresponding to the mapview area and at resolution matching its zoom level. A Voila dashboard is in preparation and will be available for demonstration at the conference.
Morphological spatial pattern analysis has gained traction since its inception in 2008. For many years, we have maintained a dedicated MSPA website with extensive documentation, various GIS extensions and a user-friendly provision of MSPA within the desktop application GTB and the server application GWB. The present open release of MSPA will further expand the potential user community. We are in the process of making MSPA directly available in the pyjeo Python package [Kempeneers et al., 2019], so that data scientists using Python for their analysis will directly benefit from the MSPA open source release. Since MSPA is available through a library compiled in C, it can be easily integrated in other data science environments. We therefore expect the release of the MSPA code under an open source license to further boost its use for the analysis of geospatial patterns and indeed any other type of spatial pattern occurring in other scientific domains.
- Soille, P., Vogt, P., 2009. "Morphological segmentation of binary patterns". Pattern Recognition Letters. doi: 10.1016/j.patrec.2008.10.015
- Soille, P. et al., 2018. "A versatile data-intensive computing platform for information retrieval from big geospatial data". Future Generation Computer Systems. doi: 10.1016/j.future.2017.11.007
- Ossola, A. et al. "Yards increase forest connectivity in urban landscapes". Landscape Ecology. doi: 10.1007/s10980-019-00923-7
- Carlier, J. et al. "Using open-source software and digital imagery to efficiently and objectively quantify cover density of an invasive alien plant species". Journal of Environmental Management. doi: 10.1016/j.jenvman.2020.110519
- Rincon, V. et al. "Proposal of new Natura 2000 network boundaries in Spain based on the value of importance for biodiversity and connectivity analysis for its improvement". Ecological Indicators. doi: 10.1016/j.ecolind.2021.108024
- Modica, G. et al., 2021. "Implementation of multispecies ecological networks at the regional scale: analysis and multi-temporal assessment". Journal of Environmental Management, 289. doi: 10.1016/j.jenvman.2021.112494
- Vogt, P. and Riitters, K., 2017. "GuidosToolbox: Universal digital image object analysis". European Journal of Remote Sensing. doi: 10.1080/22797254.2017.1330650
- Vogt, P. et al., 2022. "GuidosToolbox Workbench: spatial analysis of raster maps for ecological applications". Ecography. doi: 10.1111/ecog.05864
- Soille, P., 2004. "Morphological Image Analysis: Principles and Applications". Springer. doi: 10.1007/978-3-662-05088-0
- De Marchi, D. and Soille, P., 2019. "Advances in interactive processing and visualisation with JupyterLab on the JRC big data platform (JEODPP)". In Proc. of BiDS'19. doi: 10.5281/zenodo.3239239
- Kempeneers, P. et al., 2019. "pyjeo: A Python Package for the Analysis of Geospatial Data". ISPRS Int. J. Geo-Inf. doi: 10.3390/ijgi8100461
Image semantic segmentation focuses on the problem of properly separating and classifying different regions in an image depending on their specific meaning or use, e.g. belonging to the same object. It is worth noticing that, in general, segmentation is an ill-posed problem: it is not possible to provide a unique solution, and different solutions can typically be acceptable depending on the segmentation criterion applied. Nevertheless, regularization techniques are typically used to reduce the issues related to ill-posedness, hence ensuring the computability of a unique solution. In the case of semantic segmentation, ill-posedness is also reduced by the specific data and object interpretation included in the semantic part of the data.
It is also worth noticing that image semantic segmentation tools can be useful in many applications, related both to the interpretation of the images themselves and to other entities related to such images. The latter is, for instance, the case of a point cloud whose objects and areas are also described by some images. In this case, a proper image semantic segmentation can be back-projected from the images to the point cloud, exploiting this process to properly segment the point cloud itself.
Automatic image semantic segmentation is a challenging problem that is nowadays usually tackled with artificial intelligence tools, such as deep-learning-based neural networks.
The availability of reliable image segmentation datasets plays a key role in the training phase of any artificial intelligence and machine learning tool based on image analysis: although artificial intelligence tools can currently be considered the state of the art in terms of recognition and segmentation ability, they require a very large training dataset in order to ensure reliable segmentation results.
The developed graphical user interface aims at supporting the semi-automatic semantic segmentation of images, hence easing and speeding up the generation of a ground-truth segmentation database. Such a database can be of remarkable importance for properly training any machine- or deep-learning-based classification and segmentation method.
Although the development of the proposed graphical user interface was originally motivated by the need to ease the production of a ground-truth segmentation and classification of plastic objects in maritime and fluvial environments, within a project aiming at reducing plastic pollution in rivers, the developed tool can actually be used in more general contexts.
The interface supports two main types of operations: 1) segmenting and identifying objects in a single image, and 2) propagating previously obtained results to new images, while also enabling the computation of certain related parameters (e.g. navigation-related ones, such as tracking the same object over different data frames). Different types of images are supported: standard RGB, multispectral images (already available as TIFF (Tagged Image File Format) files) and thermal ones.
Concerning the semantic segmentation of a single image, several alternative segmentation options are supported, ranging from manual to semi-automatic methods. First, manual segmentation of the objects is ensured by means of properly inserted polylines; intensity-based and graph-based methods are implemented as well. On the semi-automatic side, two tools are provided: a) a machine-learning-based method exploiting a few clicks by the user (implementing a rationale similar to that in Majumder et al., "Multi-Stage Fusion for One-Click Segmentation", 2020, i.e. aiming to minimize the user input); b) when images are periodically acquired by a UAS at a fairly high frequency, two successive frames are expected to differ only slightly, so the system determines the camera motion between frames and uses machine learning tools to extend and generalize the results from the previous image to the new one.
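As a minimal illustration of what an intensity-based option can look like (a sketch, not the tool's actual implementation; the threshold and 4-connectivity choices are our assumptions), a global threshold followed by connected-component labelling:

```python
from collections import deque

def threshold_segment(image, thresh):
    """Binarize a grayscale image (nested lists) and label the
    connected foreground regions using 4-connectivity."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] >= thresh and labels[sy][sx] == 0:
                next_label += 1          # start a new region
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:             # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] >= thresh
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

img = [[0, 9, 9, 0, 0],
       [0, 9, 0, 0, 8],
       [0, 0, 0, 8, 8]]
labels, n_regions = threshold_segment(img, thresh=5)
print(n_regions)  # → 2
```

A graph-based method would replace the fixed threshold with edge weights between neighbouring pixels, but the region-growing loop above captures the shared rationale.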
The latter method opens up a wider scenario in which additional information may come from the availability of consecutive frames. In particular, information determined by properly analyzing consecutive frames can be used to assess and track the UAS movements while acquiring the video frames, and to increase the automation of the segmentation and classification process of an object.
Overall, the developed graphical user interface is expected to be useful to support the semi-automatic identification of objects, and to help determine the UAS and object movements as well.
Although fully autonomous image semantic segmentation would clearly be of interest, its development is quite challenging. Future investigations will nevertheless be dedicated to these aspects, in order to increase the automation level of the procedure.
The tool will be freely available for download from the website of the GeCo (Geomatics and Conservation) laboratory of the University of Florence (Italy).
For most end-users, the term 'software' is equivalent to executing a given application to obtain a desired result. Moreover, the highest importance is usually attributed to the software being free to use. Besides intuitive use, a key requirement for the success and wider acceptance of a software application is easy access, which is often facilitated through open-source projects. While users naturally only care about the stability and functionality of the software, software developers often see their task as completed once the application reaches a certain degree of maturity and its source code is made available. However, in addition to ease of use and targeted software development, a third component in the life cycle of software design [Vogt 2019] is software provision. Adequate software dissemination entails a wide range of aspects, which are often undervalued but are crucial to best meet end-user expectations and to achieve the highest application acceptance.
In this manuscript, we outline a perspective on approaches to appropriately address issues of software provision aimed at promoting software in an efficient way. We illustrate the motivation and features of various aspects of software provision on the recently published software GWB [Vogt et al., 2022] and its implementation on the FAO cloud computing platform SEPAL.
Software provision
This section summarises reflections on various aspects when disseminating a software application.
Source code: The provision of the source code is often perceived as a final product delivery. However, most end-users cannot make any use of the source code because they do not understand the programming language, do not know how to compile it or how to properly link required dependencies. The large number of Linux distributions provides an additional challenge due to varying inter-dependencies of distribution-specific compiled libraries and packaging policies. Packaging - the conversion of source code into a functional executable binary - is a science on its own and is beyond the skills of a typical end-user.
Target platform: Maximum outreach is achieved through a software setup that will work on as many platforms and operating systems (OS) as possible.
Software packaging: The scope of packaging is to bundle the entire application into a single archive, including or linking OS-specific dependencies, pre- and post-installation instructions, and the integration into the OS via menu entries. Examples are rpm and deb-packages (Linux), dmg-packages (macOS), and exe/msi-packages (MS-Windows). Packaging allows for efficient system-wide software management: installation, upgrades, and removal of the application and provides application access to all OS users. However, it also requires administrator rights, which are not available on many secured or closed IT environments, such as in government agencies, where users may fully access a limited OS-space only, i.e., their $HOME directory. Yet, this situation can be addressed by setting up the software and all required components in a self-contained single directory, which is then compressed into a self-extracting installer. Any user can then download such a standalone installer, extract it and have full access to the application without administrator rights. A similar result can be achieved with a Docker container.
Documentation: Documentation is crucial for software adoption, including a user manual, product sheets with application examples, and guided instructions in workshop material. The manual should also be complemented by a user community to help end-users answer the questions that arise while using the software. Developers of the tools should be actively involved in tackling these questions [Srba et al., 2016].
The GuidosToolbox Workbench (GWB) [Vogt et al., 2022] provides various generic image analysis modules as command line scripts on 64-bit Linux systems. In this section we use GWB to exemplify how we addressed the software provision points mentioned before.
* Source code: in addition to the distribution-independent compiled executables, we provide the plain-text source code for all modules in a dedicated subdirectory of the application.
* Software packaging: all modules are launched via customised Bash scripts and set up in the IDL programming language. Because IDL provides its own set of highly efficient processing libraries, all scripts and required libraries can be stored in a single, distribution-independent application directory. Combined with customised packaging setup files, this archive is then converted into distribution-specific packages for common Linux distributions. In addition, we provide a generic standalone installer using the makeself archiving tool. The standalone installer can be used on any Linux distribution for either system-wide installation or installation in user space on restricted systems, i.e., under $HOME. All installer packages include two sample images and module-specific usage descriptions, aimed at generating sample output illustrating the features of each module.
* Target platform: with its focus as a server application, GWB is set up for the Linux OS, which can also be used on a regular desktop PC. The installation on cloud computing platforms, including an interface to upload/download personal data, greatly enlarges the outreach into the user community and allows usage of the software from any device having a Web browser and Internet access. A Jupyter-based [Kluyver, T., 2016] application was developed within the SEPAL platform. This application uses widgets and interactive displays to help the end-user provide personal data to the software.
This application is developed in Python using the sepal-ui framework, embedding a fully independent set of requirements. As the application is run using the voila dashboarding tool [QuantStack, 2019], the end-user is never confronted with the CLI, vastly widening the scope of potential end-users to non-IT experts.
* Documentation: the project homepage provides a brief overview and installation instructions. Highly detailed usage instructions are available on SEPAL for the command-line use and the interactive Jupyter dashboard.
Software provision is an often overlooked yet critical component in software design. It comprises various aspects, which when addressed appropriately, can make a great impact in the promotion, outreach and acceptance of a software application.
Free and open source software for geospatial analysis (FOSS4G) supports burgeoning possibilities for practicing open and computationally reproducible human-environment and geographical research (Singleton et al 2016).
Open and reproducible research practices may accelerate the pace of scientific discovery and enhance the scientific community's functions of knowledge verification, correction, and diffusion (Rey 2009, Kedron and Holler 2022).
Geospatial metadata provides the foundation for reproducibility and open science and, accordingly, requires more support in open source geospatial software.
Following Wilson and others' (2021) five star guide for reproducibility, researchers can achieve four stars by conducting research with open data and software and documenting metadata according to the standards of the International Organization for Standardization (ISO) and OGC (Open Geospatial Consortium).
For Tullis and Kar (2021), metadata is the key to documenting the provenance of research data artifacts, preserving information about every detail of data creation and transformation.
Wilkinson and others' (2016) FAIR Guiding Principles for scientific data management enumerate functions for metadata in each of the principles for research: findable, accessible, interoperable, and reusable.
However, open source geospatial software platforms generally lack the tools necessary for mainstreaming geospatial metadata into the full research workflow in support of more efficient research work and enhanced reproducibility and open science.
This research on metadata is part of a larger human-environment and geographical sciences reproducibility and replicability (github.com/HEGSRR) project aimed at conducting formal reproduction and replication studies in the geographical sciences and integrating reproducibility into undergraduate and graduate level curricula in research methods.
Following the National Academies of Sciences, Engineering, and Medicine (NASEM, 2019), a reproduction study aims to find the same results using the same data and methodology as a published study.
A replication study aims to test the findings of a published study by collecting new data and following a similar methodology, which may intentionally modify one or more research parameters.
Together, reproduction and replication studies offer a deep understanding of the original research, test its credibility and generalizability, and enhance the self-corrective mechanisms of the scientific community.
Metadata is information about data, including essential contextual information about the data's spatial structure, attributes, creation, maintenance, access, licensing, and provenance.
A key component of reproducible research is an executable research compendium containing all of the data, code, and narrative required to compile a research publication from raw data (Nust and Pebesma 2021 and the Opening Reproducible Research Project).
Computational notebooks like Jupyter notebooks or R Markdown are commonly used to interweave narrative with code in executable compendia.
In order to maximize replicability and inferential power, the research compendium should begin with a pre-registered research plan prior to data collection, requiring researchers to fully specify metadata for all of the research data and analyses that they intend to create (Nosek et al. 2018).
It is recommended to store compendia in version tracking systems like Git in order to preserve a full history of changes to the research project.
Finally, the compendium should be published parallel to academic publications so that other researchers can independently re-run, check and verify the analysis, or incorporate the research in future projects.
In order to maximize the findability and legibility of the research compendium for both humans and machines, the overall repository and each of its components must be meticulously documented with metadata according to international standards (Wilkinson et al. 2016, Wilson et al. 2021).
In this three-part research paper, we focus on metadata in research compendia and related research products through all phases of the research workflow.
First, we specify ideal requirements of geospatial metadata in support of reproducible research workflows and open science.
We consider metadata needs at each step of the research process, including proposal writing, pre-analysis registration, ethics review board approval, data collection and analysis, publishing, and reproducing published research.
The metadata software needs assessment is based on literature review of reproducibility and open science, and on teaching and practicing reproducibility with geographic methods.
Second, we review the Dublin Core and International Organization for Standardization (ISO) geospatial metadata standards and popular open source platforms for geospatial research and their support for the requirements of geospatial metadata articulated in the first part.
The scope of the review includes metadata functionality available through spatial analysis software platforms, including R, Python, QGIS, GRASS and SAGA; and it also includes metadata or cataloging tools, including GeoNetwork, GeoNode, the USGS Metadata Wizard, and mdeditor.
Finally, we articulate a vision for open source geospatial metadata software development in support of open and reproducible human-environment and geographical research.
In this vision, metadata software tools shall integrate with executable research compendia to assist researchers with their workflows from inception to publishing and archiving.
The vision builds off our HEGSRR project, in which we independently reproduce and replicate published studies with open source geospatial software, integrate reproduction and replication studies into project-based geographic information science courses, and develop curricula and infrastructure for reproducible research.
Each section of the paper is thus supplemented with experiences and examples drawn from the HEGSRR project.
Of particular relevance, we have already completed seven reproduction or replication studies with graduate and undergraduate students using open source geospatial software, encountering numerous barriers caused by inadequate use or documentation of metadata in research planning, execution, and archiving.
We have also developed a template Git repository compendium for reproducible research and prototyped its use in our studies and teaching, discovering software barriers to documenting metadata and opportunities to integrate metadata into more efficient and reproducible research practices.
We have selected FOSS4G for this research paper in hopes of reaching both the academic audience and developer audience at the conference. We hope to raise awareness in the academic audience of the critical importance of geospatial metadata in each stage of the research workflow. We hope to raise interest in the open source geospatial software community for collaboration on improved support for geospatial metadata in research workflows.
Motivation & Contribution
Mobility researchers using GPS initially obtain raw coordinates and timestamps rather than the variables they are interested in. Conversion is needed to derive, for example, the time spent out of home, the number of revisited places, or the total time spent on the go. All of these rely on the ability to precisely identify stops and trips, which is therefore fundamental to mobility research.
The commonly adopted strategy involves a combination of a distance and a time threshold to identify significant places (Ashbrook and Starner, 2002; Ye et al., 2009). Here, GPS records are grouped together if they lie within such a pre-defined radius and time span. When we planned the technical basis for a mobility intervention study, we tested several existing systems based on this approach. We observed, on the one hand, significant segmentation of the identified stops, due to the relatively large amount of signal noise. On the other hand, we could only identify stops with a duration greater than a pre-defined time threshold, usually five minutes; hence, the temporal resolution of the analysis was sub-optimal. Reducing this threshold led to an increased number of falsely identified stops (false positives) and more segmentation. To solve this, we developed a new stop and trip identification algorithm.
For a human annotator, this task is fairly easy: when dwelling on a spot, the GPS records scatter around the true position because of its imperfect signal. Records obtained from a trajectory through an environment are clearly distinguishable - although the imperfect signal diverges from the true position similarly. This observation inspired us to create a new algorithm around the idea of investigating the signal patterns, and therefore the geometric properties of the signal noise.
We describe the algorithm's mechanics in detail and discuss its design decisions. Further, we provide benchmark results against established and frequently cited libraries.
Fundamentally, the algorithm is based on a multitude of different geometric analyses. Each analysis method is applied to a rolling window of subsequent GPS samples. For example, one metric evaluates the ratio between the total path length and the bounding box of the point set; another is concerned with the mean angles between the point vectors. Subsequently, all metrics are combined into a majority-based classification decision for each individual GPS sample. This way, the ensemble can compensate for wrong decisions by a minority of the metrics.
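The rolling-window rationale can be sketched as follows; this is a simplified illustration, not the published classifier (the two metrics and their thresholds are our assumptions, and the actual system combines more metrics):

```python
import math

def path_bbox_ratio(window):
    """Total path length divided by the bounding-box diagonal of a
    window of (x, y) samples. Values near 1 suggest a directed trip;
    large values suggest noise scattered around a dwelling spot."""
    path = sum(math.dist(a, b) for a, b in zip(window, window[1:]))
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    return path / diag if diag > 0 else float("inf")

def mean_turning_angle(window):
    """Mean absolute angle between consecutive displacement vectors;
    erratic, sharp turns are typical of stationary signal noise."""
    angles = []
    for a, b, c in zip(window, window[1:], window[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 and n2:
            cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
            angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return sum(angles) / len(angles) if angles else 0.0

def classify_sample(window):
    """Combine the per-metric votes into one decision (illustrative
    thresholds; a strict majority of 'stop' votes is required)."""
    votes = [path_bbox_ratio(window) > 2.0,
             mean_turning_angle(window) > math.pi / 2]
    return "stop" if sum(votes) > len(votes) / 2 else "trip"

print(classify_sample([(0, 0), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]))  # → stop
print(classify_sample([(0, 0), (2, 0), (4, 0), (6, 0), (8, 0), (10, 0)]))  # → trip
```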
If available, the acceleration of the device is also taken into account to exclude unambiguous periods of non-movement. To this end, we created a simple metric that transforms the three-dimensional vector of x, y, and z acceleration into a motion score expressing the amount of physical movement of the recording device.
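One plausible formulation of such a motion score (our illustrative assumption, not necessarily the published metric) is the mean deviation of the acceleration magnitude from gravity, which is orientation-independent:

```python
import math

def motion_score(samples, g=9.81):
    """Collapse (x, y, z) accelerometer samples into a single score:
    the mean absolute deviation of the acceleration magnitude from
    gravity. A device at rest yields a score near 0 regardless of
    how it is oriented."""
    deviations = [abs(math.sqrt(x * x + y * y + z * z) - g)
                  for x, y, z in samples]
    return sum(deviations) / len(deviations)

resting = [(0.0, 0.0, 9.81)] * 5
moving = [(1.2, -0.5, 11.0), (-2.0, 0.8, 7.5), (0.5, 3.0, 10.2)]
print(round(motion_score(resting), 6))           # → 0.0
print(motion_score(moving) > motion_score(resting))  # → True
```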
The labels of individual GPS samples are then used to aggregate stop intervals. In the last step, the resulting stop intervals are filtered: each interval is compared against its neighbors to decide whether a) it should be kept as it is, b) it should be merged with a close stop interval to reduce segmentation, or c) it should be discarded.
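The interval post-processing step can be sketched like this; the thresholds are illustrative defaults, not the algorithm's published values:

```python
def filter_stops(intervals, min_duration=60, merge_gap=120):
    """Post-process detected stop intervals, given as sorted
    (start, end) pairs in seconds: merge neighbouring stops separated
    by a short gap (to reduce segmentation), then drop stops that
    remain too brief."""
    if not intervals:
        return []
    merged = [intervals[0]]
    for start, end in intervals[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end <= merge_gap:   # b) merge close intervals
            merged[-1] = (prev_start, max(prev_end, end))
        else:                               # a) keep as-is
            merged.append((start, end))
    # c) discard intervals that are still too short after merging
    return [(s, e) for s, e in merged if e - s >= min_duration]

print(filter_stops([(0, 40), (100, 400), (5000, 5020)]))  # → [(0, 400)]
```

In the example, the first two intervals are merged because their 60-second gap is below the merge threshold, while the final 20-second interval is discarded as too brief.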
To test the accuracy of our analysis approach, we benchmarked the system against the built-in methods for stop and trip detection of MovingPandas (Graser, 2019) and scikit-mobility (Pappalardo et al., 2019). These represent a large share of the most commonly used tools for mobility research.
To test the classification performance, we created a large dataset containing trajectories from over 126 days of everyday life, capturing 692 stops.
This reference acts as ground truth for the comparison of the different frameworks. We investigate sample-by-sample classification metrics (accuracy, precision, recall/sensitivity, specificity, and F1) and stop/trip interval-specific metrics (stop counts and several measures quantifying the detected stops against the reference, such as the percentage of matched reference stops, absolute duration error, missed stop duration, absolute start deviation, absolute end deviation, and position deviation). To ensure a fair comparison of the algorithmic approaches, we did not take the acceleration data into account, as the reference systems do not support filtering stop and trip intervals using this kind of data.
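The sample-by-sample metrics all follow directly from the per-sample confusion counts; a minimal sketch for binary stop(=1)/trip(=0) label sequences:

```python
def sample_metrics(reference, predicted):
    """Confusion counts over aligned label sequences, and the derived
    sample-by-sample scores used in the comparison."""
    tp = sum(r == 1 and p == 1 for r, p in zip(reference, predicted))
    tn = sum(r == 0 and p == 0 for r, p in zip(reference, predicted))
    fp = sum(r == 0 and p == 1 for r, p in zip(reference, predicted))
    fn = sum(r == 1 and p == 0 for r, p in zip(reference, predicted))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # a.k.a. sensitivity
    return {
        "accuracy": (tp + tn) / len(reference),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

m = sample_metrics([1, 1, 1, 0, 0, 0, 1, 1],
                   [1, 1, 0, 0, 0, 1, 1, 1])
print(m["accuracy"], round(m["f1"], 3))  # → 0.75 0.8
```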
Results & Discussion
Our Stop & Go Classifier outperforms other systems in most metrics: it identifies more stops correctly, the stops it misses are shorter in duration, and the start and end times of the identified stops are almost twice as precise as the closest competitor.
The core ideas of the system are a) it uses unfiltered, raw GPS data, b) it analyzes these regarding their geometric properties, and c) it uses multiple scoring mechanisms to create one solid classification.
The Stop & Go Classifier is free software under a BSD 3-Clause license. The repository includes a reference implementation of the algorithm and small usage examples: https://github.com/RGreinacher/Stop-Go-Classifier
Currently, various kinds of geospatial data are provided as open data and/or map tile data. This means geospatial data have become easier to obtain and use than data published under traditional licenses and formats. By combining map tile data with Web mapping clients such as OpenLayers and Leaflet, we can browse maps of any location without complicated procedures, i.e., downloading data, transforming coordinate systems, extracting an area of interest, or installing software. These Web mapping technologies have been developed mainly for human interpretation of map images.
As described above, there are many applications for client-side data visualization using WebGL. However, implementations of data analysis using WebGL, especially map algebra functions, have seen little development. This paper aims to develop map algebra functions for Data PNG tiles with WebGL in a client-side Web mapping system.
In this study, we attempted to develop map algebra functions for vineyard suitability assessment in Nagano Prefecture, Japan. In recent years, "Japan Wine," made exclusively from grapes grown in Japan, has been gaining international recognition, and the number of new wineries in Japan is increasing. Thus, there is an urgent need to provide information supporting the selection of appropriate vineyard sites and grape varieties. There have been attempts to assess the suitability of agricultural fields for crop production by applying GIS. Despite these efforts, suitable-site evaluation of vineyards has not been widely disseminated due to the lack of the following components: (1) information of sufficient quality, quantity, and accuracy to determine suitable sites; (2) appropriate criteria for evaluating suitable sites based on that information; and (3) methods for providing evaluation results to users such as new farmers. Therefore, we attempt to develop a client-side Web mapping system that enables new farmers to evaluate suitable vineyard sites using only a Web browser, without special skills or specific software.
A variety of environmental information is required for assessing vineyard suitability. In this report, we converted spatial information about geology, soils, topography, and meteorology, which is available as open data, to Data PNG tiles with FOSS4G tools, such as QGIS, TileMill, and MBUtil for suitability assessment. The vineyard suitability assessment system consists of the following map algebra functions:
1) An assessment function that generates assessment values from a Data PNG tile layer by performing the four arithmetic operations, specifying the order of operations with parentheses, and classifying based on logical operation formulae.
2) A comprehensive assessment function that performs the same arithmetic operations, order specification, and logical classification between the layers generated by the above procedure.
3) A function to visualize vineyard suitability based on the comprehensive assessment.
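Although in the actual system these functions run in WebGL on the client, the underlying map algebra rationale can be sketched server-side in Python (the layer names, cell values, and formula below are hypothetical, chosen only to illustrate the cell-by-cell evaluation):

```python
def map_algebra(layers, expr):
    """Evaluate an arithmetic/logical formula cell by cell over
    co-registered raster layers, here represented as equally sized
    nested lists keyed by layer name."""
    grids = list(layers.values())
    rows, cols = len(grids[0]), len(grids[0][0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            cell = {name: grid[i][j] for name, grid in layers.items()}
            # eval() restricted to this cell's layer values only
            row.append(eval(expr, {"__builtins__": {}}, cell))
        out.append(row)
    return out

layers = {
    "slope": [[3, 12], [25, 8]],               # degrees (hypothetical)
    "sunshine": [[1900, 2100], [2000, 1700]],  # hours/year (hypothetical)
}
# suitability: moderate slope AND enough sunshine -> 1, else 0
suitable = map_algebra(layers,
                       "1 if (slope < 15) and (sunshine > 1800) else 0")
print(suitable)  # → [[1, 1], [0, 0]]
```

In the WebGL implementation the same per-cell formula is compiled into a fragment shader, so every tile pixel is evaluated in parallel on the GPU.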
The Web mapping interface was developed with Leaflet; it provides a graphical interface to input map algebra formulae and a function to display and export the image of vineyard suitability based on the comprehensive assessment result described above. A prototype of the vineyard suitability assessment system is available at the following URL: https://wata909.github.io/web-map-algebra/index_e.html.
In this system, the data used for assessment are provided as Data PNG tiles, and the map algebra functions are performed by WebGL on the client side. In other words, unlike many other Web mapping systems, ours requires no server-side systems or middleware and can be operated using only a Web browser. This means that various organizations can operate the same system at low cost or on a free Web service such as GitHub Pages. Additionally, the functions implemented in this system can be applied to various other evaluations using map algebra.
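The idea behind Data PNG tiles is to pack numeric values into the RGB channels of an ordinary PNG so that a WebGL shader can decode them per pixel. A sketch of the encoding/decoding round trip, with hypothetical scale and offset parameters (the actual tile specification defines its own):

```python
def encode_rgb(value, scale=0.01, offset=-100.0):
    """Pack a physical value (e.g. elevation in metres) into the
    24-bit RGB channels of a PNG pixel. scale/offset are hypothetical
    parameters for illustration."""
    n = round((value - offset) / scale)
    if not 0 <= n < 2 ** 24:
        raise ValueError("value out of encodable range")
    return (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF

def decode_rgb(r, g, b, scale=0.01, offset=-100.0):
    """Inverse of encode_rgb, mirroring what a WebGL fragment shader
    computes from the sampled texel."""
    return ((r << 16) + (g << 8) + b) * scale + offset

r, g, b = encode_rgb(1234.56)
print(round(decode_rgb(r, g, b), 2))  # → 1234.56
```

This is why the tiles remain valid PNG images for transport and caching while still carrying analyzable raster values to the client.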
However, our system contains only seven items for assessing suitable locations, which is not sufficient. The arithmetic functions of the system are limited to four arithmetic and logical operations, and it is not capable of implementing the complex model calculations required for highly realistic assessment. We are currently constructing a suitability assessment model using machine learning, information on the distribution of vineyards obtained from field surveys, and various environmental factors derived from field monitoring and published open data. In the future, we will use these data to improve the system and make it more practical.
The survey took place in part of the so-called Roman Villa of Caposele, also known as Villa Rubino (Giuliani and Guaitoli 1972; Cassieri 2015). The Villa, built by the Dukes of Marzano and subsequently passed into the hands of Charles of Ligny, Prince of Caposele, was purchased by Ferdinand II of Bourbon in 1845, with the aim of making it a luxurious summer residence. The building overlooks the inlet of Caposele, where there must have been a small harbour, and is squeezed between the Via Appia and the sea. To the west of the small port are the remains of an imposing structure with a central courtyard, datable to the 1st century B.C., which scholarly tradition has identified as Cicero's Academy or School, although it is probably a horreum, testifying to the utilitarian vocation of this area of the villa. In later phases, while retaining its intended use, the horreum would be incorporated into a residential building complex together with other structures further to the west that, too, may have served as warehouses in the earlier phase. To the east of the marina is the residential area, the area in which the survey operations were concentrated. Here, on a front about 140 metres long, there are a series of rooms with barrel vaults that were probably part of the basis villae of the building. In two of these rooms are the so-called minor and major nymphaea. The first consists of an almost quadrangular room with a roof supported by four Doric brick columns; on the back wall, in a large niche, spring water gushes out. The wall decorations include stucco, shells and incrustations of glass paste and small stones. The main nymphaeum, on the other hand, is divided into three naves and covered with a rounded coffered vault supported by Doric columns. The large niche at the bottom of the nymphaeum contains a pool of spring water; the floor is in white mosaic with polychrome dots. These nymphaeums constitute the focus of the intervention.
In front of this façade lay a very large fishpond, which extended about one hundred metres into the sea and was over 200 metres wide.
Because of its architectural features and good state of preservation, the central body of the monument has always been a great attraction for visitors and scholars, many of whom have left descriptions and drawings in their diaries.
The two nymphaea had to be surveyed both for conservation and study purposes and in order to allow a virtual visit, which is particularly important since they are located inside a private property. As already described, the structure is complex, with a succession of rooms and environments in an archaeological complex extending approximately 480 metres east-west and approximately 50 metres south-north. Surveying such an extensive and articulated complex with consolidated techniques such as terrestrial laser scanning would probably have required days of work; for this reason we wanted to test the most modern SLAM techniques, in particular using a GEOSLAM Zeb Horizon, fully transportable by one operator and with a range of up to 100 metres (https://geoslam.com/solutions/zeb-horizon/).
In order to compare the times, modes, precision and accuracy of the point cloud thus obtained, we took advantage of the open-source software CloudCompare (version 2.11.3, 64-bit), which allows point clouds of different origins to be compared. CloudCompare supports various methods of distance computation and can estimate precision and accuracy separately, allowing one cloud to be fitted to the other or compared while remaining within their absolute coordinates.
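The cloud-to-cloud comparison underlying this workflow is essentially a nearest-neighbour distance computation; a brute-force sketch on toy clouds illustrates the principle (CloudCompare itself uses an octree to make this tractable on millions of points, and offers refined local-model distances):

```python
import math

def cloud_to_cloud(reference, compared):
    """For each point of the compared cloud, compute the distance to
    its nearest neighbour in the reference cloud, then summarise with
    the mean and RMS of those distances."""
    dists = [min(math.dist(p, q) for q in reference) for p in compared]
    mean = sum(dists) / len(dists)
    rms = math.sqrt(sum(d * d for d in dists) / len(dists))
    return mean, rms

# toy example: the compared cloud is the reference shifted 0.1 m in z
ref = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
shifted = [(0, 0, 0.1), (1, 0, 0.1), (0, 1, 0.1)]
mean, rms = cloud_to_cloud(ref, shifted)
print(round(mean, 3), round(rms, 3))  # → 0.1 0.1
```

Comparing the clouds in their absolute coordinates (as here) mixes registration error into the distances; fitting one cloud to the other first isolates the precision component, which is why CloudCompare offers both modes.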
In the present experimentation it was therefore decided to survey both nymphaea with the GEOSLAM, also surveying all the internal connecting rooms and corridors between these two environments. The whole survey was carried out in a few tens of minutes, so it was extended over most of the exterior of the entire structure.
The survey of the entire complex was not carried out because the main interest of this project was to test the SLAM technology and validate its precision and accuracy in comparison with more consolidated techniques.
For comparison, only the major nymphaeum was also surveyed with a more consolidated technique: a Faro terrestrial laser scanner.
In order to verify the validity of SLAM on the external part as well, a survey was carried out using a DJI Matrice drone with a laser scanner. Finally, the same survey was also carried out with an optical camera on the same Matrice drone and with the most widely used drone for photogrammetry, the DJI Phantom 4 Pro.
All the surveys were framed with respect to the same network of ground control points, in order to refer them to the same framing system and be able to assess their precision and accuracy.
It should be noted that the SLAM survey was able to capture only a few of the GCPs while, as can easily be guessed, the drones acquired practically all of them.
The comparison showed very limited deviations, whose statistical validation is in progress, demonstrating that the SLAM technique can be used advantageously in such vast archaeological complexes, where the completeness of the survey is more important than millimetric accuracy.
In a new initiative to deliver higher-quality data and support improved geospatial analysis, the U.S. Geological Survey (USGS) is upgrading the elevation and hydrography datasets into the 3D National Topography Model (3DNTM), which will include fully integrated hydrography and elevation. The USGS 3D Elevation Program (3DEP) recently completed acquisition of interferometric synthetic aperture radar (IfSAR) elevation data at 5-meter spatial resolution for Alaska (USGS, 2022). Other parts of the United States are being mapped at higher resolution with lidar-derived elevation data.
Under the 3DNTM, new hydrography data are acquired through methods that derive or extract the features directly from the best available 3DEP elevation data to ensure proper integration of the hydrography and elevation layers. By applying specifications for deriving 1:24,000 or larger scale hydrography from high-resolution elevation data (Archuleta and Terziotti, 2020; Terziotti and Archuleta, 2020), a tenfold increase in the number of features in the National Hydrography Dataset (NHD) is expected. Consequently, highly automated machine learning methods to extract and validate the hydrography data collection are being investigated.
Xu et al. (2021) demonstrated that the U-net fully convolutional neural network (Ronneberger, Fischer, and Brox, 2015) is capable of extracting hydrography from lidar elevation data with 80 to 90 percent accuracy. Stanislawski et al. (2021) applied a similar U-net model using several IfSAR and IfSAR-derived input layers to predict hydrography for a 50-watershed study area in northcentral Alaska, where 68 percent average F1-score accuracies were achieved on test watersheds. Further work to refine U-net predictions of hydrography using IfSAR for the same 50-watershed area in Alaska achieved average F1-scores for test watersheds of better than 80 percent (Stanislawski et al., 2022). Research presented in this paper builds upon this earlier work by testing transfer learning methods and scaling up U-net predictions of hydrography from IfSAR for other areas of Alaska using workflows in a high-performance computing environment.
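The F1-scores reported above are pixel-wise harmonic means of precision and recall between predicted and reference hydrography masks. A minimal sketch of the metric, with toy 4x4 masks invented for illustration:

```python
import numpy as np

def f1_score(truth, pred):
    """Pixel-wise F1 between a reference hydrography mask and a
    predicted mask (both binary arrays of the same shape)."""
    tp = np.sum((truth == 1) & (pred == 1))  # correctly predicted stream pixels
    fp = np.sum((truth == 0) & (pred == 1))  # spurious stream pixels
    fn = np.sum((truth == 1) & (pred == 0))  # missed stream pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# toy masks: the prediction misses one stream pixel and adds one spurious pixel
truth = np.array([[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0]])
pred  = np.array([[0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
print(f1_score(truth, pred))  # → 0.8
```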
A workflow was developed to automate downloads and processing of IfSAR-derived tiles of digital elevation model (DEM), digital terrain model (DTM), and orthorectified intensity (ORI) data for user-selected watersheds from the 3DEP database. The workflow mosaics common tiles and derives several raster data layers from the DEM that are related to surface hydrology, such as topographic position index and shallow water channel depth. Overall, seventeen data layers are generated and coordinated with identical raster projection systems. The layers were used in U-net modelling for predicting hydrography for the 50-watershed Kobuk River study area (Stanislawski et al., 2022). In this study, a transfer-learning process begins with the Kobuk River U-net model and subsequently includes additional training data from outside the Kobuk area. Hydrography predictions are then generated from the transfer-learning model and assessed. Several levels of refinement to the training data are tested and the accuracy of the predictions is assessed. Reference data consist of vector hydrography features derived by USGS contractors.
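One of the DEM-derived layers mentioned above, the topographic position index, can be sketched as each cell's elevation minus the mean elevation of a surrounding window. The window size and edge handling below are assumptions for illustration, not the USGS workflow's exact parameters:

```python
import numpy as np

def tpi(dem, radius=1):
    """Topographic position index: each cell's elevation minus the mean
    elevation of its (2*radius+1)^2 neighbourhood (edges replicated)."""
    n = 2 * radius + 1
    padded = np.pad(dem.astype(float), radius, mode="edge")
    out = np.zeros_like(dem, dtype=float)
    rows, cols = dem.shape
    # a plain loop keeps the sketch readable; production code would use
    # a sliding-window or convolution approach
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + n, j:j + n]
            out[i, j] = dem[i, j] - window.mean()
    return out

dem = np.array([[10., 10., 10.],
                [10., 13., 10.],
                [10., 10., 10.]])
print(tpi(dem))  # the central peak gets a positive TPI, its surroundings negative
```

Positive TPI marks ridges and peaks, negative TPI marks valleys and channels, which is why the layer is informative for hydrography extraction.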
The data processing workflows are implemented with Python, Linux shell scripts, and open-source software libraries such as the Geospatial Data Abstraction Library (GDAL). Neural network modelling is implemented through TensorFlow, and data processing is completed on a 12-node Linux cluster and through the GPU nodes of the USGS Tallgrass computing facilities (https://hpcportal.cr.usgs.gov/hpc-user-docs/Tallgrass/Overview.html).
Mapping hydrography for the state of Alaska is a daunting task, given its vast area and terrain that is difficult to navigate. Big challenges with large high-quality datasets are well suited to take advantage of recent advancements in neural networks (Usery et al., 2021). This research demonstrates the tremendous potential to improve and speed up the mapping of surface water features in Alaska, and elsewhere in the world where terrain is challenging and resources are limited.
Reported accuracy scores measure how well a machine can reproduce hydrography generated with meticulous editing by numerous subject matter experts. It is not a score of how well the surface water features are mapped by the model. The human factor in contemporary broad scale mapping efforts cannot be ignored and warrants consideration as a source of uncertainty in the related accuracy metrics. How well the maps fit what is on the ground can only be definitively confirmed by being on the ground at any given point in time, as hydrologic conditions are constantly in flux. Thus, the work here could be used as an aid to human cartographers in their efforts to interpret what is important to the map user.
This work could also benefit change detection efforts. As new and better elevation data are collected, automated strategies such as the model presented here could be used to identify regions with significant changes in surface water distribution. This type of automation would be valuable to maintain an accurate national map over time and help address the numerous challenges that society faces related to hydrology.
OpenStreetMap (OSM) can supply useful information to improve land use/land cover (LULC) mapping (Arsanjani, 2013; Schultz, 2017; Zhou, 2019). A dictionary is needed to convert each OSM tag into an LULC class. However, such dictionaries have mostly been created subjectively or from only one pair of OSM and reference datasets, so they may not be applicable to other study areas. This study designed four measures (sample count, average area percentage, sample ratio and average maximum percentage) and used multiple pairs of OSM and reference datasets to create a dictionary. Fifty pan-European metropolitan areas were involved for testing, and 1409 different OSM tags were found. We further found that: 1) only a small proportion of OSM tags play a decisive role in LULC mapping; 2) an OSM tag may correspond to multiple different LULC classes, but which LULC classes correspond to each OSM tag, and how strongly, can be determined. Moreover, not only is the proposed dictionary useful for various applications, e.g., producing LULC maps, obtaining training and/or validation samples, and assessing the quality of an OSM dataset, but the approach to creating it is also applicable to different study areas and/or LULC datasets.
OSM datasets for the 50 metropolitan areas were acquired free of charge from http://download.geofabrik.de/index.html in June 2020. The corresponding reference datasets (the Urban Atlas, or UA) were freely available from https://land.copernicus.eu/local/urban-atlas/urban-atlas-2012/# in June 2020.
The tenet of our approach is to use multiple pairs of OSM and reference datasets for creating an OSM-LULC dictionary. In each pair of datasets, an OSM tag may correspond to different LULC classes; it is therefore necessary to determine the most appropriate LULC class for each OSM tag. We assumed that most OSM tags have been assigned correctly by volunteers (Zhou et al. 2019). Following this assumption, the most appropriate LULC class for each OSM tag is determined in two steps. First, all objects of an OSM tag are intersected with those of the different LULC classes, respectively. Then, the LULC class with the maximum intersecting area is viewed as the most appropriate one for this OSM tag. Four attributes and four measures are designed to describe an OSM-LULC dictionary: Tag ID, Tag Name, Class ID and Class Name in terms of attributes; and Sample Count, Average Area Percentage, Sample Ratio and Average Maximum Percentage in terms of measures. They are introduced as follows: 1. Tag ID denotes the ID of an OSM tag. 2. Tag Name denotes the name of an OSM tag. 3. Class ID denotes the ID of an LULC class. 4. Class Name denotes the name of an LULC class. 5. Sample Count (SC) denotes how frequently an OSM tag appears in the different study areas or datasets. 6. Average Area Percentage (AAP) denotes the average of the area percentages of an OSM tag across the different OSM datasets. 7. Sample Ratio (SR) denotes the percentage of study areas or datasets in which an OSM tag corresponds to a given LULC class. 8. Average Maximum Percentage (AMP) denotes the average of the maximum percentages across the different study areas or datasets.
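Under the assumptions above, the four measures can be computed from per-area intersection results roughly as follows. All class names, areas and values below are hypothetical toy data, and the exact formulas are a sketch of how we read the definitions, not the study's implementation:

```python
from collections import defaultdict

# Toy intersection results for one OSM tag across three study areas.
# For each area: the tag's intersected area per LULC class, and the
# total area of that area's OSM dataset.
observations = [
    {"by_class": {"forest": 8.0, "green_urban": 2.0}, "dataset_area": 100.0},
    {"by_class": {"forest": 5.0, "green_urban": 5.5}, "dataset_area": 200.0},
    {"by_class": {"forest": 9.0},                     "dataset_area": 150.0},
]

sc = len(observations)  # Sample Count: study areas where the tag appears

# Average Area Percentage: mean share of the OSM dataset covered by the tag
aap = sum(sum(o["by_class"].values()) / o["dataset_area"] for o in observations) / sc

# Per area, the most appropriate class is the one with the maximum
# intersecting area; record it together with its share of the tag's area
best = []
for o in observations:
    cls = max(o["by_class"], key=o["by_class"].get)
    best.append((cls, o["by_class"][cls] / sum(o["by_class"].values())))

votes = defaultdict(int)
for cls, _ in best:
    votes[cls] += 1
winner = max(votes, key=votes.get)

sr = votes[winner] / sc              # Sample Ratio
amp = sum(p for _, p in best) / sc   # Average Maximum Percentage
print(winner, sc, aap, sr, amp)
```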
Conclusion and application
This study proposed an approach to creating an OSM-LULC dictionary. Its tenet was to involve multiple pairs of OSM and reference datasets in the analysis. First of all, each pair of OSM and reference datasets was intersected and the most appropriate LULC class for each OSM tag was determined. Then, four measures, i.e., sample count (SC), average area percentage (AAP), sample ratio (SR) and average maximum percentage (AMP), were designed and calculated based on the multiple pairs of OSM and reference datasets. More precisely, a total of 50 pairs of OSM and reference datasets in pan-European metropolitan areas were chosen as study areas for creating an OSM-LULC dictionary. Finally, 1409 different OSM tags were found, and they were reclassified into five and 14 different LULC classes, respectively. Moreover, this dictionary was also analyzed with the four proposed measures. Results showed that:
Firstly, most OSM tags (> 1,000 of them) were found in fewer than five study areas (SC < 5). Moreover, only 37 of the 1409 OSM tags had an average area percentage (AAP) larger than 0.1%. This indicates that only a small proportion of OSM tags play a decisive role.
Secondly, an OSM tag may correspond to multiple different LULC classes within a pair of OSM and reference datasets, and the most appropriate LULC class for each OSM tag may also vary among different pairs of datasets. Thus, both the SR and the AMP may vary across different pairs of OSM tag and LULC class.
With the proposed dictionary, it is possible to understand the differences among OSM tags and among pairs of OSM tag and LULC class. This is essential not only for producing LULC maps, but also for selecting training and/or validation data from an OSM dataset and for detecting incorrect tags in an OSM dataset. We therefore concluded that it is beneficial to create an OSM-LULC dictionary based on multiple pairs of OSM and reference datasets.
Thanks to European Commission initiatives such as INSPIRE (2007) and other governmental policies, spatial data are publicly available on various national, regional and municipal geoportals for further use. Regarding cultural heritage in the Italian context, based on a decree of the Ministry of Culture (MiBACT, 2008), various heritage-related activities have been assigned to the ICCD (Central Institute for Catalogue and Documentation), such as research, the technical-scientific collection of documentation, and the coordination of the cataloguing of cultural heritage and its digitalization. These regulations allowed public entities to share substantial geographical and spatial information with a wider audience. Specifically, in the Lombardy region, data about cultural heritage are catalogued in SIRBeC (Regional Information System for Cultural Heritage), which has been promoted since 1992 and continues to collect, manage, and publish a vast amount of information. Vector shapefiles are freely available for download on the Geoportale Lombardia. The scope of the research was to collect information about cultural heritage in Lombardy that is freely accessible online. The data downloaded are point and polygon feature files with the positions of the cultural heritage. Furthermore, the methodology developed uses QGIS as free and open software, together with the Python console integrated into it, and finally the online integrated development environment (IDE) Replit, a free, open, collaborative, in-browser Python coding application.
The methodology is based exclusively on free and open sources, from the collection of the data to their processing. Each vector file is enriched with metadata in the attribute table, but the methodology provides a combination of software to obtain further data (e.g., coordinates, area, etc.) and statistical analyses (e.g., ratio, percentage, position, distribution, etc.), which form the initial part of any elaborated cultural heritage project. Additionally, the methodology discusses different approaches to reaching the desired result and compares their differences. Firstly, the Python console in QGIS was examined, and metadata were extracted from the vector file to a .csv file to be used in Replit. The online coding application gave a higher degree of flexibility while coding, and it was possible to load the data extracted in the .csv file into a coding panel and use them to produce different statistical analyses. Furthermore, the methodology discusses the use of the QGIS plugin DataPlotly and the differences in the data, from the representation to the utility level.
The Python console in QGIS allowed the extraction of the data necessary for further analysis, discarding those which are not needed. The advantage of this approach is that the metadata of the shapefile stay untouched, and Python simply extracts the selected data into a new external file. Four categories of interest were selected: Name, Category, Typology and Municipality of the cultural heritage. The area of interest was a northern part of Milan, in the province of Monza e Brianza, which has a dense and diverse range of cultural heritage categories. Using the Python code, these four categories are temporarily printed and saved in the console panel. Since there is no information about coordinates inside the metadata, two approaches were tested to obtain them. The first used the QGIS built-in option "Add geometry attributes", which created a new shapefile enriched with information about longitude and latitude coordinates. The second approach extracted the coordinates through the Python console with the f.geometry() function. The information about the four selected categories and the coordinates is printed temporarily in the console, and the user can control the order of the columns and the delimiter type before saving and exporting the .txt file.
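Outside the QGIS console, the same column selection and delimiter control can be reproduced on an exported .csv with Python's standard csv module. The field names and records below are hypothetical placeholders, not actual SIRBeC data:

```python
import csv
import io

# Hypothetical export with more columns than needed
raw = io.StringIO(
    "Name;Category;Typology;Municipality;Epoch\n"
    "Villa Reale;Architecture;Villa;Monza;XVIII\n"
    "Ponte dei Leoni;Infrastructure;Bridge;Monza;XIX\n"
)

wanted = ["Name", "Category", "Typology", "Municipality"]  # keep these, in this order
out = io.StringIO()
reader = csv.DictReader(raw, delimiter=";")
writer = csv.writer(out, delimiter=",")  # switch the delimiter on export
writer.writerow(wanted)
for row in reader:
    writer.writerow([row[c] for c in wanted])

print(out.getvalue())
```

With real files, the two io.StringIO buffers would simply be replaced by open() calls on the input and output paths.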
The second part of the analysis likewise discusses two methods that were tested for producing statistical analyses of the extracted data and their representation, firstly in the QGIS plugin DataPlotly and then using Replit. Presenting statistical analyses in the form of different charts is available directly through the plugin. Nevertheless, with a large amount of data the plugin proved not very efficient for the representation, nor easy to manage in the view. Another constraint is that there is no option for exporting graphs to a .pdf file. On the other hand, creating charts through Python packages such as matplotlib or pandas offers a better degree of control over a graph. The advantage is the possibility of exporting it to many different file types, such as .pdf or .svg. Additionally, through the Python in-browser application there is a higher degree of control over, and easier alteration of, the visual representation of charts.
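A minimal matplotlib sketch of the chart-export workflow described above; the category names and counts are invented sample data, and the Agg backend is used so the figure renders off-screen:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no GUI needed
import matplotlib.pyplot as plt

# Hypothetical counts per cultural-heritage category
categories = ["Architecture", "Infrastructure", "Archaeology", "Parks"]
counts = [124, 37, 12, 9]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, counts)
ax.set_ylabel("Number of assets")
ax.set_title("Cultural heritage by category (sample data)")
fig.tight_layout()
fig.savefig("categories.pdf")  # vector output; .svg works the same way
fig.savefig("categories.svg")
```

Vector formats such as .pdf and .svg keep the chart sharp at any print size, which is the advantage over the raster-only export of the plugin.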
In conclusion, the process of extracting coordinates from previously georeferenced shapefiles can be useful for georeferencing other collected material, such as dense point clouds created by photogrammetric techniques and other photographic material collected in situ. In past years, many students, researchers, and professionals were unable to continue their work because of the inaccessibility of sites and the impossibility of performing the field surveys necessary for the investigation of cultural heritage. The process of using and combining open and free software, both offline and online, can provide to a certain degree information that is not visible in the attributes, so that a study can be continued and research conducted remotely. The methodology, processes and tools used are simple, yet they create clear guidelines on the potential and importance of freely shared data and stress again the power of geographic information tools in urban and architectural analyses.
Traffic accidents are a significant problem facing the world, as they result in many deaths and injuries every year. Generally, the probability of a traffic accident occurring at any point is not random. Factors such as the condition of the road where accidents occur and the general structure of the terrain play an essential role. For this reason, traffic accidents tend to concentrate in areas where these factors deviate from the usual.
It is critical to identify such areas and take the necessary measures to ensure road safety and reduce traffic accidents. Identifying the geographic locations where traffic accidents concentrate can help prevent further traffic accidents, personal injuries, and fatalities, and help understand the different conditions under which accidents occur. A review of the literature shows that many studies in this field have been conducted with different methods. Analyzing the locations where traffic accidents occur by considering hot spots with spatial clustering methods plays a very active role in examining the tendency of traffic accidents to occur. In this study, we detect traffic accident hot spots using the GIS-based Nearest Neighbor Hierarchical Clustering (NNH) method and the density-based clustering method DBSCAN.
The Nearest Neighbor Hierarchical Clustering (NNH) method is a hot spot spatial clustering method that detects accident hot spots. It considers two criteria for the spatial clustering of point data: the threshold distance (d), which is the Euclidean distance between each pair of data points, and the minimum number of points that must be present in a cluster (nmin) (Kundakci, 2014; Kundakci and Tuydes-Yaman, 2014; Levine, 1996; Levine et al., 2004; Ture Kibar and Tuydes-Yaman, 2020). To apply this method, the CrimeStat program, developed especially for hot spot clustering analysis of crimes, is widely used. CrimeStat is a crime mapping software program developed by Ned Levine (Levine, 1996).
Density-based clustering, in the form of DBSCAN, is a method for finding predefined events and hotspots. The algorithm is open source and recommended for noisy data in large spatial databases (Ester et al., 1996). This method identifies a cluster as the most densely connected set of points possible. Two criteria are addressed: epsilon and minimum points. Epsilon is the maximal radius of the neighbourhood, and minimum points is the minimal number of points in the epsilon-neighbourhood required to form a cluster. The algorithm separates the point data into three types: core, border, and noise points (Schubert et al., 2017).
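A compact brute-force sketch of the DBSCAN procedure just described, not the QGIS implementation used in the study; the epsilon and minimum-points values and the toy accident coordinates are purely illustrative:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Return a cluster label per point (0, 1, ...) or -1 for noise."""
    n = len(points)
    # pairwise Euclidean distances (brute force; fine for small datasets)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster  # grow a new cluster from this core point
        seeds = list(neighbors[i])
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:
                    seeds.extend(neighbors[j])  # expand through core points only
        cluster += 1
    return labels

# two tight groups of accident points and one isolated point
pts = np.array([[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1],
                [5, 5], [5, 5.1], [5.1, 5], [5.1, 5.1],
                [10, 10]], dtype=float)
labels = dbscan(pts, eps=0.5, min_pts=3)
print(labels)  # the isolated point is labelled -1 (noise)
```

Production implementations replace the quadratic distance matrix with a spatial index, but the core/border/noise logic is exactly this.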
In this study, the Mersin province of Turkey was chosen as the pilot region for the analyses using the mentioned methods. Mersin is a port city in the Mediterranean Region of Turkey, located between 36-37° north latitude and 33-35° east longitude. As of 2021, it has a population of 1,891,145 (URL-1, 2022). It is the most important domestic tourism centre of Turkey and is on the way to becoming Turkey's new tourism region thanks to recent investments in tourism and the new hotels built on the beach.
This study aims to determine the risky areas in Mersin, an important location for the country, where speed-related traffic accidents occur, and to make predictions by evaluating the road geometry at the identified points. In addition, the measures to be taken on the basis of the analyses will be examined comparatively with the two different methods, and it will be assessed whether the evaluations differ when considered both for a large region and for a more local region.
The study was planned in four phases. First of all, spatial and non-spatial data for the selected pilot region will be obtained. For this stage, traffic accident data for 2013-2020 will be obtained from the General Directorate of Security and the General Command of the Gendarmerie. In the second stage, the obtained data will be organized and then transferred to the geographic database for GIS-based analyses. Since the study performs hot spot analysis of speed-related traffic accidents, the database will be structured to include speed-related accidents. In the third stage, the NNH and DBSCAN methods will be applied and the results discussed. At this stage, the CrimeStat III program will be used for the NNH method, and the open-source GIS program QGIS will be used for DBSCAN. All results will be analyzed, visualized, and evaluated in QGIS. In the last stage, the results obtained will be examined according to the probability of accidents. Finally, the risky areas identified by the analyses will be evaluated according to the geometry of the road. In short, whether the structure of the road and the high-risk accident areas overlap will be examined within the framework of accident-road geometry.
By determining the points where speed-related accidents tend to cluster, the study will address a significant gap in this field. Since the effectiveness of the methods will be compared through separate analyses, the study will also constitute a basis for work in similar fields. In addition, since questions such as whether these methods produce effective results in large regions as well as in more local regions will be examined, important suggestions and contributions to the literature are expected. Finally, since the results will be evaluated in relation to road geometry, the relationship between traffic accidents and road geometry will be discussed. Thus, a basis for similar studies will be provided.
Because of technological advancements, public participation in scientific projects, known as citizen science, has grown significantly in recent years (Schade and Tsinaraki 2016; Land-Zandstra et al. 2016). Contributors to citizen science projects are very diverse, coming from a variety of expertise levels, age groups, cultures, and so on, and thus the data they contribute should be validated before being used in any scientific analysis. Typically, experts validate citizen science data, but this is a time-consuming process. One disadvantage is that volunteers do not receive feedback on their contributions and may become demotivated to continue contributing in the future. Therefore, a method for (semi-)automating the validation of citizen science data is critical. One direction that researchers are now focusing on is the use of machine learning (ML) algorithms to validate citizen science data.
We developed a citizen science project with the goal of collecting and automatically validating biodiversity observations while also providing participants with real-time feedback. We implemented the application with the Django framework and a PostgreSQL/PostGIS database for data preservation. In general, biodiversity citizen science applications focus on automatically identifying or validating species images, with less emphasis on automatically validating the location of observations. Our application's focus, aside from image and date validation (Lotfian et al., 2019), is on automatically validating the location of biodiversity observations based on the environmental variables surrounding the observation point. In this project, we generated species distribution models using various machine learning algorithms (Random Forest, Balanced Random Forest, Deep Neural Network, and Naive Bayes) and used the models to validate the location of a newly added observation. After comparing the performance of the various algorithms, we chose the one with the best performance for our real-time location validation application.
We developed an API that validates new observations using the trained models of the chosen algorithm. The Flask framework was used to create the API. The API uses the location and species name as parameters to predict the likelihood of observing a species (for the time being, a bird species) in a given neighborhood. Moreover, the model prediction, as well as information on species habitat characteristics are then communicated to participants in the form of real-time feedback. The API has three endpoints: a POST request that takes the species name and location of observation and returns the model prediction for the probability of observing the species in a 1km neighborhood around the location of observation; a GET request that takes the location of observations and returns the top five species likely to be observed in a 1km neighborhood around the location of observation; and a GET request that returns the species common names in English.
A user experiment was carried out to investigate the impact of automatic feedback on simplifying the validation task and improving data quality, as well as the impact of real-time feedback on sustaining participation. Furthermore, a questionnaire was distributed to volunteers, who were asked about their feedback on the application interface as well as the impact of real-time feedback on their motivation to continue contributing to the application.
The results were divided into two parts: first, the performance of the machine learning algorithms and their comparison, and second, the results of testing the application through the user experiment.
We used the AUC metric to compare the performance of the machine learning algorithms. The results showed that while DNN had a higher median AUC (0.86) than the other three algorithms, its performance was very poor for some species (below 0.6). Balanced Random Forest (median AUC 0.82) performed relatively well for all species in comparison with the other three algorithms. Furthermore, for some species where the other three algorithms performed poorly (AUC below 0.7), Balanced-RF outperformed the others.
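AUC, as used above, can be read as the probability that a randomly chosen presence is scored higher than a randomly chosen absence, which leads to a simple pair-counting sketch (the labels and model scores below are invented for illustration):

```python
import numpy as np

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs ranked correctly, counting ties as half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# toy presence/absence labels with model scores
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc(y, s))  # one positive is ranked below one negative: AUC = 8/9
```

This pairwise formulation is exactly equivalent to the area under the ROC curve, and it makes clear why an AUC of 0.5 corresponds to random scoring.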
The user experiment results provided us with preliminary findings that support the combination of citizen science and machine learning. According to the findings of the user experiment, participants with a higher number of contributions found real-time feedback to be more useful in learning about biodiversity and stated that it increased their motivation to contribute to the project. Besides that, as a result of automatic data validation, only 10% of observations were flagged for expert verification, resulting in a faster validation process and improved data quality by combining human and machine power.
Why it should be considered:
Data validation and long-term participation have always been two of the most difficult challenges in citizen science and VGI (volunteered geographic information) projects. Various studies have been conducted on biodiversity data validation, focusing primarily on observation images and automatic species identification; however, not enough attention has been paid to observation location validation, particularly automatic location validation that takes species habitat characteristics into account. Furthermore, to the best of our knowledge, the combination of machine learning and citizen science to sustain participation by providing real-time, user-centred, machine-generated feedback to participants has so far received little attention. Our work is therefore new, original and fully coherent with the vision of community citizen science, in which scientists and citizen scientists learn from each other.
Land-Zandstra, Anne M., Jeroen L. A. Devilee, Frans Snik, Franka Buurmeijer, and Jos M. van den Broek. 2016. “Citizen Science on a Smartphone: Participants’ Motivations and Learning.” Public Understanding of Science 25 (1): 45–60.
Lotfian, Maryam, Jens Ingensand, Olivier Ertz, Simon Oulevay, and Thibaud Chassin. 2019. “Auto-Filtering Validation in Citizen Science Biodiversity Monitoring: A Case Study.” In Proceedings of the 29th ICA Conference. Vol. 2. https://doi.org/10.5194/ica-proc-2-78-2019.
Schade, S., and C. Tsinaraki. 2016. “Survey Report: Data Management in Citizen Science Projects.” EUR 27920 EN. Luxembourg: Publications Office of the European Union. doi:10.2788/539115.
The “Destination Earth” initiative of the European Union encompasses the creation of Digital Twin Earths (DTEs), high-precision digital models of the Earth integrating various aspects of the Earth’s system to monitor and simulate natural phenomena and related human activities, being able to explore the past, understand the present, and build predictive models of the future. There are multiple elements that a Digital Twin Earth needs, such as strong computation capabilities, connectivity, cloud computing, Artificial Intelligence (AI), models that are able to describe physical phenomena, scientific collaboration, high volumes of good quality data (big data), and interoperability.
A full-scope Digital Twin Earth is a huge task that may require years to build. Destination Earth therefore uses an incremental approach, in which multiple smaller parts are put together to create a single, complete model, starting from smaller Digital Twins, the so-called digital twin precursors. This work presents an initial approach to addressing the big data, interoperability, cloud computing, and scientific collaboration elements of the DTE by developing a modular web platform that integrates georeferenced open data, using the mediator-wrapper architecture to retrieve and query data from online sources. The scope of the project is to create this platform for the Italian coast, with the goal of understanding, through data analysis, the interaction between the land and the sea, the human impact, and other factors that may affect the coasts.
Since ancient times, coasts have played a fundamental part in human civilization, being a critical element for development, economy, transportation, and tourism. In addition, coasts host an important portion of global biodiversity and richness, which is endangered by global warming and pollution. Thus, creating a digital twin of the coast is an important task, in order to understand the physical phenomena occurring on land and at sea, the interaction between these two elements, and the role of human activity in them. Although this work is focused on the Italian coast, its modularity allows the pilot to be extended and reproduced for any coast in the world.
The aim is to address good-quality big data and interoperability: by quality data we mean authoritative, reliable, and validated data, and by interoperability we mean data that can be easily used and integrated on any platform. Good-quality data can be found all over the internet, but the biggest and most reliable homogeneous open data source for the European continent is Copernicus. Copernicus provides six services, focusing on Land, Ocean, Atmosphere, Climate Change, Security, and Disaster Management. Two services are of great importance for studying the physical phenomena of coasts: the Copernicus Land Monitoring Service (CLMS: https://land.copernicus.eu/) and the Copernicus Marine Environment Monitoring Service (CMEMS: https://marine.copernicus.eu/). The WorldPop population counts dataset (https://www.worldpop.org/), open data made available by the University of Southampton, is used for understanding human impact. CMEMS provides data on physical and biogeochemical variables for the sea, while CLMS provides data on land cover and land use. The data range from 1987 to the present; their spatial resolution varies from 0.042° (approx. 3.5 km at the latitude of Italy) for biogeochemical variables to 10 meters for land cover, and they are offered as monthly, daily, and hourly averages. WorldPop population counts are available yearly from 2000 to 2020 and have a spatial resolution of 3 arcseconds, which corresponds to approximately 70 meters at the latitude of Italy.
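The quoted ground resolutions can be checked with a quick spherical-Earth conversion, in which a degree of longitude shrinks by the cosine of the latitude. The reference latitude of 42° N is an assumption standing in for "the latitude of Italy":

```python
import math

def deg_to_m(deg, lat):
    """Approximate ground distance of `deg` degrees of longitude at
    latitude `lat` (spherical Earth, ~111.32 km per degree at the equator)."""
    return deg * 111_320 * math.cos(math.radians(lat))

lat_italy = 42.0  # rough central latitude of Italy (assumption)
print(round(deg_to_m(0.042, lat_italy)), "m")     # CMEMS biogeochemical grid, roughly 3.5 km
print(round(deg_to_m(3 / 3600, lat_italy)), "m")  # WorldPop 3-arcsecond grid, roughly 70 m
```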
Interoperability is achieved through standards. All georeferenced data available online should follow certain guidelines and standards, which are managed by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO). However, standards alone do not completely solve the problem of interoperability, because each data source presents its data differently, meaning that an additional step is necessary to achieve full integration. In the developed platform, this problem is addressed using a mediator-wrapper architecture: a mediator receives generic requests and calls the specific wrapper, which is in charge of communicating with the specific data source and retrieving the data, then passes it back to the mediator, which translates it into a generic response. In this way, additional data sources can be integrated by building new wrappers. Data visualization is managed by the open-source web mapping library OpenLayers, which can correctly display any type of georeferenced data that follows OGC standards.
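As an illustration of this architecture, the following minimal Python sketch shows the mediator-wrapper pattern; all class names, method signatures and the stubbed responses are hypothetical and are not the platform's actual code:

```python
from abc import ABC, abstractmethod

class Wrapper(ABC):
    """One wrapper per data source, hiding its specific access protocol."""
    @abstractmethod
    def fetch(self, variable: str, bbox: tuple) -> dict:
        """Retrieve data from one specific source in its native form."""

class CMEMSWrapper(Wrapper):
    def fetch(self, variable, bbox):
        # A real wrapper would call the CMEMS API here; this is a stub.
        return {"source": "CMEMS", "variable": variable, "bbox": bbox}

class WorldPopWrapper(Wrapper):
    def fetch(self, variable, bbox):
        return {"source": "WorldPop", "variable": variable, "bbox": bbox}

class Mediator:
    """Receives generic requests and delegates to the matching wrapper."""
    def __init__(self):
        self._wrappers = {}

    def register(self, name: str, wrapper: Wrapper):
        self._wrappers[name] = wrapper

    def request(self, source: str, variable: str, bbox: tuple) -> dict:
        # Translate the source-specific response into a generic one.
        raw = self._wrappers[source].fetch(variable, bbox)
        return {"data": raw, "format": "generic"}

# Adding a new data source only requires registering a new wrapper.
mediator = Mediator()
mediator.register("cmems", CMEMSWrapper())
mediator.register("worldpop", WorldPopWrapper())
response = mediator.request("cmems", "sea_surface_temperature",
                            (6.6, 35.5, 18.5, 47.1))
```

The key design point is that the mediator never touches source-specific details, so new sources plug in without changing existing code.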
Other platforms exist that use online data sources to display data and to build knowledge around it. E.g., CMEMS has its own platform (https://myocean.marine.copernicus.eu/data) for visualizing all its datasets and allows users to build plots and to extract subsets of the data at different times and elevations; CLMS also allows users to see the datasets and retrieve parts of them within their website (Corine Land Cover example: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018); other more complex platforms consume multiple data sources and build AI models around them such as the ARIES (Artificial Intelligence for Environment & Sustainability) platform (https://seea.un.org/content/aries-for-seea) that is focused on ecosystem accounting. The main difference between those platforms and the digital twin of the Italian coast in development is the focus on a single type of location, which makes models more specific and available data more accurate and localized. It is also possible to perform basic statistical analysis and to observe relations between layers, being able to visualize results as plots, tables, and histograms, as well as being able to download the produced data. Another novelty is the addition of demographic data to add the human factor to the analysis.
As this is a work in progress (available online at https://dte-italycoast.herokuapp.com/), more features are planned, such as the ability to share projects and analyses, additional data sources, AI models, and more sophisticated analyses than the current basic statistics.
OpenStreetMap (OSM) has evolved into one of the most widely used geographic databases. It is a major knowledge source for many geographic topics addressed by researchers, professionals and the general public. To satisfy these diverse needs and capabilities, the associated communities have surrounded the project with an ever-growing ecosystem of analysis tools (e.g. OSM Contributors, 2022). The most prominent analysis topic is data quality (Senaratne et al. 2015), where e.g. intrinsic indicators are used to estimate completeness (Brückner et al. 2021). Furthermore, the community is also interested in insights such as leader-boards or activity reports (e.g. Neis, 2022). In recent years, analyses have increasingly shifted towards large-scale studies (e.g. Herfort et al. 2021).
This diversity of tools can be a challenge for data users, who will find themselves in a universe of highly specialised or complex tools using different programming languages, platforms, interfaces, output formats etc. While there have been efforts to provide users with higher-level data insight and analysis platforms, these still mostly concentrate on, or are limited to, certain topics or regions. To our knowledge no tool exists to analyse and combine topic-independent aspects of the data at the highest possible resolution: single OSM elements.
The presented software (available at https://gitlab.gistools.geog.uni-heidelberg.de/giscience/ideal-vgi/osm-element-vectorisation) sets out to bridge this gap by integrating multiple aspects of the OSM ecosystem into one workflow that allows the quantitative assessment of selected OSM elements, or of all elements in a defined region. This enables new insights in a formalised and easy-to-use manner. The result is a vectorisation of single OSM elements (sometimes also called an embedding or feature construction). By producing a machine-readable result, the tool can be used for manual data investigations as well as for the ever-growing field of machine learning, where it can be linked to a range of labels.
The tool is centred around a Python package providing a command line interface suitable also for novice users. It draws on other technologies where necessary, such as POST requests and Java. Further data processing is done using the R scripting language, while all data are stored in a PostGIS-enhanced PostgreSQL database and can be exported automatically to .csv files. The AGPL v3 license, as well as the code structure and documentation, enable others to use it as a framework to implement their own analysis logic in combination with the current procedure. A default setup using Docker is provided for fast installation, including a minimal example. The tool is fully functional and in use in our current research, yet it is under active development towards a web interface and functionality extensions. While it was developed with land-use and land-cover (LULC) information in mind, the tool can be seamlessly applied to any polygonal OSM data such as buildings, and also supports linear and point data. The tool is resilient towards missing data and can recover from many common issues like failed connections. The backend remains in a sane state throughout the workflow, and error messages enable the user to adapt to failures and simply rerun the tool, which will automatically pick up from the last savepoint. Benchmarks have shown that the tool is capable of processing around 1k elements per hour, making it suitable for larger analyses of custom regions or element sets.

Out of the endless number of possible data aspects, a set of 32 is currently available for the user to choose from. These cover aspects concerning the element itself (e.g. object area, geometric complexity and object age), but also the surrounding data (e.g. the mapping saturation and community activeness) and the editors (e.g. their experience, localness or editing software used).
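The savepoint-based recovery described above can be illustrated with a generic sketch; this shows the pattern of persisting progress so a rerun resumes where the previous run stopped, and is an illustration only, not the tool's actual implementation:

```python
import json
import os
import tempfile

def process_elements(element_ids, savepoint_path, work):
    """Process elements, recording each completed id in a savepoint file
    so that a rerun after a failure skips work already done."""
    done = set()
    if os.path.exists(savepoint_path):
        with open(savepoint_path) as f:
            done = set(json.load(f))
    for eid in element_ids:
        if eid in done:
            continue  # already processed in an earlier run
        work(eid)
        done.add(eid)
        # Persist progress after each element so a crash loses at most one.
        with open(savepoint_path, "w") as f:
            json.dump(sorted(done), f)
    return done

# Usage: the first run processes 1-3; the rerun with an extended set
# only processes the new element 4, because 1-3 are in the savepoint.
savepoint = os.path.join(tempfile.mkdtemp(), "savepoint.json")
log = []
process_elements([1, 2, 3], savepoint, log.append)
process_elements([1, 2, 3, 4], savepoint, log.append)
```

Writing the savepoint after every element trades a little I/O for the guarantee that a rerun never repeats completed work.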
To prove its potential, the tool was applied to a set of 1k randomly selected OSM LULC elements. We picked OSM LULC as an example because it has been shown to be valuable for applications such as earth surface monitoring. The results provide the OSM community with a status report on the already available data. They further enable more informed planning of future activities like organised mapping or data curation efforts, and enable data consumers to make informed decisions on data usage by answering the question: what is OSM LULC made of? First, three exemplary hypotheses were tested statistically on a global as well as a continental scale to analyse the triangular relation between elements' size, age and location in terms of population density. In a second step, k-means clustering was used to identify clusters based on the properties of the OSM objects. Before clustering, the data were standardised and stripped of any geographic information, as we hypothesised that the different clusters might be linked to different geographic regions.
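The standardise-then-cluster step can be sketched as follows. In practice one would use scikit-learn's StandardScaler and KMeans; this self-contained toy version with illustrative (area, age) values shows the idea without third-party dependencies:

```python
import math

def standardise(rows):
    """z-score each column ((x - mean) / std), as done before clustering."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]

def kmeans(rows, k, iters=20):
    """Minimal Lloyd's algorithm; deterministic init for the toy example."""
    centres = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest centre (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(r, centres[j])))
                  for r in rows]
        # Recompute each centre as the mean of its members.
        for j in range(k):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:
                centres[j] = [sum(c) / len(c) for c in zip(*members)]
    return labels

# Toy element properties (area, age): two obvious groups of objects.
data = [[1.0, 2.0], [1.2, 1.8], [10.0, 9.0], [10.5, 9.5]]
labels = kmeans(standardise(data), k=2)
```

Standardising first matters because otherwise the property with the largest numeric range would dominate the distance computation.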
The results showed that larger objects were more frequently encountered in regions with a lower population density, due to the 'natural' factor of higher fragmentation in these areas. Yet, the effect was surprisingly small on a global scale. A general mapping order in which areas of high population density are mapped before areas of lower population density could not be confirmed globally. This may be caused by a complex interaction between several indicators and regional tendencies that remains to be fully understood. Regional tendencies were shown e.g. for the age of objects, with North America and Europe containing older objects than Africa and Asia. The five k-means clusters formed interesting groups worth further investigation. For example, the North American lakes and the complex European elements were each detected as distinct clusters by the algorithm.
Our current and future work will investigate the causes of these insights and link them e.g. to data quality to identify OSM elements that need the communities' attention. The presented tool already enables other data users to join us on this path.
Heritage graphic representation combining building spatial location and urban/land planning provides a powerful tool for government agencies. These techniques support decision-making, simplify the development of protection and conservation inventories and allow buildings to be treated from an integral urban/land-scale perspective. From a technical point of view, representing information at various levels of detail, involving different types of data and media, provides a complete vision with multiple applications. Furthermore, this graphical representation of historical buildings offers an informative contribution that can be used to promote the architectural heritage for educational and touristic purposes.
The so-called Mudejar architecture is unique to the Iberian Peninsula and represents the influence of Muslim culture on art and architecture between the 12th and 17th centuries within the territories conquered by Christians. The Autonomous Community of Aragón was one of the most influenced territories, and hence the Aragonese Mudejar gained its own peculiarities that differentiate it from the rest of the territory. Some representative Aragonese Mudejar buildings were declared World Heritage Sites by UNESCO in 2001. In the field of architecture, the typologies of fortress-churches and single-nave churches with a polygonal apse and simple ribbed vaults are representative of the Aragonese Mudejar. The bell towers have a characteristic structure, with a morphology similar to that used in the minarets of Muslim mosques. The use of traditional materials such as brick, plaster, stucco, ceramics and wood in the construction processes, and the use of geometric shapes and plant themes for ornamentation, also stand out, derived directly from the Muslim tradition.
This work presents the development of a digital system to document and inventory the Mudejar architectural style in Aragón, involving a list of 225 buildings with unique architectural elements that are part of the World Heritage. Developing useful graphic representations of the architectural heritage requires going beyond the classical inventory description level and designing graphical environments able to contain further information about the cultural assets. It is necessary to define a methodology to collect, organize and disclose information to common users and urban managers following a standardized procedure. First, the information collected from the historical Mudejar buildings was structured following standardized criteria and stored in digital sheets, creating a complete inventory of the Aragonese Architectural Heritage. This structured digital inventory of the Mudejar Heritage ensures that the information lasts over time and helps design conservation measures and promotion actions (Quintilla, 2021).
A geospatial web tool has been developed to organize and make available 2D and 3D architectural data of the buildings, which enrich the descriptive information provided by the digital inventory sheets. The main goal is to provide a standardized basis for recording digital 2D/3D graphic documentation, supporting the use of this information in an understandable and coherent way in future conservation actions. The proposed geospatial web tool allows the dissemination and exploitation of the architectural information by different users through a website that integrates a cartographic viewer (WebGIS) and also offers access to a point cloud manager based on WebGL. The geospatial structured data are accessible through an interface with different visualization styles that can be adapted to the purpose, such as technical studies, reconstruction actions, informative campaigns, etc., opening up the possibilities of use of the available information. Furthermore, the 3D point cloud viewer supports the creation of a user-friendly repository of geometric information on the registered heritage assets.
The 3D information collected for each historical building is made available to end users by means of an ad-hoc interactive point cloud environment based on the Potree viewer project (Potree, 2022). The three-dimensional geometric information is obtained by combining photogrammetry and laser scanning techniques. The result is a high-density point cloud model of the building that is used as a 3D support into which the data provided by the different technicians involved in the documentation process can be incorporated. Traditionally, presenting models to the end user required transferring large amounts of data and installing third-party applications to view them. This point cloud viewer, however, is based on WebGL technology, which enables the delivery of 3D content through web browsers without installing third-party applications and which is natively supported by all devices. Beforehand, the software CloudCompare (Cloud Compare, 2022) was used to perform cloud segmentation and sub-sampling, as well as to classify different cloud groups into architectural elements.
The Digital Inventory of the Aragonese Mudejar Architectural Heritage is a digital repository with graphic material composed of photographic and 2D/3D volumetric information, which forms a complete documentation of the geometry of the building and achieves the correct characterization for metric or informative purposes.
Data-driven innovation, as outlined by Granell et al. (2022), has seen recent advances in technology driven by the continuous influx of data, miniaturization and massive deployment of sensing technology, data-driven algorithms, and the Internet of Things (IoT). Data-driven innovation is considered key in several policy efforts, including the recently published European strategy for data, where the European Commission acknowledged Europe’s huge potential in the data economy by leveraging available data produced by all actors (including the public sector, private sector, academia and citizens). Technologies currently used for the management, exchange and transmission of data, including geospatial data, must be evaluated in terms of their suitability to efficiently adapt to streams of larger data and datasets. As more users access data services through mobile devices and service providers are faced with the challenges of making larger volumes of data available, we must consider how to optimise the exchange of data between these clients and servers (services). For many years JSON, GeoJSON, CSV and XML have been considered the 'de facto' standards for data serialisation. These formats, which enjoy near ubiquitous software tool support, are commonly used for the storage and sharing of large amounts of data in an interoperable way. Most Application Programming Interfaces (APIs) available today facilitate data sharing and exchange, for a myriad of different types of applications and services, using these exchange formats (Vaccari et al., 2020). However, there are many limitations to approaches based on JSON and XML when the volume of data is likely to be large. Potentially the most serious of these limitations is reduced computational performance: when exchanging or managing large volumes of data, there are high computational costs associated with (de)serializing and processing these data.
Against this background, binary data serialization approaches allowing for the interoperable exchange of large volumes of data have been used extensively within scientific communities such as meteorology and astronomy for decades. In recent years, popular distributors of geospatial data have also begun making use of binary data formats. Examples are OpenStreetMap (OSM) data (e.g. the OSM Planet and OSM Full History Planet files, providing access to the whole OSM database and its history) as well as the popular ESRI Shapefile format's main file (.shp), which also contains geometry data and is stored as a binary data file.
In this paper we describe the methodology, implementation and analysis of a set of experiments to analyse the use of binary data serialization as an alternative to data exchange in XML or JSON data formats for several commonly encountered GIS workflows. Binary data serialization allows for the storage and exchange of large amounts of data in an interoperable fashion (Vanura and Kriz, 2018). While anecdotal evidence indicates binary serialization approaches are more efficient in terms of computation costs, processing times, etc., there are additional overheads to consider with these approaches including special software tools, additional configuration, schema definitions, etc. (Viotti and Kinderkhedia, 2022). Additionally, there have been few, if any, investigations of binary data serialization approaches specifically for geographical data. Our set of experiments investigates the advantages and disadvantages of binary data serialization for three common GIS workflow scenarios: (1) geolocation point data from an OGC SensorThings API; (2) geolocation point data from a very large static GeoPackage dataset representing the conflation of address data from the National Land Survey of Finland and OpenStreetMap; and (3) geographic polygon datasets containing land cover polygons (currently ongoing work). We consider comparisons of JSON and GeoJSON with two very popular binary data formats (Proos and Carlsson, 2020), namely Google Protocol Buffers and Apache Avro. Protocol Buffers (Protobuf) is an open source project developed by Google providing a platform neutral mechanism for serializing structured data. Apache Avro, another very popular schema-based binary data serialization technique, is also a language-neutral approach which was originally developed for serializing data within Apache Hadoop. Both Protobuf and Avro have wide support in many popular languages such as C++, C#, Java and Python. 
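The size advantage of binary encodings can be illustrated with a stdlib-only sketch. This uses Python's struct module rather than Protobuf or Avro (which additionally handle schemas, field tags and variable-length encodings), but the underlying effect is the same: numeric values are stored in their native binary width instead of as decimal text:

```python
import json
import struct

# A geolocation point: two coordinates, here illustrative values.
point = {"lon": 24.945831, "lat": 60.192059}

# JSON encodes the coordinates as decimal text plus key names.
json_bytes = json.dumps(point).encode("utf-8")

# A binary encoding packs the same pair as two little-endian
# IEEE-754 doubles: exactly 16 bytes, regardless of digit count.
binary_bytes = struct.pack("<dd", point["lon"], point["lat"])

# Deserialization restores the exact coordinate values.
lon, lat = struct.unpack("<dd", binary_bytes)
```

The byte-count gap widens further for large feature collections, which is consistent with the smaller file sizes reported below; the trade-off is that the reader must know the layout (the role played by Protobuf/Avro schema definitions).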
The full paper will provide detailed descriptions of the implementations of our experiments. However, here we provide a summary of some of the key results and highlights of our analysis.
As binary data formats such as Protobuf and Avro are not self-describing, schema definitions are required for each dataset or data stream; these definitions are needed to serialize and deserialize the binary data files. Any change in the underlying data model of a dataset or data stream requires a corresponding change in the schema definitions.
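As an illustration, an Avro schema is itself a JSON document that both producer and consumer must share; the record and field names below are illustrative, not taken from our experiments:

```json
{
  "type": "record",
  "name": "GeoObservation",
  "namespace": "example.geodata",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "lon", "type": "double"},
    {"name": "lat", "type": "double"},
    {"name": "observed_at",
     "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```

Renaming or adding a field means updating this document on both sides (Avro supports schema evolution via default values), which is the maintenance overhead referred to above.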
For all of our experiments the serialized binary data files were at least 20% smaller on average than the original non-binary data files. Processing times for binary serialization of data from API sources were approximately 3.7 times faster on average than serialization to JSON or GeoJSON formats. Processing times for binary serialization of the datasets were, on average, at least 10% faster than serialization to JSON or GeoJSON formats.
It is difficult to point to a clearly defined set of results which indicate that binary data formats are an overwhelmingly better choice for data exchange than XML, JSON or GeoJSON. While binary data formats enjoy very good expert developer level support in major programming language implementations, this is dwarfed by the near universal levels of support for XML, JSON and GeoJSON in almost all major programming languages.
There are a number of potential avenues for future work, including automated semantic interoperability for binary data serialization using linked geodata, opportunities for more integrated software tool support for binary data processing, and further computational experimentation on different types of datasets and services that could benefit from binary data serialization.
The software implementation is carried out using Python 3 on Ubuntu Linux. All software code is made publicly available via the GitHub repository https://github.com/petermooney/jrc_binarydata. Detailed instructions on how to reproduce and replicate all of the experimental analysis are provided within the repository.
Earth observation (EO) imagery has become an essential source of information to better monitor and understand the impact of major social and environmental issues. In recent years we have seen significant improvements in availability and accessibility of these data. Programs like Landsat and Copernicus release new images every day, freely and openly available to everyone. Technological improvements such as data cubes (e.g. OpenDataCube), scalable cloud-based analysis platforms (e.g. Google Earth Engine) and standardized data access APIs (e.g. OpenEO) are easing the retrieval of the data and enabling higher processing speeds.
All these developments have lowered the barriers for utilizing the value of EO imagery, yet translating EO imagery directly into information using automated and repeatable methods remains a major challenge. Imagery lacks inherent semantic meaning and thus requires interpretation. For example, consider someone who uses EO imagery to monitor vegetation loss. A multi-spectral satellite image of a location may consist of an array of digital numbers representing the intensity of reflected radiation at different wavelengths. The user, however, is not interested in digital numbers; they are interested in a semantic categorical value stating whether vegetation was observed. Inferring this semantic variable from the reflectance values is an inherently ill-posed problem, since it requires bridging a gap between the two-dimensional image domain and the four-dimensional spatio-temporal real-world domain. Advanced technical expertise in the field of EO analytics is needed for this task, making it a remaining barrier on the way to a broad utilization of EO imagery across a wide range of application domains.
We propose a semantic querying framework for extracting information from EO imagery as a tool to help bridge the gap between imagery and semantic concepts. The novelty of this framework is that it makes a clear separation between the image domain and the real-world domain.
There are three main components in the framework. The first component forms the real-world domain. This is where EO data users interact with the system. They can express their queries in the real-world domain, meaning that they directly reference semantic concepts that exist in the real world (e.g. forest, fire). For simplicity, we currently work on a higher level of abstraction and focus on concepts that correspond to land-cover classes (e.g. vegetation). For example, a user can query how often vegetation was observed at a certain location during a certain timespan. These queries do not contain any information on how the semantic concepts are represented by the underlying data.
The second component forms the image domain. This is where the EO imagery is stored in a data cube, a multi-dimensional array organizing the data in a way that simplifies storage, access and analysis. Besides the imagery itself, the data cube may be enriched with automatically generated layers that already offer a first degree of interpretation for each pixel (i.e. a semantically enabled data cube), as well as with additional data sources that can be utilized to better represent certain properties of real-world semantic concepts (e.g. digital elevation models).
The third component serves as the mapping between the real-world domain and the image domain. This is where EO data experts bring their expertise into the system, by formalizing relationships between the observed data values and the presence of a real-world semantic concept. In our current work these relationships are always binary, meaning that the concept is marked either as present or as not present. However, the structure also allows for non-binary relationships, e.g. probabilities that a concept is present given the observed data values.
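A mapping rule in this spirit might, for instance, mark "vegetation" as present where NDVI, computed from red and near-infrared reflectance, exceeds a threshold. The following sketch is a hypothetical illustration of such a binary relationship; the 0.3 threshold and the band values are illustrative, not part of the framework itself:

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from two reflectance bands."""
    return (nir - red) / (nir + red)

def vegetation_present(red, nir, threshold=0.3):
    """Binary relationship: the concept is present (True) or not (False)."""
    return ndvi(red, nir) > threshold

# Apply the rule pixel-wise to a toy 2x2 image of (red, nir) reflectances.
pixels = [[(0.05, 0.40), (0.20, 0.25)],
          [(0.04, 0.55), (0.30, 0.32)]]
mask = [[vegetation_present(r, n) for (r, n) in row] for row in pixels]
```

The resulting boolean mask is exactly the kind of array the query processor produces for a referenced concept, ready for analytical processes such as counting observations over time.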
We implemented a proof-of-concept of our proposed framework as an open-source Python library (see https://github.com/ZGIS/semantique). The library contains functions and classes that allow users to formulate their queries and call a query processor to execute them with respect to a specific mapping. Queries are formulated by chaining together semantic concept references and analytical processes. The query processor will translate each referenced semantic concept into a multi-dimensional array covering the spatio-temporal extent of the query. It does so by retrieving the relevant data values from the data storage and subsequently applying the rules that are specified in the mapping. If the relationships are binary, the resulting array will be boolean, with “true” values for those pixels identified as observations of the referenced concept, and “false” values for all other pixels. Analytical processes can then be applied to this array. Each process is a well-defined array operation performing a single task, for example applying a function to each pixel or reducing a particular dimension. The workflow of chaining together different building blocks can easily be supported by a visual programming interface, thus lowering the technical barrier for information extraction even further. This is already demonstrated in an operational setting by Sen2Cube.at, a nation-wide semantic data cube infrastructure for Austria, which uses our proposed semantic querying framework.
We believe our proposed framework is an important contribution towards making EO imagery more widely accessible. It lowers the barrier to extracting valuable information from EO imagery for users who lack advanced technical knowledge of EO data but can benefit from its applications in their specific domain. They can now formulate queries by directly referencing real-world semantic concepts, without having to formalize how these are represented by the EO data. To execute the queries, they can use pre-defined mappings, which are application-independent and shareable. The framework also eases the interoperability of EO data analysis workflows for expert users. Mappings can easily be shared and updated, and the queries themselves are robust against changes in the image domain.
Topic modelling is a branch of Natural Language Processing that deals with the discovery of conversation topics in a document corpus. In social media, it translates into aggregating posts into topics of conversation and observing how these topics evolve over time (hence the “dynamic” adjective [Murakami, 2021]). Conveying the results of topic modelling to an analyst is challenging, since the topics often do not lend themselves naturally to meaningful labelling, and the relationships between them can involve hundreds of dimensions. Furthermore, the popularity of topics is itself subject to change over time.
In this paper, we propose a spatialization technique based on open-source software that reduces the intrinsic complexity of dynamic topic modelling output to familiar topographic objects, namely ridges, valleys, and peaks. This offers new possibilities for understanding complex relationships that change over time, and overcomes issues with traditional topic modelling visualisation approaches such as network graphs [Karpovich, 2017].
Spatialization [Fabrikant, 2017], a technique that uses spatial metaphors to aid cognitive tasks, has been a research field since the early ‘90s. It can be used to make sense of vast amounts of information by reducing them to a physical landscape. In this work, we consider spatialization of topics in a 3D space where the X-axis is the similarity of topics posted on the same day, the Y-axis is the similarity of topics across time and how their relationships evolve, and the Z-axis is a measure of topic popularity. With this approach, a topic is therefore reduced to a single point in 3D space, and the interpolated surface constructed out of these points becomes a landscape with peaks, ridges, and valleys. More precisely, “valleys” represent less popular topics, “peaks” the more popular ones, and flat surfaces indicate topics of average popularity.
Our team is working on the Australian Data Observatory project, which has been collecting tweets and other social media posts (Instagram, Reddit, YouTube, Flickr, etc.) related to Australia for the last 12 months. Through the use of the new Twitter academic license, the project is harvesting tens of millions of tweets per month. The social media posts are stored and analyzed daily using the deep learning BERTopic package. The BERTopic output is then stored and served through a REST API, which is used by different clients (at present, Jupyter notebooks and a web application). The intended audience of our platform comprises domain researchers including social scientists, linguists, and data journalists. The goal is to support big data exploration at scale and overcome the smaller-scale cottage industry of social media research that has hitherto been the norm in academia in Australia.
Topic modelling is often presented using 2D visualizations, such as circles with size proportional to topic popularity and position related to the similarity between topics. The dynamic (temporal) aspect of topic evolution is typically shown with animations in which topics morph into different ones and wax and wane in popularity, or it is ignored completely and researchers just use static topic modelling visualisations. There is merit in trying a different approach for dynamic topic visualisation: namely, mapping the social media landscape to a physical one, as this metaphor allows the simultaneous appreciation of time, topic similarity, and popularity, while allowing (via zoom operations) the aggregation/disaggregation of topics into bigger/smaller clusters of posts. This 3D landscape naturally aids the end user in understanding complex, highly dimensional data at a scale and volume that would otherwise be impossible. The formation of islands, archipelagos, mountain ranges or valleys related to mainstream topics such as Covid, vaccination and lockdown, through to geopolitical events such as the invasion of Ukraine, provides a finger on the pulse of what is being discussed at scale by the broader population across the social media landscape.
This approach is currently realised using a web application that enables the “topographic” exploration of the topic landscape with functions to improve the user experience in the areas of topic labelling and inter-topic distance.
There are a few critical issues in the proposed visualization:
the distance between topics has to be drastically reduced in dimensionality, from the hundreds of dimensions provided by the deep learning model to just one (the X-axis);
the Y-axis (time) has to be put in relation to a completely different measure (distance between topics) to make it amenable to interpolation;
topic popularity (the Z-axis) has a huge variability leading to irregular surfaces, hence the need for a non-linear scaling of the Z-axis;
communicating the meaning of each topic to the user is difficult, as the top terms of each topic may not be meaningful to a human, and make for a poor label.
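The non-linear Z-axis scaling mentioned above can be addressed with a logarithmic transform; the following sketch uses log1p on illustrative post counts to compress the huge spread of topic popularity so that a few viral topics do not flatten the rest of the landscape:

```python
import math

def scale_popularity(counts):
    """Compress topic popularity (post counts) with log1p so the Z-axis
    stays readable despite orders-of-magnitude variability."""
    return [math.log1p(c) for c in counts]

# Illustrative counts spanning from a niche topic to a viral one.
counts = [3, 120, 4500, 2_000_000]
z = scale_popularity(counts)
```

log1p (rather than plain log) keeps zero-count topics well defined at height 0, while the monotonicity of the transform preserves the ordering of peaks and valleys.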
The proposed processing and visualization are developed using only open-source tools and frameworks, leveraging the work of the open-source geospatial community.
All the software developed in the course of the Australian Data Observatory project is available under the Apache 2.0 license, and available through the University of Melbourne GitLab source code repository.
The Covid-19 outbreak has greatly impacted societal behaviour, fostering proximity tourism and highlighting the social role of peri-urban natural protected areas as key locations for outdoor activities. This shift in habits calls for an adaptation, over the next years, of the offerings and management of these areas to respond to users' expectations of positive experience opportunities in nearby locations. In the context of digital transformation and peri-urban protected areas, this research investigates the contribution that open geospatial technologies can make to the creation of new economic, social and cultural values, in order to propose solutions and identify gaps or open issues.
The adopted methodology is the “case study approach”, in which real cases are used to design, develop, implement, collect and analyse data to extract information that contributes to a deeper knowledge of the matter. This research is framed in the context of the Interreg INSUBRI.PARKS project (www.insubriparksturismo.eu), and among the project’s parks the case studies selected for technological testing are the Parco Gole della Breggia and the Collina del Penz. While both are natural protected areas closely located in the southern part of Switzerland, in Canton Ticino, they differ greatly in management structure, available offers and type of users, and therefore represent different needs. From discussions with local tourism organisations and park administrators we have identified three specific aspects of particular concern: (a) the creation of 3D digital products, (b) the monitoring of touristic fluxes and (c) the conduct of park management activities. This work presents the intermediate results of the development and testing of different selected solutions, describing the approach, the issues and the potential of the explored solutions with respect to open source software.
3D digital products - In addition to their more traditional use for conservation purposes and activity planning, 3D models can be used to offer positive experiences thanks to an enhanced understanding of specific intangible aspects. For example, in the case study of the Parco Gole della Breggia, it might be difficult for a tourist to fully realise the extent of the anthropic impacts on nature. The area is geologically relevant for its visible calcareous formations, hundreds of millions of years old. From 1961 to 2003 the Breggia shores hosted a large cement plant that strongly modified the territory. Today, only a small part of the plant is still in place as a testimonial of the anthropic impacts and an element of industrial archaeology. To support the perception of the real anthropic impact, we decided to implement three digital models representing the territory at three key epochs: before the cement plant construction, at the maximum expansion of the plant, and in the present state. The present-state model can be created by means of laser scanning and photogrammetric surveys, while the other two can be realised by digitising historical maps, technical plans and historical pictures. The investigation identified a workflow based on the evaluation of CloudCompare, Riegl RiSCAN Pro and Cyclone 3DR for 3D surveying; Regard3D, GRASS, QGIS and ESRI ArcGIS Pro for spatial data collection and management; Blender, AutoCAD 3D, Rhinoceros and SketchUp for vector modelling of spatial elements; and Nubigon and Potree for better graphical representation and further web dissemination of the results.
Monitoring of touristic fluxes - The monitoring of touristic flux is important for the correct management of natural protected areas: to ensure that the Tourism Carrying Capacity (TCC) of trails is not exceeded, to ensure that adequate economic resources are allocated to maintain the assets, and to understand tourist behaviours and consequently develop strategies and plans to maximise the touristic value of the park. While different solutions have been proposed for this purpose (accelerometers on iron plates and proximity radar sensors), it is important to capture specific tourist characteristics, such as the presence of animals, the direction of travel and the use of bicycles or cars. To this aim, Machine Learning models can help to automate the collection of such information through image analysis and object detection. The present paper presents a fully open prototype implementing and deploying a real-time tourist monitoring system composed of: sensing device, data communication, data management and data visualisation platform. The system includes the usage of the YOLO open source solutions for image recognition, the OGC SOS open standard and the istSOS implementation for data management and sharing, and the open source Grafana software for data visualisation and analysis. The results from the testing of the prototype in two locations over a period of 6 months are presented, supplemented with field validation data.
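To give a concrete flavour of the data-management step between detection and dashboard, a minimal sketch (names are illustrative, not the actual istSOS/Grafana wiring) aggregating raw per-image detections into the hourly per-class counts a time-series chart would consume:

```python
from collections import Counter
from datetime import datetime

def hourly_counts(detections):
    """Aggregate raw (timestamp, class) detections into hourly counts
    per class, the shape expected by a time-series dashboard."""
    counts = Counter()
    for ts, cls in detections:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, cls)] += 1
    return counts

detections = [
    (datetime(2022, 5, 1, 9, 12), "person"),
    (datetime(2022, 5, 1, 9, 47), "bicycle"),
    (datetime(2022, 5, 1, 9, 55), "person"),
    (datetime(2022, 5, 1, 10, 3), "person"),
]
counts = hourly_counts(detections)
```

In the deployed system the same aggregation would happen on the standardized SOS observations rather than an in-memory list.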
Digital Management of Protected Areas - Protected areas are currently managed using different tools that are very often scarcely digitised. This approach does not exploit the potential of digitalisation and does not foster the capacity to extract insights from data. While different open source project management software exists, none is specifically designed to address natural area management processes. For this reason a novel application, based on an open source platform, has been developed and implemented. The cloud solution, named Park Asset Management (PAM), is based on PostgreSQL/PostGIS and OpenLayers in conjunction with the Keycloak authorization platform and the Hasura GraphQL Engine, integrated in the Vue.js framework. The containerized application offers the following features: park asset management and information sharing, working task management and execution, rentals management, and notification management, offering a map interface as well as more classic calendar and table views. This platform enables the extraction of insights such as the maintenance cost of itineraries, income from location rental by month and by year, the cost and time required to replace items, and the frequency of occurrence of events.
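Since the stack exposes the database through the Hasura GraphQL Engine, an insight such as rental income can be retrieved with an aggregate query. The following is a hypothetical sketch (the table and field names are invented, not PAM's actual schema), shown here as a query string:

```python
# Hypothetical Hasura-style GraphQL aggregate query: total rental
# income for one year. Table "rental" and its fields are assumptions.
query = """
query RentalIncome {
  rental_aggregate(where: {year: {_eq: 2021}}) {
    aggregate { sum { amount } }
  }
}
"""
```

Hasura auto-generates such `<table>_aggregate` fields from the PostgreSQL schema, which is what makes insight extraction cheap once the data are digitised.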
Spatial data infrastructures prioritize data interoperability to serve their diverse communities. Geospatial knowledge graphs (GKG) are a form of database representation and handling that aims to meet the challenges of data interoperability, reasoning for information storage and knowledge creation, and user access, providing coherent spatial context to a domain of information. This paper discusses the development of a prototype GKG based on national topographic databases. Geospatial data are used to test interoperability aspects of ontology creation, faceted search and retrieval using GeoSPARQL (Open Geospatial Consortium, 2022), and a user interface for data visualization and evaluation. The challenges are to capture and represent the geographic semantics inherent in the source data, to integrate data from outside sources through SPARQL Protocol and RDF Query Language (SPARQL) queries, and to visualize the data using a cartographic user interface.
Poore (2003) identified four levels of data interoperability: articulation, sharing, integration, and alignment. These concepts are carried into the semantic technology design and application. Called the Map as Knowledge Base (MapKB), the approach uses software components to build a system architecture aligned with available standardized vocabularies, and is composed entirely of free and open-source software for geospatial data. The application was created in the context of The National Map of the U.S. Geological Survey (USGS). For purposes of data interoperability, the GKG ontology, queries, and visualization were studied for the system.
Data pre-processing involved creating a GKG ontology. The ontology was semi-automatically transformed from source databases through the application of rules on schema attribute, domain, and metadata files to create classes, properties, and other triple resources of Resource Description Framework (RDF) and Web Ontology Language (OWL) (Hayes and Patel-Schneider, 2014; Hitzler and others, 2012). An R2RML file was created using Web-Karma for transforming the feature-level instance data using the ontology and confirmed using standards specifications (University of Southern California, 2016; Das and others, 2012). The converted data and ontology are imported into a triplestore for data handling.
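As a toy illustration of such a transformation rule (much simpler than the Web-Karma/R2RML pipeline, and with a hypothetical namespace and label), a schema attribute with a coded domain can be mapped to OWL resources emitted as Turtle:

```python
def attribute_to_turtle(table, attribute, domain, ns="tnm"):
    """Rule sketch: a schema attribute becomes an owl:DatatypeProperty;
    each coded domain value becomes a labelled owl:NamedIndividual."""
    lines = [
        f"{ns}:{attribute} a owl:DatatypeProperty ;",
        f"    rdfs:domain {ns}:{table} .",
    ]
    for code, label in domain.items():
        lines += [
            f"{ns}:{attribute}_{code} a owl:NamedIndividual ;",
            f'    rdfs:label "{label}" .',
        ]
    return "\n".join(lines)

# "example label" stands in for the real domain description of the code.
ttl = attribute_to_turtle("HydroFeature", "FCode", {73002: "example label"})
```

Applying rules like this over attribute, domain, and metadata files is what makes the transformation "semi-automatic": the triples are generated mechanically, then curated by a human.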
A cartographic user interface (UI) was created as a foundation for visualization and user interaction with the triplestore graphs. The general guidelines given by the information search process model serve to guide UI functionality (Kuhlthau, 2004). The user interface offers menu search options by namespace, typically for retrieving initial results. Multiple graphs can be visualized at once. Other queries can be performed on the initial results appearing on a map or table, by faceted search and by query builder interfaces for SPARQL. An advanced feature description function retrieves related properties to support browsable graph searches. Linked Open Data were retrieved using SPARQL endpoints to test linking triples. Some GeoSPARQL support was created for geospatial queries on feature geometries of the GKG use cases.
The automated ontology transformation revealed aspects of data silos that were known to exist. However, the ontology model created a new perspective of data resources across the enterprise, where resource semantics could be streamlined for reuse. This was demonstrated in the post-processing stage of the ontology creation. The system and ontology design were validated through reasoning over semantically related data and pre-determined competency questions relevant to the reasoning results. An ontology pattern aligning feature classes, represented as codes and geometries of The National Map, with the GeoSPARQL ontology feature and geometry classes was validated using reasoners. The ontology for feature interoperability provided inferred information for competency questions such as “What type of feature is classified as FCode 73002?” or “How are streams represented geometrically?” The GKG alignment with Linked Open Data reused some widely used vocabularies between graphs, and the problems encountered could be resolved by designing a better metadata annotation approach for structural alignment in addition to syntax matching. Multiple GeoSPARQL queries executing topological relations on features were successfully demonstrated, including a pre-built query to find specified buildings on a road section between two cross streets. Such a query can depend on the shape of the road, building distance from the roadway, and other factors. The queries required a change in viewpoint from machine computation to landscape cognition, creating related semantic factors, followed by GeoSPARQL function computation.
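The flavour of such topological queries can be sketched as follows (the `geo:` and `geof:` prefixes follow the GeoSPARQL vocabulary; the `tnm:` namespace and feature names are hypothetical, and this is deliberately simpler than the cross-streets query described above):

```python
# Illustrative GeoSPARQL query: buildings whose footprint lies within
# 50 m of a given road's geometry. Namespace "tnm" is an assumption.
query = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX tnm:  <http://example.org/tnm#>

SELECT ?building WHERE {
  ?building a tnm:Building ;
            geo:hasGeometry/geo:asWKT ?bGeom .
  tnm:MainStreet geo:hasGeometry/geo:asWKT ?rGeom .
  FILTER(geof:distance(?bGeom, ?rGeom,
         <http://www.opengis.net/def/uom/OGC/1.0/metre>) < 50)
}
"""
```

The distance filter is pure machine computation; the "landscape cognition" step described above is in deciding which relations and thresholds capture the human notion of "on a road section".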
This project tested some key challenges for GKG applications for spatial data infrastructure interoperability including data transformation, ontology design, information search and retrieval, and multi-modality cartographic visualization. Completing the resulting ontology from automated data transformation for knowledge representation is still a cognitive activity. RDF and OWL vocabulary were sufficiently expressive to demonstrate linking and reasoning successes. Improved metadata annotation systems are needed for on-the-fly entity resolution. Although initial tests of GeoSPARQL techniques were successful, the full capabilities of SPARQL as a rule-based reasoning tool would need further research for queries that leverage the full semantic capabilities of knowledge graphs and for their portrayal.
Disclaimer Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
The last two decades have seen the development and diffusion of new technologies and digital ecosystems for managing geographic data. These include, among others, smartphones, drones and open access satellites on the one hand, and the web 4.0, GIS, WebGIS, geo-apps and georeferenced data, whether open-source or proprietary, on the other. This great variety of tools, accompanied by the sharing of new digital knowledge and skills, has made the creation and management of spatial information much more accessible than it was in the past.
This has led to a proliferation of processes for exploring, creating and sharing geographical data from below, as a way for citizens, who assume the role of neo-geographers or prosumers, to take part in decision-making in different kinds of processes, such as territorial, environmental and climate change issues (Goodchild, 2009; Capineri et al., 2016; See et al., 2016).
However, these are ongoing processes that still have to face technological, cognitive and economic barriers. Universities, through the use of open-source geo-information and communication technologies (Geo-ICTs) to enhance geographical learning, should be primary actors in supporting students and citizens in developing their own spatial thinking in a more efficient and engaging way (Käyhkö et al., 2021). In fact, this is also stressed in objective 4 of the Sustainable Development Goals, "to guarantee quality, inclusive and equitable education and to promote lifelong learning opportunities for all”, and many universities have signed the Higher Education Sustainability Initiative (HESI), which commits them to integrating the concepts of sustainable development into their curricula.
The University of Padova (Italy) is also involved in this framework, with its Jean Monnet Centre of Excellence on Climate Justice (Jean Monnet Erasmus+ project 2021-2023), led by the research group “Climate change, territories, diversities” (https://www.climate-justice.earth/). The Centre seeks to respond to the need to bring the issues of climate justice and just transition from the EU Green Deal framework into the dialogue between the academic world, society, and policy makers. To do this, it is carrying out different research and didactic activities, among which is the development of a MOOC (Massive Open Online Course) on GIScience for Climate Justice using open-source and freeware Geo-ICTs, which will be freely available to all before the end of 2022.
This MOOC will provide videos and materials about practical activities concerning climate change and climate justice issues that students can carry out autonomously using open-source and freeware tools. For every activity, the workflow and a graphical abstract will be provided, with the aims and skills to be acquired, together with an introductory video featuring a real example of use and suggestions on how to build collective citizen science projects. A self-evaluation module will be available to students. The MOOC will be tested with selected students and adjusted if necessary before its online publication. A feedback and comment area for interacting with staff members will also be available on the platform. The programme will follow a learning-by-doing approach and is designed to guide students through the main phases of a GIScience project:
- The exploration and use of European platforms (e.g. Earth Observation portals, Joint Research portals, European Environment Agency portals, the European Environmental Bureau)
- The exploration and use of the GeoNode on Climate Justice (https://research.climate-justice.earth/), the geo-platform of the Centre, which will be available to everyone, offering all the information collected by the Centre and the possibility for interested users or association groups to create online maps and to upload and share data.
- The collection and sharing of environmental and social information using geo-apps and webGIS (e.g. the ODK Collect app and the Ona platform)
- The exploration and use of Google Earth Pro and the OpenStreetMap project uMap
- The creation of storymaps to share initiatives to fight climate change and climate justice stories on the web (e.g. Knight Lab StoryMap and the GeoNode storymap tools)
By completing the MOOC, students will learn how to autonomously update and increase their knowledge of climate change and climate justice issues, learning to navigate and use European platforms and portals and to search for the documentation available from European and international institutions. Practical activities will improve the skills of students and civil society organizations in obtaining and using data and information produced by European institutions, producing and sharing their own data, and preparing and managing collaborative projects for sustainability and environmental monitoring.
Open-source software will also be the basis for the setup of the MOOC, from its preparation using open video editing tools and open document formats, to its publication on the Moodle platform of the University of Padova.
In this contribution, the theoretical background and the entire methodology and workflow for the preparation and dissemination of the MOOC will be presented and discussed, with the aim of sharing this experience with actors interested in developing similar activities using Geo-ICTs for Good.
Lakes are a fundamental resource with a number of environmental benefits and a non-negligible influence on the local economy and the quality of life. They act as water storage when floods or droughts occur: in the first case they help to attenuate the excess flow of water, in the second they serve as a water supply during shortages. In addition, they influence the recharge of groundwater and play a role in the preservation of overall habitat biodiversity. From an economic point of view, they are an attraction for tourism and residential living, as well as a source of recreation and of work for fishers.
Unfortunately, climate change, together with human activities, is increasingly threatening such resources, modifying the known dynamics and affecting the general health status of lakes (Fenocchi et al. 2018; Free et al. 2021; Lepori et al. 2018).
In this context, the INTERREG project SIMILE (System for the Integrated Monitoring of Insubric Lakes and their Ecosystems), born from the collaboration between Italy and Switzerland, aims at developing an information system, using an open source approach and based on innovative technologies, to help decision-makers in the management and evaluation of the status of transboundary, subalpine lakes such as Lakes Maggiore, Lugano and Como. The SIMILE project intends to intensify the monitoring of these lakes by creating an open real-time monitoring system and by integrating data coming from different sources, in order to fully exploit the potential of the heterogeneous available information and to better study the resource.
The work presented in this paper focuses on the achievements of the research carried out on Lake Lugano in the context of the SIMILE project after two years of work. In particular, the presented research is oriented towards the automatic generation of some indicators that are usually calculated to evaluate the lake status, through the use of open standards, software and hardware.
Lake Lugano is a transboundary lake divided into two main watersheds, North and South, with areas of 27.5 km2 and 21.4 km2 and maximum depths of 288 m and 89 m respectively. It is a eutrophic lake whose health status was critical, in particular during the 1970s, but which, thanks to new regulations and to the mitigation actions studied by the Swiss administration, is recovering. One of the fixed targets is to reach 150 gC/y, which corresponds to a mesotrophic status. This value gives information about the metabolic activity of the lake and can be calculated using different approaches. At the moment, on Lake Lugano, such information is obtained through monthly campaigns according to the Nielsen method (Nielsen, 1952). This approach is the one recognized by the administration and is conducted by specialist limnologists. However, it has some issues that can be summarized in three points: 1) it requires the use of radioactive components; 2) it is quite expensive in terms of man-hours and the engagement of an external laboratory to analyze such special samples; 3) since it has a monthly temporal resolution, it needs mathematical models to interpolate data between the different campaigns.
Based on this overview, the proposed paper investigates a fully open web solution for calculating indicators that can help in understanding the health status of the lake, and tries to overcome the identified limits that currently affect water monitoring. Such an open platform uses open standards, such as the Sensor Observation Service (SOS) of the Open Geospatial Consortium (OGC), to integrate different sources of data and to offer the possibility to gather the information in a standardized way. Thanks to this achievement, it was possible to develop an automatic calculation of the indicators. The scope is to standardize the calculations and provide a solution where indicators can be calculated automatically, also saving time with respect to the traditional process. Potentially, such an approach could calculate the indicators in real time thanks to the LISTEN/NOTIFY feature of PostgreSQL, the database technology on which the platform is based. Finally, this paper presents the preliminary results of the development of a new algorithm to calculate the lake metabolism which, if validated, can offer a new approach that solves the identified issues of the current one. The developed open monitoring system, implemented and deployed on the lake, offers real-time data.
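The LISTEN/NOTIFY idea can be sketched in a few lines (a minimal in-memory analogue; the actual platform wires this through PostgreSQL, and the names and the running-mean indicator here are our own, not the SIMILE metabolism algorithm):

```python
class IndicatorEngine:
    """Toy analogue of LISTEN/NOTIFY: whenever a new observation arrives,
    every registered indicator function is recomputed immediately,
    instead of waiting for a monthly campaign."""
    def __init__(self):
        self.observations = []
        self.listeners = []

    def listen(self, fn):
        self.listeners.append(fn)

    def notify(self, obs):  # plays the role of a NOTIFY event
        self.observations.append(obs)
        return [fn(self.observations) for fn in self.listeners]

engine = IndicatorEngine()
engine.listen(lambda obs: sum(obs) / len(obs))  # e.g. a running mean
results = engine.notify(8.2)  # a new sensor reading triggers recomputation
```

In PostgreSQL, the `notify` role is played by a trigger issuing `NOTIFY` on insert, with the indicator procedures attached as listeners.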
The platform is composed of dockerized and specialized services in order to offer a suite that is easily replicable, scalable and upgradeable.
In conclusion, an overview of the results reached during these years of the project is presented. Such a solution increases the replicability of the system, since it is fully open and guarantees the openness of the data, source code, standards and also the hardware. Such technologies help in developing an automatic system that can calculate indicators to help decision-makers in managing the water resource, and scientists in better studying new, unknown dynamics and facing the new challenges to which lakes are exposed.
In developing countries, sustainable development and territorial intelligence are of growing interest to public authorities and citizens. In Algeria, the combination of resources with technological innovation goes in the direction of building a productive territorial intelligence. This translates into a process aiming to develop a systemic approach to the territory, analysing its physical, social and economic dimensions, exchanging the different points of view of territorial, social and economic actors, and making policies more coherent. In this contribution, we have focused the research on studies related to decisional computing used by governmental entities, especially in the field of public services. It turns out that the use of collaborative web platforms involving several actors belonging to different spheres (government, economy, social, etc.) constitutes a tool for the development of territorial intelligence, thanks to the availability of data, which allows a considerable saving of time and cost. Indeed, the construction of a territorial information system makes possible the networking of these actors to elaborate clear and reliable urban planning schemes for a liveable environment, which led us to think about the implementation of a web platform for the exchange, collection, production and dissemination of data and for social animation, in order to reach equitable consensus. This will allow, among other things, the development of project management through the formalisation of objectives and collaborative work for the planning and optimisation of tasks. Geographical information is a crucial element in most daily uses thanks to the intelligent applications put online and exploited by different categories of connected people. Therefore, the interest in and necessity of sharing geo-located information for decision support systems is well proven nowadays.
In the same context, participatory mapping initiatives through volunteered geographic information (VGI), citizen-generated content or crowdsourcing are now being used as a new instrument for information gathering and two-way exchange between the various entities in the urban environment, ranging from ordinary citizens to leading actors. This direct data is a key element in all the decision-making processes leading to the achievement of urban governance modalities. The objective of our work is to provide an interactive solution ensuring the collaboration of actors (decision-makers and citizens) on a web-mapping platform for the reporting of needs by citizens in terms of public services, such as road defects, public lighting failures and any other existing problems in an urban area. This application could also be used for emergency alerts (road accidents, natural disasters, etc.). As the study area, we chose the city of Oran, located in the west of Algeria, which is the second largest urban metropolis of the country. The realization of the collaborative web mapping platform is based on Free and Open-Source Software for Geospatial (FOSS4G). As the spatial database management system, we used PostgreSQL with its spatial extension PostGIS, which is ranked among the most powerful open source DBMSs. The GIS server used in our application is GeoServer, which supports most of the required web-mapping services (WMS, WFS, WMTS, WCS, etc.). The web-mapping interface must offer two main components: an interactive citizen space with the web map, and a space for decision-makers, who will be able to consult, verify and validate the submitted data in order to proceed with action. Among the development options for this type of web-mapping interface, we are interested in GeoNode, an open source framework based on mature and robust frameworks and software such as Django, OpenLayers, PostGIS, GeoServer and pycsw.
In our case, GeoNode will allow the integration of a multitude of geospatial functions for manipulating data and responding to any type of request on the web map. The platform, which we have named "Wilayati", will offer new participatory methods for monitoring activities in the urban environment. Its functionalities will ensure, on the one hand, the sharing of data on a map based on voluntary contributions from the citizens of Oran and, on the other hand, the visualisation and manipulation of the data by decision-makers, giving them support for the management of localised interventions. Different types of data on the urban fabric of the city of Oran were collected from the processing of satellite images, as well as datasets on the road network of Oran obtained from OpenStreetMap after improving their intrinsic quality. In parallel, a campaign on social networks will soon be launched with the aim of better analysing the orientations of the public services most requested by citizens. The application, under development, will provide a new source of data that can be easily exploited in urban governance and will give citizens a way to participate in improving their environment through regular updates of the geographical database. Finally, as a perspective, the results after deployment of the platform will give an overview of the impact of citizens on participatory mapping, highlighting points of interest and urban infrastructures of cities in Algeria.
Map renderers play a crucial role in various applications deployed in Web, desktop, mobile, and embedded environments. For instance, we rely upon them to travel, commute, find the best hotels and restaurants, and locate our loved ones. As maps get adopted, more digital applications emerge in various areas, such as urban planning, transportation, or even pandemic monitoring. Beyond digital environments, it is worth noting that maps also get printed in books, reports, or pieces of urban furniture.
Rust - Rust is a high-level programming language designed for safety and high performance. The project started at Mozilla and is now developed by the Rust Foundation. Its compiler targets native architectures, enabling it to compile applications for desktop (x86) and mobile (arm) environments. Additionally, the Rust compiler can target WebAssembly, a binary instruction format that can run in web browsers at near-native speeds. This not only enables Rust applications to run in native environments but also to be included as libraries in Web applications. As a result, the same codebase can be used anywhere with only a few modifications.
WebGPU - WebGPU is a 3D low-level API that runs on top of DirectX, Metal, Vulkan or OpenGL depending on the platform and gives the developer access to the GPU. It is developed by the W3C GPU for the Web Community Group with engineers from Apple, Microsoft, Mozilla, Google, and others. It is considered the successor of WebGL version 2. Contrary to WebGL version 1 and WebGL version 2, which were solely designed for the Web, WebGPU implements a standard header file (webgpu.h) that makes it cross-platform.
Based on the emerging technologies identified in the review, we study the feasibility of creating a truly portable map renderer. We present maplibre-rs, a proof-of-concept released under the terms of the Apache Software License, that can render vector tiles natively and in the browser. We describe its overall architecture and highlight some of the challenges encountered while devising a portable solution that transforms vector tiles into 2d and 3d objects. These challenges include:
Rendering 2d vector tiles in a 3d environment - The vector tile specification describes simple 2d objects encoded in grid coordinates, such as points, lines, polygons, multi-polygons, and polygons with holes. Several steps are needed to convert these 2d objects into 3d objects that can be rendered in a scene, including: the conversion of grid coordinates into 3d scene coordinates; the tessellation of polygons to display surfaces in the 3d environment; and the extrusion of buildings based on their number of storeys, an attribute stored in the vector tiles.
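The first of these steps can be sketched as follows (a minimal normalized-coordinate version with a typical tile extent of 4096; the function name and parameters are our own, not maplibre-rs's actual transform):

```python
def tile_to_world(z, tx, ty, px, py, extent=4096, world_size=1.0):
    """Convert a vertex in tile-grid coordinates (0..extent within tile
    (tx, ty) at zoom z) to normalized 2d world coordinates, the first
    step before lifting geometry into the 3d scene."""
    n = 2 ** z  # number of tiles along each axis at this zoom level
    wx = (tx + px / extent) / n * world_size
    wy = (ty + py / extent) / n * world_size
    return wx, wy

# The center of tile (0, 0) at zoom 0 maps to the center of the world:
tile_to_world(0, 0, 0, 2048, 2048)  # -> (0.5, 0.5)
```

Tessellation and extrusion then operate on these world coordinates to produce the triangle meshes the GPU consumes.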
Using WebGPU as a portable 3d rendering pipeline - WebGPU exposes a wide variety of features to render 3d scenes. Among them, we explore: the rendering of the 3d objects with the WebGPU Shading Language (WGSL) based on styling rules and object attributes; the navigation within the 3d world via the camera, user inputs, rotation on 3 axes, levels of detail and occlusion culling; and the configuration of the graphics card, graphics API and more.
Devising a portable network library - Rust does not provide a network library that works both natively and in the browser. We created a uniform interface to download vector tiles to address this issue. This interface, based on the facade pattern, uses macros to select the proper implementation at compile-time depending on the targeted architecture. The native implementation relies on the HTTP package of the standard library. The WebAssembly implementation relies on Fetch API bindings.
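The facade idea can be transposed to Python for consistency with the other sketches in this collection (maplibre-rs selects the implementation at compile time with Rust's cfg macros, whereas this sketch selects it at import time; all names are illustrative):

```python
import sys

def _native_fetch(url):
    """Stands in for the native implementation (standard HTTP client)."""
    return f"native GET {url}"

def _wasm_fetch(url):
    """Stands in for the WebAssembly implementation (Fetch API bindings)."""
    return f"fetch() {url}"

# One uniform interface; the platform decides which backend serves it.
download_tile = _wasm_fetch if sys.platform == "emscripten" else _native_fetch
```

The point of the facade is that calling code depends only on `download_tile`, so the same tile-loading logic compiles for both targets.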
Finally, we present our future work and explore possible improvements. Overall, this review and feasibility study gives an exciting glimpse of a possible future for map renderers, where the same code can run natively and in a browser.
Processing Earth observation data modeled as time series of rasters is critical to solving some of the most complex problems in geospatial science, ranging from climate change to public health. Researchers are increasingly working with these large raster datasets, which are often terabytes in size. At this scale, traditional GIS methods may fail to handle the processing, and new approaches are needed to analyze these datasets. The objective of this work is to develop methods to interactively analyze big raster datasets, with the goal of extracting vector data over specific time periods from any set of raster data as efficiently as possible.
In this paper, we describe RINX (Raster INformation eXtraction), an end-to-end solution for automatic extraction of information from large raster datasets. RINX heavily utilizes open source geospatial techniques for information extraction and complements traditional approaches with state-of-the-art high-performance computing techniques. This paper discusses the details of this big temporal data extraction, including the methods used, code developed, processing time statistics, project conclusions, and next steps.
The input to RINX is a set of rasters from which information has to be extracted and a set of point locations for which the information is needed. The output is a structured representation of the extracted information for each data point, in CSV text format. Loading and pre-processing of the input datasets is automated using a combination of Bash and SQL scripts. The pre-processed input is then fed into the open source spatial database PostGIS, which extracts the required information using multiple spatial techniques. Finally, the extracted output is post-processed for deduplication and standardization for research use. RINX is designed to be easy to deploy and scale on any local, cloud, or cluster computing platform.
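The PostGIS extraction step can be sketched with the kind of query such a pipeline relies on. The RINX scripts themselves are not reproduced here; the table and column names below are hypothetical, but ST_Value and ST_Intersects are standard PostGIS raster functions.

```python
# Hedged sketch: building a PostGIS point-extraction query of the kind a
# RINX-style pipeline issues. Table/column names (prism_tmean,
# cohort_addresses, rast, geom) are illustrative assumptions.

def point_extraction_sql(raster_table, points_table, srid=4326):
    """Build a PostGIS query returning one raster value per point location."""
    return (
        f"SELECT p.id, ST_Value(r.rast, p.geom) AS value "
        f"FROM {raster_table} r, {points_table} p "
        f"WHERE ST_Intersects(r.rast, p.geom) "
        f"AND ST_SRID(p.geom) = {srid};"
    )

sql = point_extraction_sql("prism_tmean", "cohort_addresses")
```

In practice such a query would be generated per raster (one per variable per day) by the Bash/SQL automation, then the CSV results post-processed for deduplication.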
RINX was created to aid the study of environmental conditions and how they affect the health of people over their lifespans. This involves calculating exposures such as air pollution, humidity, precipitation, and temperature at cohort member address locations over time. For initial work with one cohort, daily precipitation, temperature, and humidity estimates were needed for 4,796 cohort address locations over a 19-year period, 1999 – 2017.
The 800-meter resolution PRISM Spatial Climate Dataset for the Conterminous United States was used as the input for this data extraction. PRISM refers to Parameter-elevation Relationships on Independent Slopes Model, created by the PRISM Climate Group, Oregon State University. The PRISM dataset is published in .BIL raster format, with one raster representing one climate variable per day for the time period 1981 - 2020. The total size of the dataset is around 8 TB with over 100,000 rasters of size 85 MB each.
For work on the initial cohort, RINX enabled the extraction of 7 key climate variables: precipitation, temperature (maximum, minimum, mean), dew point temperature (mean), and vapor pressure deficit (minimum, maximum) for 19 years of data from 48,500 800-meter resolution rasters for 4,796 data points. This resulted in a total of 10.3 million “patient-day” calculations, creating a total of 72.1M observations. Additionally, absolute and relative humidity were calculated from the existing mean temperature and dew point variables, so that RINX provided a unified solution of 9 climate variables for all persons and days in the dataset. It was deployed and scaled across multiple servers on a high-performance computing cluster. Our initial results show that it is extremely fast and efficient in processing large raster datasets: it took 1 day to load and 4 days to process and extract the 7 climate variables from 48,500 rasters for the 72.1M observations at 4,796 locations. RINX enabled the researchers to analyze this big climate dataset at a fine-grained address level with high efficiency and speed. Once the scripts were written, tested, and fine-tuned, processing time was reduced from months to days compared to traditional methods, resulting in substantial time savings.
We are currently testing RINX on a much larger dataset of 100,000 input point locations for the time period 1981 - 2020, spanning the full range of the PRISM 800m data. This climate data is only available for purchase; however, the PRISM Climate Group has made a version of this data available for free at a resolution of 4 kilometers. To make our solution entirely repeatable with open source software, code, and data, we will use RINX to extract point location data from the freely available 4km PRISM data. Results from these analyses will be presented as part of this paper.
Our solution is based on open source technology, using PostGIS, and can be deployed on local or cluster computing environments. It provides an efficient way to solve geospatial big data problems, particularly those involving large temporal raster datasets where point location data extraction is desired. Big data is changing the way data is managed and analyzed, and next-generation GIS tools can help researchers process big data at scale. RINX is an end-to-end data extraction and processing solution for large raster datasets. It is open source and will be shared on GitHub, and it can be easily deployed and scaled on any local, cloud, or cluster computing environment. We used RINX to process a large number of PRISM climate datasets; however, our solution could be applied to any temporal raster data, such as NDVI, night lights, and more.
Digital elevation models (DEMs) are a representation of the topography of the Earth, stored as elevation values in regular raster grid cells. These data serve as the basis for various geomorphological applications, for example, for landslide volume estimation. Access to timely, accurate and comprehensive information is crucial for landslide analysis, characterisation and for understanding (post-failure) behaviours. This information can subsequently be used to effectively assess and manage potential cascading hazards and risks, such as landslide dam outburst floods or debris flows. Freely available DEM data has been an important asset for landslide volume estimation. Earth observation (EO) techniques, such as DEM differencing, can be leveraged for volume estimation. However, their applicability is reduced by the high costs of commercial DEM products, limited temporal and spatial coverage and resolution, or insufficient accuracy.
Sentinel-1 synthetic aperture radar (SAR) data from the European Union's Earth observation programme Copernicus opens the opportunity to leverage free SAR data to generate on-demand multi-temporal topographic datasets. Sentinel-1 A & B data provide a new opportunity to tackle some of the problems related to data costs and spatio-temporal availability. Moreover, the European Space Agency (ESA) guarantees the continuity of the Sentinel-1 mission with the planned launch of another two satellites, i.e., Sentinel-1 C & D. Interferometric SAR (InSAR) approaches based on Sentinel-1 have often been used to detect surface deformation; however, few studies have addressed DEM generation (Braun, 2021). For example, Dabiri et al. (2020) tested Sentinel-1 for landslide volume estimation, but highlighted the need for further research to systematically assess the accuracy of the generated DEMs. InSAR analysis is often conducted using commercial software; however, a well-structured workflow based on free and open-source software (FOSS) increases the applicability and transferability of the DEM generation method. Although a general workflow for DEM generation from Sentinel-1 imagery based on InSAR has been described and documented (ASF DAAC, 2019; Braun, 2020, 2021), there is still a need for improvement, harmonisation and automation of the required steps based on open-source tools.
Within the project SliDEM (Assessing the suitability of DEMs derived from Sentinel-1 for landslide volume estimation), we explore the potential of Sentinel-1 for the generation of multi-temporal DEMs for landslide assessment leveraging FOSS. Relying on the open-source Sentinel Application Platform (SNAP) developed by the ESA, the Statistical-Cost, Network-Flow Algorithm for Phase Unwrapping (SNAPHU) developed by Stanford University, and several other open-source software publicly available for geospatial and geomorphological applications, we work on a semi-automated and transferable workflow bundled in an open-source Python package that is currently under active development. The workflow uses available Python SNAP application programming interfaces (APIs), such as snappy and snapista. We distribute the SliDEM package within a Docker container, which allows its usage along with all its software dependencies in a structured and straightforward way, reducing usability problems related to software versioning and different operating systems. The final package will be released under an open-source license on a public GitHub repository.
The package consists of different modules to 1) query Sentinel-1 image pairs based on perpendicular and temporal baseline thresholds that also match a given geographical and temporal extent; 2) download and archive suitable Sentinel-1 image pairs; 3) produce DEMs using InSAR techniques and perform necessary post-processing such as terrain correction and co-registration; 4) perform DEM differencing of pre- and post-event DEMs to quantify landslide volumes; and 5) assess the accuracy and validate the generated DEMs and volume estimates against reference data. The core module focusses on DEM generation from Sentinel-1 using InSAR techniques available in SNAP. The script co-registers and debursts Sentinel-1 image pairs before generating and filtering an interferogram. Phase unwrapping is performed using SNAPHU. The unwrapped phase is then converted into elevation values, which are finally geometrically corrected and co-registered to a reference DEM. Co-registration is based on assessing the normalised elevation biases over stable terrain (after Nuth and Kääb, 2011).
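The final co-registration step can be illustrated with a minimal sketch of the Nuth and Kääb (2011) approach, which fits dh / tan(slope) = a*cos(b - aspect) + c over stable terrain to recover the horizontal shift magnitude a, shift direction b and mean bias c. This is not the SliDEM code; it linearises the cosine model and solves the resulting ordinary least-squares problem with hand-rolled normal equations on synthetic data.

```python
# Hedged sketch (not the SliDEM implementation) of the Nuth & Kaab (2011)
# co-registration fit: dh / tan(slope) = a*cos(b - aspect) + c, linearised as
# A*cos(aspect) + B*sin(aspect) + c with A = a*cos(b), B = a*sin(b).
import math, random

def fit_nuth_kaab(dh, slope, aspect):
    """Return (a, b, c): shift magnitude, shift direction (rad), mean bias."""
    ys = [d / math.tan(s) for d, s in zip(dh, slope)]
    rows = [(math.cos(p), math.sin(p), 1.0) for p in aspect]
    # Normal equations M x = v for x = (A, B, c)
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination on the 3x3 system, then back substitution
    for i in range(3):
        for j in range(i + 1, 3):
            f = M[j][i] / M[i][i]
            M[j] = [mj - f * mi for mj, mi in zip(M[j], M[i])]
            v[j] -= f * v[i]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (v[i] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    A, B, c = x
    return math.hypot(A, B), math.atan2(B, A), c

# Synthetic stable terrain with a known shift: a=2.0 m towards b=0.8 rad, bias c=0.5 m
random.seed(0)
aspect = [random.uniform(0, 2 * math.pi) for _ in range(500)]
slope = [random.uniform(0.1, 0.6) for _ in range(500)]
dh = [(2.0 * math.cos(0.8 - p) + 0.5) * math.tan(s) for p, s in zip(aspect, slope)]
a, b, c = fit_nuth_kaab(dh, slope, aspect)
```

In the real workflow the recovered shift would be applied to the Sentinel-1 DEM and the fit iterated until the bias over stable terrain converges.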
We assess errors and uncertainties for each step and the quality of the Sentinel-1 derived DEMs using reference data and statistical approaches. The semi-automated workflow allows for the generation of DEMs in an iterative and structured manner, where a systematic evaluation of the resulting DEM quality can be performed by testing the influence of different temporal and perpendicular baselines, the usage of ascending and descending passes, distinct land use/land cover and topography, among other factors. Several major landslides in Austria and Norway have been selected to evaluate and validate the workflow in terms of reliability, performance, reproducibility, and transferability.
The SliDEM workflow represents an important contribution to the field of natural hazard research by developing an open-source, low-cost, transferable, and semi-automated method for DEM generation and landslide volume estimation. From a practical perspective, disaster risk management can benefit from efficient methods that deliver added-value information. From a technical point of view, SliDEM tackles scientific questions on the validity of EO-based methods and the quality of results related to the assessment of geomorphological characteristics of landslides.
The submerged topography of rivers is a crucial variable in fluvial processes and hydrodynamic models. Fluvial bathymetry is traditionally carried out with echo sounders mounted on vessels, or with total stations and GNSS receivers where the surveyed riverbeds are small streams or dry. Besides being time-consuming and often spatially limited, traditional riverine bathymetry is strongly constrained by currents and deep waters. In such a scenario, remote sensing techniques have progressively complemented traditional bathymetry, providing high-resolution information. To date, the peak of innovation for bathymetry has been reached with the use of optical sensors on uncrewed aerial vehicle (UAV) systems, along with green lidars (Vélez-Nicolás et al., 2021). The main obstacle in optical-derived bathymetry is the refraction of light passing through the atmosphere-water interface. The refraction distorts the photogrammetric scene reconstruction, causing in-water measures to be underestimated (i.e., shallower than reality). To correct these distortions, radiometric-based methods are frequently applied. They focus on the spectral response of the media crossed by the light and are typically built on the theory that the total radiative energy reflected by the water column is a function of the water depth (Makboul et al., 2017). The primary goal of research on submerged topography is to understand the relationship between the water column reflectance and the water depth using statistical and trigonometrical models. The spread of artificial intelligence has renewed interest in spectral-based bathymetry by investigating the non-linear and very complex relationships between variables (Mandlburger et al., 2021). To train artificial intelligence models, large amounts of data are usually necessary; therefore, participatory approaches and data sharing are required to build statistically relevant datasets.
In this scenario, FOSS tools and distributed resources are essential to manage the dataset and to allow the replicability of the methodology.
This work aims to test the effectiveness of artificial intelligence in correcting water refraction in shallow inland waters, using very high-resolution images collected by uncrewed aerial vehicles (UAVs) and processed through an entirely FOSS workflow. The tests focus on using synthetic information extracted from the visible component of the electromagnetic spectrum. An artificial neural network is trained with data from three geologically and morphologically similar case studies located in north-west Italy.
The data for the analysis were collected in 2020. Each data collection was carried out with a commercial UAV solution (DJI Phantom 4 Pro), and the following datasets were generated: i) RGB georeferenced orthomosaic of the riverbed and banks obtained from the photogrammetric process, ii) georeferenced Digital Elevation Model (DEM) of the riverbed obtained from the photogrammetric process, iii) GNSS measures of the riverbed and the riverbanks.
The UAV-collected frames were processed through a standard structure from motion (SfM) procedure. VisualSFM was employed for image alignment and 3D point cloud computation. The digital surface model (DSM) and the orthomosaic were generated from the point cloud in the CloudCompare software. By applying so-called direct photogrammetry, the point clouds were directly georeferenced in the WGS84-UTM32 coordinate system thanks to the positioning information retrieved from the embedded dual-frequency GNSS receiver (Chiabrando, Lingua and Piras, 2013). Using the information on the camera position and the local height model provided by the national Military Geographic Institute (IGM), the ellipsoidal heights were translated into orthometric heights. The GNSS measures had 3 cm accuracy on the vertical component and 1.5 cm on the horizontal components.
The RGB information, the DSM and seven radiometric indices (i.e., Normalised Difference Turbidity Index; Red and Green Ratio; Red and Blue Ratio; Green and Red Ratio; Green and Blue Ratio; Blue and Red Ratio; Blue and Green Ratio) were calculated and stacked into an 11-band raster (input raster). The Up component of the bathymetry cross-sections constituted the so-called "Z_GNSS" dataset and is the dependent variable of the regression. The position (Easting, Northing, Up) of each Z_GNSS observation was used to extract the pixel values of each band of the input photogrammetric dataset, including the photogrammetric DEM. The dataset was then normalised and divided into test (20% of observations) and training (80% of observations) datasets.
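The per-pixel feature stack described above can be sketched as follows. This is a hedged illustration: the abstract names the indices but not their exact formulas, so NDTI is taken as the usual (red - green) / (red + green) and the six ratios as simple band quotients.

```python
# Hedged sketch of the 11-band feature vector: R, G, B, DSM, NDTI and six
# band ratios. Index formulas beyond the named bands are assumptions.

def feature_vector(r, g, b, dsm):
    """Return the 11 per-pixel features used as regression inputs."""
    ndti = (r - g) / (r + g)  # Normalised Difference Turbidity Index (assumed form)
    ratios = [r / g, r / b, g / r, g / b, b / r, b / g]
    return [r, g, b, dsm, ndti] + ratios

pixel = feature_vector(0.30, 0.20, 0.10, 245.7)
```

Each Z_GNSS observation would then be paired with the feature vector extracted at its (Easting, Northing) position before normalisation and the 80/20 split.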
In this work, a 5-layer multilayer perceptron (MLP) with three hidden layers was built in Python using the deep learning library Keras with a TensorFlow backend (Abadi et al., 2016). The ReLU activation function was used in the network layers to introduce non-linearity. The input layer has dimension 11, and the weights are initialised to small Gaussian random values (kernel initialiser 'Normal'), even though the input distributions are often skewed or bimodal. An L1 kernel regularizer was added to reduce overfitting. The optimiser applied to update the weights of the network is the Adaptive Moment Estimation (Adam) search technique, and the loss function used by the optimiser to navigate the weight space is the mean absolute error between the predicted and target outputs.
The network was trained on the normalised dataset. The r-squared score, the mean squared error and the mean absolute error were computed. Finally, the permutation importance was measured using the eli5 Python library.
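The idea behind the permutation importance measured by eli5 can be shown with a minimal sketch: the importance of a feature is the drop in model score when that feature's column is shuffled, breaking its relationship with the target. The model and scorer below are toy assumptions, not the paper's network.

```python
# Hedged sketch of permutation importance (the paper uses eli5; this shows
# the underlying idea with a toy model and a negative-MAE score function).
import random

def permutation_importance(score_fn, X, y, feature, n_repeats=10, seed=0):
    """Mean drop in score over n_repeats shuffles of one feature column."""
    rng = random.Random(seed)
    base = score_fn(X, y)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature] for row in X]
        rng.shuffle(col)
        Xp = [row[:feature] + [c] + row[feature + 1:] for row, c in zip(X, col)]
        drops.append(base - score_fn(Xp, y))
    return sum(drops) / n_repeats

# Toy data: the "model" yhat = 2*x0 ignores feature 1 entirely.
X = [[float(i), float(i % 3)] for i in range(100)]
y = [2.0 * row[0] for row in X]
def score(X, y):  # negative mean absolute error of yhat = 2*x0
    return -sum(abs(2.0 * row[0] - t) for row, t in zip(X, y)) / len(y)

imp0 = permutation_importance(score, X, y, feature=0)  # large: model depends on x0
imp1 = permutation_importance(score, X, y, feature=1)  # zero: model ignores x1
```

This mirrors the reported finding: shuffling the influential bands (DEM, visible bands) degrades the score markedly, while shuffling the ratio bands barely changes it.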
The neural network regressor achieved an r-squared score above 0.80 on the test dataset. As expected, the permutation importance analysis reveals the high impact of the DEM and the visible bands, while low importance scores are reported for the ratio bands.
The results are satisfying and quite relevant, although the model is only a first step towards a more complex and deeper neural network to correct water distortions in rivers. It has been trained on a relatively small dataset, but we intend to follow up on the research, add more data, and develop a free and open tool for the scientific community. The present work provides good insight into the reliability and accuracy of artificial intelligence approaches in optical-derived bathymetry.
Open mapping leverages volunteer mappers mobilized and engaged from the public. Volunteers are most often trained and coordinated virtually to carry out dedicated mapping tasks, irrespective of their geographic location and their professional and academic background. In this study, the volunteer mappers engaged are categorized into two groups: the Local Volunteer Mappers (LVM), comprising all potential and actual mappers resident in Nigeria, and the Remote Volunteer Mappers (RVM), comprising all potential and actual mappers not resident in Nigeria.
The study sampled 2 Local Government Areas (LGAs) of Rivers State from the 4 LGAs of the Ogoni land communities vulnerable to oil spill disasters. Ogoni land is a major oil-spill-vulnerable area of Nigeria, hosting the main communities affected by crude oil exploitation in the Niger Delta region. Following the hazardous impact and damage to Ogoni land from oil spill disasters over the years of oil exploitation in the Niger Delta, UNEP assessed that the environmental restoration of Ogoni land would require coordinated efforts on the part of government agencies at all levels, industry operators and communities. UNEP also presented its recommendations as a major opportunity to bring new investment, employment opportunities and a culture of cooperation to Ogoni land, in addition to driving improvements in the environmental and health situation on the ground. To effectively implement the UNEP recommendations for the restoration of Ogoni land, there is a need for geographic data that provides critical building footprints in the area, especially to identify and access the vulnerable oil spill communities. The maps produced would be used by government agencies and other stakeholders working to implement the UNEP report on Ogoni land restoration as well as sustainable development.
Consequently, the study engaged volunteer mappers to respond to the sampled oil spill communities across the selected LGAs in Rivers State, Niger Delta region of Nigeria. To assess the level of participation of local mappers (resident in Nigeria) and remote mappers (not resident in Nigeria), two mapping projects were created in the HOT Tasking Manager, one for each group. For the purpose of campaigning for volunteer mappers, the 2 project tasks were tagged ''Mapathon Battle for Vulnerable Oil Spill Disaster Communities in Niger Delta''. Project task 6358 was created exclusively for remote mappers outside Nigeria to map Tai LGA, while project task 6359 was created exclusively for local mappers resident in Nigeria to map Gokana LGA, in a Mapathon battle challenge. Project task 6358 comprised a total of 825 gridded mapping tasks for online engagement of mappers, while project task 6359 comprised 706 automatically gridded mapping tasks, the difference being due to the sizes of the areas. The Mapathon produced the following research results. Engagement of remote mappers on project task 6358 (Tai LGA) shows that of the 583 tasks completely mapped, only 13 were yet to be validated 2 years after the project was created, as a result of the project being archived and attention being diverted to more urgent tasks. The project recorded a total of about 16,416 edits, comprising 13,552 buildings and 858 km of roads mapped in Tai LGA within the timeline of the study. The demographic characteristics of the contributors to project 6358, based on HOT Tasking Manager user experience and level, show that the project engaged a total of 56 contributors through mapping and validation. All mappers and validators had used the Tasking Manager for more than 1 year, and their mapping levels ranged from beginner (40%) to intermediate (10%) and advanced (50%).
The project timeline shows that mapping and validation of the Tai LGA task commenced on the same date, 6th August 2019, at 12% mapping and 2% validation. Mapping progressively rose to 64% by the 4th day and peaked on the 9th day, 15th August, with 99% of the entire task mapped. Validation, however, followed a flatter curve, reaching its highest point on 12th September with 95% of the task validated. By 8th January 2020, six months into the project, 100% of the tasks were completely mapped, while 13 of the 596 tasks were yet to be validated. The timeline statistics also show that an average of 20 minutes 46 seconds was spent per task to map a total of 583 tasks comprising 16,416 edits, and an average of 6 minutes 16 seconds was spent on validation per task, leaving about 1 hour 21 minutes 29 seconds needed to finish validating the 13 tasks left unvalidated due to a shift to other project tasks and less enthusiasm for the project under study. The analysis of local mappers engaged in HOT project task 6359 (Gokana LGA) revealed the following: 706 (100%) of the tasks were completely mapped, while validation of 473 (67%) tasks still requires further coordination of mappers. There is no record of bad imagery or tasks left unmapped. The project recorded a total of about 2,064 changesets for mapping a total of about 18,367 edits, comprising 14,983 buildings and 521 km of roads. It also recorded a total of 173 contributors, comprising 169 mappers and 8 validators. All of these mappers (100%) had more than 1 year of experience in online mapping with OpenStreetMap and are categorized into beginner (72%), intermediate (6%) and advanced (21%) mappers. The entire project timeline for mapping and validation spanned about 2 years 4 months (28 months), from 6th August 2019 to 27th December 2021, as at the time of writing this report.
In conclusion, there is a gap worthy of research investigation in the mapping response level and capability of remote mappers from other countries versus local mappers from Nigeria in crowdsourced rapid response mapping using OpenStreetMap.
Origin-destination (OD) datasets provide information on aggregate travel patterns between zones and geographic entities. OD datasets are ‘implicitly geographic’, containing identification codes of the geographic objects from which trips start and end. A common approach to converting OD datasets to geographic entities, for example represented using the simple features standard (Open Geospatial Consortium Inc 2011) and saved in file formats such as GeoPackage and GeoJSON, is to represent each OD record as a straight line between zone centroids. This approach to representing OD datasets on the map has been in use since at least the 1950s (Boyce and Williams 2015) and is still in use today (e.g. Rae 2009).
Beyond simply visualising aggregate travel patterns, centroid-based geographic desire lines are also used as the basis of many transport modelling processes. The following steps can be used to convert OD datasets into route networks, in a process that can generate nationally scalable results (Morgan and Lovelace 2020):
1) OD data are converted into centroid-based geographic desire lines; 2) routes are calculated for each desire line, with start and end points at zone centroids; 3) routes are aggregated into route networks, with values on each segment representing the total amount of travel (‘flow’) on that part of the network, using functions such as overline() in the open source R package stplanr (Lovelace and Ellison 2018).
This approach is tried and tested. The OD -> desire line -> route -> route network processing pipeline forms the basis of the route network results in the Propensity to Cycle Tool, an open source and publicly available map-based web application for informing strategic cycle network investment, ‘visioning’ and prioritisation (Lovelace et al. 2017; Goodman et al. 2019). However, the approach has some key limitations:
1) Flows are concentrated on transport network segments leading to zone centroids, creating distortions in the results and preventing the simulation of the diffuse networks that are particularly important for walking and cycling; 2) the results are highly dependent on the size and shape of the geographic zones used to define the OD data; 3) the approach is inflexible, providing few options to people who want to use valuable OD datasets in different ways.
To overcome these limitations we developed a ‘jittering’ approach to conversion of OD datasets to desire lines that randomly samples points within each zone (Lovelace, Félix, and Carlino Under Review). While that paper discussed the conceptual development of the approach, it omitted key details on its implementation in open source software.
In this paper we outline the implementation of jittering and demonstrate how a single Rust crate can provide the basis of implementations in other languages. Furthermore, we demonstrate how jittering can be used to create more diffuse and accurate estimates of movement at the level of segments (‘flows’) on transport networks, in reproducible code-driven workflows and with minimal computational overhead compared with the computationally intensive processes of route calculation (‘routing’) or processing large GPS datasets. The overall aim is to describe the jittering approach in technical terms and its implementation in open source software.
Before describing the approach, some definitions are in order:
Origins: locations of trip departure, typically stored as ID codes linking to zones; Destinations: trip destinations, also stored as ID codes linking to zones; Attributes: the number of trips made between each ‘OD pair’, plus additional attributes such as the route distance between each OD pair; Jittering: the combined process of ‘splitting’ OD pairs representing many trips into multiple ‘sub-OD’ pairs (disaggregation) and assigning origins and destinations to multiple unique points within each zone.
Jittering represents a comparatively simple approach to OD data preprocessing, compared with ‘connector’-based methods (Jafari et al. 2015). Provided the required inputs (a disaggregation threshold, a single number greater than one, and sub-points from which origin and destination points are located), the jittering approach consists of the following steps for each OD pair:
1) Check whether the number of trips (for a given ‘disaggregation key’, e.g. ‘walking’) is greater than the disaggregation threshold. If so, the OD pair is disaggregated: it is divided into as many pieces (‘sub-OD pairs’) as needed, with trip counts divided by the number of sub-OD pairs, so that each count falls below the disaggregation threshold. 2) For each sub-OD pair (or each original OD pair if no disaggregation took place), origin and destination locations are randomly sampled from the sub-points, which optionally have weights representing the relative probability of trips starting and ending there.
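The two steps above can be sketched in a few lines. This is a hedged illustration, not odjitter itself (which is written in Rust); the data structures and the single fixed seed are simplifying assumptions.

```python
# Hedged sketch of jittering one OD pair: threshold-based disaggregation,
# then weighted random sampling of origin/destination sub-points.
import math, random

def jitter(od_pair, threshold, origin_subpoints, dest_subpoints, seed=0):
    """Split one OD pair into sub-OD pairs below `threshold` trips, each with
    randomly sampled origin and destination coordinates."""
    rng = random.Random(seed)
    trips = od_pair["trips"]
    n_splits = max(1, math.ceil(trips / threshold))
    out = []
    for _ in range(n_splits):
        # weights give the relative probability of trips starting/ending there
        o = rng.choices([p for p, w in origin_subpoints],
                        weights=[w for p, w in origin_subpoints])[0]
        d = rng.choices([p for p, w in dest_subpoints],
                        weights=[w for p, w in dest_subpoints])[0]
        out.append({"origin": o, "destination": d, "trips": trips / n_splits})
    return out

origins = [((0.1, 0.2), 5.0), ((0.4, 0.9), 1.0)]  # (point, weight) pairs
dests = [((3.0, 3.1), 1.0), ((3.5, 2.8), 2.0)]
sub_od = jitter({"trips": 250}, threshold=100,
                origin_subpoints=origins, dest_subpoints=dests)
```

Note that the total trip count is preserved across the sub-OD pairs while each individual count stays below the threshold, which is what makes the resulting desire lines more spatially diffuse without changing the aggregate flows.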
This approach has been implemented efficiently in the Rust crate odjitter, the source code of which can be found at https://github.com/dabreegster/odjitter.
We have found that jittering leads to more spatially diffuse representations of OD datasets than the common approach of drawing desire lines between zone centroids. We have used the approach to add value to numerous OD datasets for projects based in Ireland, Norway, Portugal, New Zealand and beyond. Although useful for visualising the complex and spatially diffuse reality of travel patterns, we found that the most valuable use of jittering is as a pre-processing stage before routing and route network generation. Route networks generated from jittered desire lines are more diffuse, and potentially more realistic, than those generated from centroid-based desire lines.
We also found that the approach, implemented in Rust and with bindings to R and Python (in progress), is fast. Benchmarks show that the approach can ‘jitter’ desire lines representing millions of trips in a major city in less than a minute on consumer hardware.
We also found that the results of jittering depend on the geographic input datasets representing start points and trip attractors, and the use of weights. This highlights the importance of exploring the parameter space for optimal jittered desire line creation.
We plan to create/improve R and Python interfaces to odjitter and enable others to benefit from it.
We plan to improve the package’s documentation and to test its results, supporting reproducible sustainable transport research worldwide.
In many parts of Burkina Faso, competition over land use has increased tensions and often conflicts between farming and herding communities. Allocating land for farming or grazing is increasingly perceived as a zero-sum calculation among these communities. As a response, the government of Burkina Faso created “Pastoral Zones” across the country as reserves for livestock herders where animals could graze without the risk of entering cropland. Farming in these areas is typically prohibited unless done by herders residing within the reserve. However, farms have appeared in pastoral zones over the years, reducing resources available to herders and exacerbating already fraught tensions between herding and farming communities (Nébie et al 2019). This study uses Sentinel 2 imagery to quantify to what extent agricultural growth is encroaching on two such pastoral zones in Southern Burkina Faso, Niassa and Sondré-Est. This study found a significant growth of agricultural cultivation in both zones between the period of 2016 and 2021.
To map agricultural growth, Sentinel-2 imagery was used in Google Earth Engine (GEE). Reproducibility and accessibility were prioritized, hence the use of a free platform and open EO data. Google Earth Engine stood out as an accessible cloud platform to easily access the imagery and run the analysis (Gorelick et al, 2017). To visualise agricultural areas, the “3 Period Timescan” (3PTS) method was employed. This method uses a series of NDVI images from the Sentinel-2 satellite throughout a growing season to isolate areas of active cultivation. The product is a red-green-blue composite of Sentinel-2 images where the red band represents the maximum NDVI value during the first period of the growing season, the green band the maximum NDVI in the middle, and the blue band the maximum NDVI at the end. As a result, the method creates a seasonal time-series profile of NDVI. A single NDVI product provides an indication of vegetation presence on a given date, but it is not sufficient to distinguish croplands from other types of vegetation. Croplands are thus identified by the temporal evolution of their NDVI values throughout the phases of the agricultural season: photosynthetic activity of crops is low during the planting period (“beginning of the season”, approximated by 15th June to 1st August), increases during the growing phase (“middle”, 2nd August to 1st September) until reaching a maximum value right before the harvest; once harvested, NDVI values decrease drastically (“end of season”, 2nd September to 15th October). Thus, the approach employed for investigating cropland change considers maximum NDVI values for those three sub-periods of the agricultural season and aggregates this information into a higher-level product, an RGB colour composite (the 3-Period TimeScan), reflecting the temporal evolution of vegetation during the agricultural period at 10 m resolution (Boudinaud and Orenstein, 2021).
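The per-pixel logic of the 3PTS composite can be sketched as follows. This is a hedged illustration of the method rather than the project's GEE or PyQGIS code; the period dates follow the season split described above, for an assumed 2021 season.

```python
# Hedged sketch of the 3-Period TimeScan for one pixel: the maximum NDVI in
# each sub-period of the growing season becomes the red, green and blue band.
from datetime import date

PERIODS = [  # (start, end, output band), per the season split described above
    (date(2021, 6, 15), date(2021, 8, 1), "red"),
    (date(2021, 8, 2), date(2021, 9, 1), "green"),
    (date(2021, 9, 2), date(2021, 10, 15), "blue"),
]

def three_period_timescan(series):
    """series: list of (acquisition_date, ndvi). Returns {band: max NDVI or None}."""
    composite = {}
    for start, end, band in PERIODS:
        values = [v for d, v in series if start <= d <= end]
        composite[band] = max(values) if values else None
    return composite

# A cropland-like profile: low at planting, peak mid-season, sharp drop after harvest
series = [(date(2021, 7, 1), 0.25), (date(2021, 8, 20), 0.80), (date(2021, 9, 20), 0.30)]
rgb = three_period_timescan(series)
```

A profile like this one (moderate red, high green, low blue) renders as the dark blue that marks cropland in the composite, whereas natural vegetation with a smoother profile renders lighter.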
3PTS images allow for a user-friendly method to visually identify cropland. Cropland pixels in 3PTS images, when visualised in GEE, appear dark blue due to the sharp NDVI change between the 2nd and 3rd periods of the time series. This contrasts well with natural vegetation, which has a smoother temporal profile with a noticeable peak in the 2nd period and thus appears greener or a much lighter blue. Forests, due to their high NDVI values throughout the entire growing season, appear white, owing to the saturation of all 3 bands. Bare soil, with its low NDVI values throughout all 3 periods, appears as nearly black pixels.
Rather than machine learning, visual identification was the preferred method due to the relatively small size of each pastoral zone. The time needed to prepare training data and clean the results of a supervised classification would have exceeded the time needed to manually identify each area of cropland. As a result, once the images were processed in GEE, they were manually traced within QGIS. The 3PTS script, originally made for GEE, was translated to run in PyQGIS. Once run, the script created a raster image for each year’s growing season in the archive (2016-2021), and polygons were traced over each visualised cluster of cropland. The total surface area of all polygons was then calculated for each year. A GitHub repository contains both the PyQGIS and GEE code and can be run with no prerequisites (https://github.com/oren-sa/3PTS).
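As an illustration of the per-year area tally (in QGIS this is available directly through the field calculator), a minimal sketch assuming traced polygon rings given in a metric projected CRS:

```python
def polygon_area_m2(ring):
    """Shoelace formula for a simple polygon; `ring` lists (x, y)
    vertices in a metric (projected) CRS such as the local UTM zone."""
    s = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def cropland_area_by_year(polygons_by_year):
    """Total traced cropland area in hectares for each growing season."""
    return {year: sum(polygon_area_m2(p) for p in polys) / 10_000
            for year, polys in polygons_by_year.items()}
```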
The results of the study indicate a significant increase in cultivation in both zones between 2016 and 2021: 40% for Sondré-Est and 160% for Niassa. Curiously, the largest increase in cultivation seems to occur between 2016 and 2017, especially for Niassa. Nonetheless, cultivation increased with each passing year up to 2021. A number of these fields are suspected to be encroachments, given their proximity to the border of the zone and the fact that many are contiguous with the agricultural fields outside the zone’s borders. However, it is estimated that a number of the fields are the result of the zones’ resident herders planting fodder or other cereals. The latter assumption is based on the location of the fields in question (far from the borders of the reserves) and their proximity to permanent structures in the reserves (habitations, wells or park buildings).
Producing and providing useful information for climate services requires vast volumes of data to come together, which in turn requires technical standards. Besides ordinary base processes for climate data processing, such as polygon subsetting, there is the special case of extreme climate events and their impacts, where scientific methods for appropriate assessment, detection or even attribution face highly complex data processing workflows. The production of climate information services therefore requires optimal science-based technical systems, named in this paper climate resilience information systems (CRIS). CRIS like the Climate Data Store (CDS) of the Copernicus Climate Change Service (C3S) are connected to distributed data archives, store huge amounts of raw data themselves, and contain processing services to transform the raw data into usable, enhanced information about climate-related topics. Ideally, this climate information can be requested on demand and is then produced by the CRIS at the user's request. This kind of CRIS can be enhanced when scientific workflows for general climate assessment or even extreme event detection are optimized as information production services and deployed so as to be usable by extreme event experts, facilitating their work through a frontend. Deployment into federated data processing systems like the CDS requires that scientific methods and their algorithms be wrapped up as technical services following application programming interface (API) standards and, as good practice, the FAIR principles. The FAIR principles mean being Findable within federated data distribution architectures, including public catalogs of well-documented scientific analytical processes. Remote storage and computation resources should be operationally Accessible to all, including low-bandwidth regions, closing digital gaps to ‘Leave No One Behind’.
Agreeing on standards for data inputs, outputs, and processing APIs is the necessary condition to ensure the system is Interoperable. Finally, systems should be built from Reusable building blocks, which can be realized through modular architectures with swappable components, data provenance systems and rich metadata.
General building blocks for climate resilience information systems
A particular focus will be the "roocs" (Remote Operations on Climate Simulations) project, a set of tools and services to provide "data-aware" processing of ESGF (Earth System Grid Federation) and other standards-compliant climate datasets from modelling initiatives such as CMIP6 and CORDEX. One example is ‘Rook’, an implementation of the OGC Web Processing Service (WPS) standard that enables remote operations, such as spatio-temporal subsetting, on climate model data. It exposes all the operations available in the ‘daops’ library, which is based on Xarray. Finch is a WPS-based service for remote climate index calculations, also used for the analytics of ClimateData.ca, that dynamically wraps Xclim, a Python-based high-performance distributed climate index library. Finch automatically builds catalogues of available climate indicators, fetches data using “lazy” loading, and manages asynchronous requests with Gunicorn and Dask. Raven-WPS provides parallel web access to a dynamically-configurable ‘RAVEN’ hydrological modelling framework with numerous pre-configured hydrological models (GR4J-CN, HBV-EC, HMETS, MOHYSE) and terrain-based analyses. Coupling GeoServer-housed terrain datasets with climate datasets, RAVEN can perform analyses such as hydrological forecasting without requiring local access to data, installation of binaries, or local computation.
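By way of illustration, a WPS 1.0.0 Execute request for such a remote subsetting operation can be encoded as a simple key-value-pair GET URL using only the standard library. The endpoint and input names below are hypothetical placeholders, not the actual Rook interface:

```python
from urllib.parse import urlencode

def wps_execute_url(base, identifier, inputs):
    """Build a WPS 1.0.0 KVP-encoded Execute request as a GET URL."""
    data_inputs = ";".join(f"{key}={value}" for key, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    })
    return f"{base}?{query}"

# Endpoint and input names are hypothetical placeholders.
url = wps_execute_url(
    "https://rook.example.org/wps", "subset",
    {"collection": "c3s-cmip6-example", "time": "2016-01-01/2020-12-31"},
)
```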
The EO Exploitation Platform Common Architecture (EOEPCA) describes an app-to-the-data paradigm where users select, deploy and run application workflows on remote platforms where the data resides. Following OGC Best Practices for EO Application Packages, Weaver executes workflows that chain together various applications and WPS inputs/outputs. It can also deploy near-to-data applications using Common Workflow Language (CWL) application definitions. Weaver was developed especially with climate services use cases in mind.
Case of AI for extreme events investigations
Here we present challenges and preliminary prototypes for services based on the OGC API standards for processing (https://ogcapi.ogc.org/processes/) and on implementations of Artificial Intelligence (AI) solutions. We will present blueprints for how AI-based scientific workflows can be ingested into climate resilience information systems to enhance climate services related to extreme weather and impact events. The importance of API standards will be pointed out to ensure reliable data processing in federated spatial data infrastructures. Examples will be taken from the EU Horizon 2020 Climate Intelligence (CLINT; https://climateintelligence.eu/) project, where extreme events components could optionally be deployed in C3S. Within this project, appropriate technical services will be developed as building blocks ready to deploy into digital data infrastructures like C3S, but also the European Science Cloud or the DIAS. This deployment flexibility results from standards compliance and the FAIR principles. In particular, a service employing state-of-the-art deep-learning-based inpainting technology to reconstruct missing climate information in global temperature patterns will be developed. This OGC-standard-based web processing service (WPS) will be used as a prototype and extended in the future to other climate variables. Developments focus on heatwaves and warm nights, extreme droughts, tropical cyclones and compound and concurrent events, including their impacts, whilst the concepts target generalized opportunities to transfer any kind of scientific workflow into a technical service underpinning scientific climate services. The blueprints take into account how to chain the data processing from data search and fetch, through event index definition and detection, to identifying the drivers responsible for the intensity of the extreme event in order to construct storylines.
Generalization is one of the fundamentals of scientific research. In the context of spatial information, generalization needs to allow for finding common properties but also for spatial contiguity. Therefore, such generalization is often made through regionalization - partitioning of space into spatial clusters or regions. This process is vital for environmental studies, where many patterns and processes are autocorrelated spatially. Examples of regionalizations include delineation of ecoregions, detection of homogeneous zones for precision agriculture, definition of climate regions, and so on.
Traditionally, spatial generalization was performed manually, often based on a compilation of pre-existing, independently conducted studies. This approach lacks a quantitative framework, and thus no systematic checks, modifications or objective updates are possible. Currently, the abundance of remote sensing spatial data, such as satellite imagery, gridded climate data, or land cover maps, allows fast extraction of relevant spatial information on regional and global scales, making studies rooted in a clear quantitative framework possible.
Such data, however, still requires spatially-aware generalization to formulate general concepts or claims. Remote sensing data stores information as a set of raster cells, where a single cell is unaware of its spatial context. This is often not enough to understand underlying objects or processes.
(Geographic) object-based image analysis (OBIA) (Blaschke 2010) is frequently applied to resolve this issue. It is an approach to partition space consisting of raster cells into homogeneous objects and thus make spatial regionalization possible. Several generalization techniques were developed for OBIA, including a superpixels approach that proved to perform best for image processing and remote sensing data analysis (Csillik 2017).
The main idea of superpixels is to create connected groupings of cells with similar values (Ren and Malik 2003; Achanta et al. 2012). Each superpixel represents a desired level of homogeneity while at the same time maintaining spatial structures. Superpixels also carry more information than each cell alone, and thus they can speed up the subsequent processing efforts (Ren and Malik 2003; Achanta et al. 2012).
The original superpixels algorithm has, however, two major drawbacks for spatial data problems other than RGB images. Firstly, the algorithm uses the Euclidean distance, which is adequate in many cases, such as RGB images. However, it limits the possible usability for environmental datasets – Euclidean distance is not suitable for many types of spatial raster data (e.g., categorical rasters) and has undesirable properties for multi-dimensional data (e.g., a set of monthly climate data), where the results based on Euclidean distance contradict human intuition (Aggarwal, Hinneburg, and Keim 2001). Secondly, the superpixels technique does not result in regions per se but rather over-segmentation – some spatial objects/regions could be represented by one superpixel, while others could consist of many very similar superpixels.
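The distance concentration effect noted by Aggarwal, Hinneburg, and Keim (2001) can be illustrated with a small seeded simulation; this is a sketch for intuition, not part of the proposed method:

```python
import math
import random

def distance_spread(dim, n_points=200, seed=42):
    """Relative spread, (max - min) / min, of Euclidean distances from the
    origin for random points in `dim` dimensions. As `dim` grows, the
    spread collapses: distances 'concentrate' and lose their power to
    discriminate between multi-dimensional observations."""
    rng = random.Random(seed)
    dists = [math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)
```

For example, the spread at 2 dimensions is far larger than at 100 dimensions, which is one reason Euclidean distance behaves poorly on multi-band stacks such as monthly climate series.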
Our preliminary results presented during the GIScience 2021 conference (Nowosad and Stepinski 2021) provide a basis for using other distance measures to create superpixels. The proposed extension can be used for various scenarios, such as creating regions of similar multi-dimensional spatial and temporal patterns or similarly ranked areas. The extension is already available as open-source software in the form of an R package. The supercells package has extensive documentation in the form of a help file and additional vignettes, which can be found, together with its installation instructions, at https://jakubnowosad.com/supercells/.
The second issue is, however, still not resolved. Many clustering methods exist that could be used for merging similar connected superpixels, including traditional ones such as hierarchical clustering and spatially-aware ones such as SKATER or REDCAP. Wang et al. (2018) developed a REDCAP-based workflow for merging superpixels, which showed good image results and outperformed similar techniques; however, their work was based on the original superpixels algorithm and thus used Euclidean distance on 3-dimensional RGB images only. Additionally, it could be worth testing how well modern unsupervised machine learning techniques would perform in this task.
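One candidate strategy for this second issue, greedy merging of adjacent superpixels under a homogeneity threshold, can be sketched as follows. This is a deliberate simplification for illustration (real workflows such as the REDCAP-based one recompute region statistics during merging); the identifiers and data structures are hypothetical:

```python
def merge_superpixels(means, adjacency, threshold):
    """Greedily merge adjacent superpixels whose mean values differ by
    less than `threshold`; returns a region label for each superpixel.

    `means`: {superpixel_id: mean value}; `adjacency`: set of (id, id)
    pairs. Contiguity is preserved because only adjacent pairs merge.
    """
    parent = {sp: sp for sp in means}

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process the most similar pairs first so the greedy order is sensible.
    for a, b in sorted(adjacency, key=lambda p: abs(means[p[0]] - means[p[1]])):
        if abs(means[a] - means[b]) < threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra
    return {sp: find(sp) for sp in means}
```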
Our main goal is to present work in progress on developing a robust method for merging superpixels and thus creating high-quality regionalizations. We will test clustering/grouping methods based on three main criteria: accuracy, universality, and computational performance. Accuracy will be assessed based on the resulting regions’ internal homogeneity and their isolation from their neighbors. Universality will be tested on several datasets to check if the method works for various scenarios, including RGB images, categorical rasters, spatial time-series, etc. The computational performance will be evaluated based on the time needed for each method’s calculation and its use of computer resources.
Achanta, R., A. Shaji, et al. 2012. “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11): 2274–82. https://doi.org/f39g5f.
Aggarwal, Charu C., Alexander Hinneburg, et al. 2001. “On the Surprising Behavior of Distance Metrics in High Dimensional Space.” In Database Theory — ICDT 2001, edited by Jan Van den Bussche and Victor Vianu, 1973:420–34. Lecture Notes in Computer Science. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-44503-X_27.
Blaschke, T. 2010. “Object Based Image Analysis for Remote Sensing.” ISPRS Journal of Photogrammetry and Remote Sensing 65 (1): 2–16. https://doi.org/d4ksqf.
Csillik, Ovidiu. 2017. “Fast Segmentation and Classification of Very High Resolution Remote Sensing Data Using SLIC Superpixels.” Remote Sensing 9 (3): 243. https://doi.org/f92zgd.
Nowosad, J., and T. Stepinski. 2021. “Generalizing the Simple Linear Iterative Clustering (SLIC) Superpixels.” GIScience 2021 Short Paper Proceedings. 11th International Conference on Geographic Information Science. September 27-30 2021. Poznań: Poland (Online). https://doi.org/gnw982.
Ren, and Malik. 2003. “Learning a Classification Model for Segmentation.” In Proceedings Ninth IEEE International Conference on Computer Vision, 10–17 vol.1. Nice, France: IEEE. https://doi.org/c6s237.
Wang, Mi, Zhipeng Dong, et al. 2018. “Optimal Segmentation of High-Resolution Remote Sensing Image by Combining Superpixels With the Minimum Spanning Tree.” IEEE Transactions on Geoscience and Remote Sensing 56 (1): 228–38. https://doi.org/gct8gv.
Urban sprawl is associated with negative environmental impacts such as the loss of habitat and the loss of the most fertile soils for agriculture. The hinterland of Cologne, Germany is facing these challenges. The area is expected to face a population increase of 200,000 inhabitants in the next twenty years. Given past development trends, this population increase will have to be absorbed mainly by the cities and villages in the hinterland. While this provides ample economic opportunities, negative impacts on ecosystems as well as on agriculture have to be assumed due to urban sprawl and increasing fragmentation. The region is known as one of the most productive agricultural regions in Central Europe. As the most fertile soils are located in the direct neighborhood of existing settlements, urban sprawl will lead to strong trade-offs with agricultural production.
The aim of the scientific project NACHWUCHS is to identify alternatives to the continuation of existing development patterns. Therefore, we developed a baseline land use model and compare it to scenarios that assume different brownfield development activities. Stakeholder involvement is at the core of the project, as policies for alternative pathways cannot be successfully implemented without the support of farmers, real estate companies, environmental stakeholders, the municipalities and the district administration. The most important aspect of land use change in the region is the allocation of new housing areas. This is modeled by a tool-chain based on a free software stack that uses PostgreSQL with the PostGIS extension, Python and QGIS. The allocation model for new housing areas is currently based on a random forest classifier that has been trained on the official governmental ATKIS vector land use data set. The predictors of the model included distance to public transport and social infrastructure as well as existing land use development plans. The allocation of new housing areas was limited to areas outside of protected areas. Furthermore, only a few land use classes – mainly agriculture – were allowed for the allocation of new housing areas. The distance-based predictors were calculated by the openrouteservice, which uses OpenStreetMap data to build the routing graph and to assign routing weights.
A 100 by 100 m vector grid was used for model training and prediction. Model performance was evaluated based on a split into test and training data that considered spatial relationships. Based on the suitability of the grid cells, the demand for projected new housing areas was allocated. We used nine scenarios that differed in the building density for new housing areas as well as in the extent of brownfield development. In the study presented, building density is expressed in residential units per hectare, where a residential unit is simplified as a flat in a building. In the simulated scenarios, three density classes (10, 30 and 50 residential units per hectare) and three different proportions of brownfield development (10, 20 and 40 per cent) were combined. In the simulated period from 2018 to 2040, the scenario with the lowest density and the lowest proportion of brownfield development consumed more than fifty per cent more area than the scenario with the highest density and the highest proportion of brownfield development. The results of the allocation procedure were evaluated based on a set of indicators covering environmental, agricultural and social aspects. Examples are the supply of agriculture-related ecosystem services, soil fertility, economic value of agricultural production and hemeroby. We used the Open Data of the State of North Rhine-Westphalia, which contained geodata for the relevant domains of economy, environment and nature conservation, agriculture, social affairs and transport. The data are INSPIRE-compliant and available under a free licence (DL-DE->Zero-2.0). The data set further allowed the evaluation of the model results with regard to the consequences of the flood disaster of 14th July 2021, which severely affected parts of the hinterland of Cologne.
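Once suitability scores are available from the classifier, the allocation step can be sketched as a greedy assignment of projected demand to the most suitable non-protected cells. This is a hypothetical simplification of the actual allocation procedure, assuming each 100 m cell covers 1 ha:

```python
def allocate_housing(cells, demand_units, units_per_ha):
    """Allocate projected residential units to the most suitable free cells.

    `cells`: list of (cell_id, suitability, allowed) where `allowed` is
    False for protected areas or excluded land-use classes. Each
    100 m x 100 m cell covers 1 ha and absorbs `units_per_ha` units.
    Returns the ids of newly developed cells, best cells first.
    """
    developable = [c for c in cells if c[2]]
    developable.sort(key=lambda c: c[1], reverse=True)  # best suitability first
    chosen, allocated = [], 0
    for cell_id, _suitability, _allowed in developable:
        if allocated >= demand_units:
            break
        chosen.append(cell_id)
        allocated += units_per_ha
    return chosen
```

The scenario effect falls out directly: a higher density (e.g. 50 units/ha) satisfies the same demand with fewer cells, i.e. less land take.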
Our results will be used in the context of a mission statement for future regional development, developed together with local stakeholders. The mission statement defined development goals for four sub-regions derived from socio-economic and environmental properties based on the 17 UN SDGs. With the help of the above-mentioned indicators, we will evaluate how close the results of the different scenarios come to these goals and assist local stakeholders, e.g. in the search for locations for new residential areas. A transfer of the model to regions with similar settings is possible as long as suitable data is available for retraining the model and for the estimation of the indicator sets, highlighting again the importance of open data. The ATKIS data used is openly available for some of the federal states of Germany but not beyond. For North Rhine-Westphalia a transfer seems reasonable. Test runs based on the CORINE land use / land cover product lead to comparable results, indicating that this might be a suitable replacement for the ATKIS-based land use information. The Python code of the model, the necessary scripts to generate the required PostGIS database, a QGIS project example for the visualisation of the results as well as a set of training and test data are provided under a free licence via a GitLab repository.
Slope stability is strongly influenced by soil hydraulic conditions, which are affected by the meteorological events to which the site is subject. With particular reference to shallow landslides triggered by rainfall, the stability conditions can be influenced by the propagation of the saturation front inside the unsaturated zone. The soil shear strength varies in the vadose zone depending on the type of soil and the variations of soil moisture. In general, monitoring of the unsaturated zone can be done by measuring suction and/or water content.
The measurement of the volumetric water content can be performed using low-cost instrumentation, such as the WaterScout SM100 capacitive sensors (Spectrum Tec.), distributed over the study areas. Such sensors provide data in near-real time and are relatively easy to install and replace. However, it is essential to perform a site-specific calibration of the instrumentation, since previous work (Bovolenta et al. 2020) has shown that the factory settings lead to a general overestimation of the actual volumetric soil water content. Therefore, following a sampling of the analyzed soil and a specific laboratory procedure, it is necessary to define the calibration curve that allows the transition from the raw data, defined as the ratio between sensor output voltage and input voltage, to soil water content.
Then, the knowledge of soil water content allows the estimation of the suction parameter, thanks to a Water Retention Curve (WRC), and consequently the definition of the soil shear strength in partly saturated conditions.
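A minimal sketch of this chain, from the raw voltage ratio to suction, assuming a linear calibration with placeholder coefficients and one common WRC form (a van Genuchten-type curve); the actual calibration and retention curves are site-specific and laboratory-derived:

```python
def raw_to_vwc(ratio, a=1.2, b=-0.05):
    """Site-specific calibration from the sensor output/input voltage ratio
    to volumetric water content. The linear form and the coefficients are
    placeholders: the real curve comes from the laboratory procedure."""
    return a * ratio + b

def vg_suction(theta, theta_r=0.05, theta_s=0.45, alpha=0.5, n=1.8):
    """Suction from volumetric water content `theta` by inverting a
    van Genuchten-type WRC (all parameters here are illustrative only)."""
    se = (theta - theta_r) / (theta_s - theta_r)  # effective saturation
    m = 1.0 - 1.0 / n
    return (1.0 / alpha) * (se ** (-1.0 / m) - 1.0) ** (1.0 / n)
```

Suction is zero at full saturation and grows as the soil dries, which is what feeds the shear strength estimate in partly saturated conditions.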
Several methodologies for landslide susceptibility assessment, based on global Limit Equilibrium (LEM) or Finite Element (FEM) methods, need the soil shear strength description in order to evaluate the slope stability conditions. Both in the recent literature (Escobar-Wolf et al. 2020, Moresi et al. 2020) and in the GRASS GIS software (r.shalstab), models have already been proposed for shallow landslide susceptibility estimation in GIS, based mainly on LEM. However, these models do not usually consider the unsaturated soil behaviour, but at most take into account the strength contribution provided by vegetation root systems.
The present contribution describes the implementation of an automatic procedure in GRASS GIS that, starting from monitoring data on the soil volumetric water content, provides a 3D description of the soil shear strength in the vadose zone, which is essential for the subsequent landslide susceptibility assessment, especially in the case of shallow landslides.
Soil moisture sensor data come from five monitoring networks that were set up between 2019 and 2021 in the framework of the Interreg Alcotra AD-VITAM project. Each network was organized into measurement nodes (from three to five), each instrumented with four soil moisture sensors and communicating via radio with a receiver. The receiver was then connected to a modem for remote data transmission. The four sensors in each node were placed in the soil at four different depths (-15, -35, -55, -85 cm from ground level). The monitoring systems provide data at a minimum interval of 5 minutes, in .csv format, so that they can feed a geodatabase.
Starting from the proper storage of the data recorded by the monitoring network in a geodatabase, at the moment within GRASS GIS but in the near future in PostGIS, the site-specific sensor calibration equation, defined in the laboratory, and the WRC equation are implemented in a procedure that passes automatically from the raw sensor data to the soil water content, and then evaluates the suction parameter. Hence, the soil strength can be estimated for each depth at which a soil moisture sensor is installed. Moreover, since the study area is often in the order of a few square kilometers, the information must be spatialized over the entire area of interest through appropriate interpolation and extrapolation techniques.
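As one simple example of such spatialization, an inverse-distance-weighted estimate from the measurement nodes; the technique actually adopted in the procedure may differ:

```python
def idw(points, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value)
    measurement nodes. One basic option for spreading point measurements
    over the study area; the appropriate method is site-dependent."""
    num = den = 0.0
    for x, y, v in points:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return v  # exactly at a measurement node
        w = 1.0 / d2 ** (power / 2.0)
        num += w * v
        den += w
    return num / den
```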
This procedure could be integrated into a LEM or FEM, including those cited above, taking advantage of the soil moisture measurements to improve the evaluation of the stability conditions over time, by analysing the evolution of the saturation front according to the weather conditions.
The authors, in particular, will integrate it into a system called LAMP (LAndslide Monitoring and Predicting), which has been under development for several years through the implementation in a GIS environment of an Integrated Hydrological-Geotechnical (IHG) 3D model for the assessment of landslide risk triggered by measured or forecasted precipitation. The integration of this procedure in LAMP will also make it possible to obtain a simple but effective model for the assessment of susceptibility to shallow landslides.
Note that the present procedure could also provide an important contribution to landslide risk management in the days following the rainfall event of interest, providing the technical staff in charge of territorial protection with a useful tool for landslide susceptibility assessment, especially in the case of shallow landslides.
In order to allow the scientific community to evaluate the usefulness of the proposed procedure, and consequently to implement it in the above-mentioned methods (LEM/FEM) to improve the assessment of landslide susceptibility, the soil moisture data at a specific site, related to significant rainfall events, and the implemented procedure will be openly shared once the testing phase is completed.
The geomatic strategy for the survey campaign, data processing and product fruition in an archaeological context is presented and discussed. The case study is the Domus V situated in the Archaeological Park of Pompeii (Regio VII, Insula 14), which was surveyed in September 2020 by the Geomatics Laboratory of Genoa University in collaboration with the archaeologist group of the same University, under the ministerial concession DG 553 Class 34.31.07/246.7 of 26 January 2016 and its renewal on 9 April 2019 (34.31.07/3.4.7/2018).
The survey campaign involved the following integrated geomatic techniques:
- UAV photogrammetry, performed with a DJI Mavic 2 Pro. The shooting geometry was nadiral, at two different altitudes of 40 m and 15 m. An additional survey with a tilting angle of 45° at a flight altitude of 15 m was performed along concentric paths around the site. The UAV dataset is composed of 1400 images. The photogrammetric surveys were georeferenced using temporary Ground Control Points (GCPs), surveyed with GNSS using a Network Real Time Kinematic (NRTK) positioning strategy.
- Terrestrial photogrammetry, 7000 images of the internal vertical walls were taken with a Canon Eos 40D camera at a shooting distance of about 2 m following a bottom-to-top trajectory.
- Terrestrial laser scanning, using the Z+F 5006h phase difference instrument.
The integrated survey made it possible to move from a general view of the entire site to an increasingly detailed one, mainly aimed at the vertical walls, thanks to the global framing provided by the UAV survey.
The UAV and terrestrial photogrammetry campaigns were processed through the open-source software MicMac to create the dense point clouds, and CloudCompare to align the different blocks.
MicMac was chosen for being open source and for its rigour in the photogrammetric processing, both in the estimation of the external/internal orientation parameters and in the dense matching used to obtain the 3D point clouds from the images, which is based on a multi-scale, multi-resolution pyramidal approach that minimizes outliers and noise.
Because the computational time is not linear with respect to the number of images, the MicMac processing was split into blocks of 500 images each (about 24 hours of processing time), with 100 overlapping images between two consecutive blocks, used to align them through a point-to-point strategy. The obtained 3D point cloud was oriented and scaled using 15 natural points found on the terrestrial laser scanner point cloud, obtaining deviations on point positions ranging between 1 and 2 cm. The quality of the alignment was tested by computing the distance between the laser scanner and the photogrammetric point clouds with the CloudCompare M3C2 algorithm on a representative area of 1.60 m × 2.25 m of the fresco on the central wall of the surveyed room, obtaining distances of ± 5 mm orthogonally to the wall.
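The block partitioning can be sketched as follows; a simplified illustration, where the image ordering and block boundaries in the actual processing follow the acquisition scheme:

```python
def split_into_blocks(images, block_size=500, overlap=100):
    """Split an ordered image list into processing blocks of `block_size`
    images, with `overlap` images shared between consecutive blocks so the
    resulting point clouds can be co-registered point-to-point."""
    step = block_size - overlap
    blocks = []
    start = 0
    while start < len(images):
        blocks.append(images[start:start + block_size])
        if start + block_size >= len(images):
            break  # last block reached the end of the list
        start += step
    return blocks
```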
Moreover, the software MAGO, developed in C++ within the Geomatics Laboratory, was used to produce high-resolution orthophotos of the vertical walls. MAGO exploits a step-by-step self-adaptive mesh that fits the dense point clouds considering a triangular plane area, onto which the image pixel is projected at its original resolution via the collinearity equations. The needed inputs are the image(s) to be orthorectified, the external and internal orientation parameters, the user-defined orthophoto plane and the output orthophoto resolution. MAGO was recently updated to generate orthophotos of non-coplanar adjacent walls, i.e., walls forming an edge between them, through a rotation so that the two walls lie in a continuous common plane.
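The collinearity projection at the core of this orthorectification can be sketched as follows; rotation and sign conventions are simplified here and vary between photogrammetric packages:

```python
def project_collinearity(P, C, R, f, pp=(0.0, 0.0)):
    """Project object point P onto the image plane via the collinearity
    equations. `C` is the projection centre, `R` a 3x3 rotation (object
    to camera axes, as nested tuples), `f` the focal length and `pp` the
    principal point. Sign conventions are package-dependent."""
    dX = [P[i] - C[i] for i in range(3)]
    # Object point expressed in camera axes.
    u, v, w = (sum(R[r][i] * dX[i] for i in range(3)) for r in range(3))
    x = pp[0] - f * u / w
    y = pp[1] - f * v / w
    return x, y
```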
The orthophotos were made accessible and viewable via a QGIS project built to manage two different reference frames, i.e., the traditional planimetric plane (X,Y) and the vertical plane of the walls (X-Y,Z), where X-Y represents the planimetric coordinate along the wall direction. This makes it possible to introduce the third dimension into the typical GIS representation, thus realizing a 3D GIS environment. The QGIS project is organized with a “master-slave” architecture, where the master project is dedicated to the (X,Y) plane and reports the vectorial geometries (lines) representing the perimeters of the walls, whereas a different slave project is dedicated to each specific wall with the corresponding orthophoto in a (X-Y,Z) plane. Each slave project is connected to the master through a QGIS action that opens it when clicking on the corresponding wall in the master project. In each sub-project, the orthophoto of the wall is displayed together with three default shapefiles: a point, a line and a polygon shapefile, respectively. The attribute tables of the three shapefiles are set to be updated automatically with the following information once the user introduces a new geometry:
- point shapefile: the image coordinates (x, y) in pixel units and the corresponding object coordinates (E, N, Z), where E and N represent the east and north coordinates in the ETRF2000-2008.0/UTM33N reference frame and Z is the height of the point on the wall;
- line shapefile: length of the drawn line in meters;
- polygon shapefile: length of the perimeter and polygon surface, in meters and square meters, respectively.
An additional feature of the QGIS project is the possibility of performing the orthophoto classification based on the state of conservation of the wall, i.e., crumbling, degraded, good conditions, preserved, through user-defined training areas, from which the spectral signatures to be used in the supervised classification are computed.
Thanks to this "nested GIS" environment, the ensemble of the produced orthophotos can be viewed and linked to the corresponding geometry, forming a catalogue for an overall analysis of the entire archaeological site, taking advantage of an increasingly detailed and precise zooming in the areas of interest. This environment can also be used by non-expert geomatics users, making the survey products available for analysis in different specific disciplines.