FOSS4G 2022 academic track

Nicolas Roelandt

Nicolas Roelandt is a GIS engineer at Gustave Eiffel University.
He is active on the OSGeo-fr chapter board and the OSGeoLive PSC, and is a FOSS4G enthusiast.


Sessions

08-24
16:45
30min
Crowdsourced acoustic open data analysis with FOSS4G tools
Nicolas Roelandt

Introduction

NoiseCapture is an Android application developed by the Gustave Eiffel University
and the CNRS as part of a participatory approach to environmental noise mapping.
The application is open-source and all its data are free.

The study presented here is a first analysis of the first three years of data
collection, seen through the prism of noise sources. The analysis focuses only on
the labels entered by users, not on the sound spectrum of the measurements,
which will be studied later.

The aim was to determine whether known dynamics in environmental acoustics could
be recovered using collaborative data.

Because this preparatory work will later be consolidated and extended, and because
we want the study to fit within the Open Science framework, particular attention
was paid to the reproducibility of the analysis.
The analysis was carried out entirely with free software and literate programming techniques.

The context of the study, the tools and techniques used, and the first results
obtained will be presented, as well as the benefits of using literate programming
in this type of preparatory work.

Data

An article presenting this dataset was published in 2021 (Picaut et al. 2021).
It details the structure of the database and the data, as well as the profile of
the contributors and their contributions, but does not analyze the content of the data.
The present work begins that analysis.

The data used in this study correspond to contributions made between August 29, 2017
and August 28, 2020. During this period, nearly 70,000 unique contributors collected
more than 260,000 tracks, for a total of about 60 million seconds of measurement.
A track is a single collected recording. It contains the sound spectrum
(1-second time step, third-octave bands) recorded by the phone, coupled with its
GPS position (1-second time step). The contributor can enrich this information with labels.
There are 18 labels, and the user can select one or more of them for each track.
They are detailed in Picaut et al. (2021).
The preliminary work presented here analyzes the proportion of certain labels in
the overall sample for given time periods.

In addition to the crowdsourced data, some additional data sets were used to
delimit the study area. We chose to limit the geographical scope of this
preliminary study to metropolitan France because this area contains the largest
number of recordings.
Its climate and sound dynamics are also known and documented.

To facilitate the reproducibility of spatial filtering, it was decided to use
open data sets from recognized sources: the Natural Earth database
(Patterson and Kelso 2021) and the Admin Express database from the
National Institute of Geographic and Forest Information (Institut Géographique National 2021).

The study

Tools

PostGIS

The data are provided as a dump of a PostgreSQL/PostGIS database (Ramsey and Blasby 2001).
Several scripts perform much of the attribute and spatial filtering.
The filtered result is stored in a materialized view, whose data are then analyzed
with the R language.

R

The R language (R Core Team 2021)
is a programming language for data processing and statistics, with many libraries
dedicated to geospatial data.
R Markdown mixes code and Markdown text to dynamically produce graphs, tables
and documents.
It is one of the recommended tools for literate programming.

Git

Git is a Distributed Version Control System (DVCS) (Chacon and Straub 2014).
It enables collaborative and decentralized work.
Git was a natural choice because the collaborators work at several sites
(Nantes, Lyon, Paris) and Git is already used within the UMRAE laboratory.

Implementation

The data are provided in the form of a PostgreSQL/PostGIS dump.
A server was set up and the data loaded.
A materialized view was created to provide stable access to the data
matching the defined criteria.
These criteria are both attribute-based (filtering of certain tags, minimum and
maximum durations, etc.) and spatial (located in France, reduced track area, etc.).
An R Markdown document connects to the view and then performs the operations
needed to analyze the data.
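
The two kinds of criteria can be illustrated with a minimal sketch. It is written in Python for compactness, whereas the study applies its filters in PostGIS through the materialized view. The thresholds, the excluded tag and the bounding box below are hypothetical values chosen for the example, and a rectangle is only a crude stand-in for the Natural Earth and Admin Express boundaries used in the study.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    duration_s: float
    lon: float
    lat: float
    tags: tuple


# Hypothetical attribute thresholds (not taken from the study).
MIN_DURATION_S = 5.0
MAX_DURATION_S = 900.0
EXCLUDED_TAGS = {"test"}

# Rough rectangle around metropolitan France (lon_min, lat_min, lon_max, lat_max).
# Assumption for illustration only: it also covers parts of neighbouring countries.
FRANCE_BBOX = (-5.5, 41.0, 10.0, 51.5)


def keep_track(track):
    """Return True when a track passes both the attribute and the spatial filters."""
    lon_min, lat_min, lon_max, lat_max = FRANCE_BBOX
    has_tag = len(track.tags) > 0
    no_excluded_tag = not (set(track.tags) & EXCLUDED_TAGS)
    duration_ok = MIN_DURATION_S <= track.duration_s <= MAX_DURATION_S
    in_bbox = lon_min <= track.lon <= lon_max and lat_min <= track.lat <= lat_max
    return has_tag and no_excluded_tag and duration_ok and in_bbox


# Made-up sample tracks.
tracks = [
    Track(1, 60.0, -1.55, 47.22, ("roads",)),        # kept
    Track(2, 2.0, 4.83, 45.76, ("chatting",)),       # too short
    Track(3, 120.0, -73.57, 45.50, ("animals",)),    # outside the rectangle
    Track(4, 30.0, 2.35, 48.85, ("test", "roads")),  # carries an excluded tag
]
kept_ids = [t.track_id for t in tracks if keep_track(t)]
```

Keeping all criteria in one predicate mirrors the role of the view: downstream analysis only ever sees tracks that satisfy every condition.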

A document mixing narrative, figures and code made it possible to resume and
continue the analyses shown here.

Results

The study concerns tagged tracks recorded in metropolitan France.
It focuses on the proportion of a given tag relative to all tags for a
given period (time of day, season, etc.).
In the sample studied, the tags roads, chatting, animals and wind are prevalent.
The tags air_traffic and works are also well represented.
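
The proportion described above can be sketched as follows. This is an illustrative Python version, not the R code of the study. The function name `tag_share_by_period` and the sample records are hypothetical, and the sketch assumes that the share is computed over tag occurrences, so a track with two tags contributes two counts.

```python
from collections import Counter, defaultdict


def tag_share_by_period(records, tag):
    """Share of `tag` among all tag occurrences, for each period.

    `records` is an iterable of (period, tags) pairs, where `period` can be
    an hour of the day, a season, or any other grouping key.
    """
    counts = defaultdict(Counter)
    for period, tags in records:
        counts[period].update(tags)  # one count per tag occurrence
    return {
        period: counter[tag] / sum(counter.values())
        for period, counter in counts.items()
        if sum(counter.values()) > 0
    }


# Made-up records: (hour of day, tags selected for the track).
records = [
    (6, ["animals", "wind"]),
    (6, ["animals"]),
    (18, ["roads", "chatting"]),
    (18, ["roads"]),
    (18, ["animals"]),
]
shares = tag_share_by_period(records, "animals")
```

With these records, the animals tag accounts for two of the three tag occurrences at hour 6 and one of the four at hour 18.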

A first axis of analysis concerns the distribution of the tags over the day.
Animal noises (tag animals) are more frequent in the morning, especially
one hour before sunrise.
This is a common dynamic for bird song.
We also observed peaks linked to human activity, especially commuting.
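
Expressing recording times relative to sunrise, rather than by clock hour, is one way to make a pattern such as the pre-dawn peak visible. The sketch below, in Python, assumes the sunrise time is already known for each recording. The abstract does not say how sunrise times were obtained, so `sunrise` is a hypothetical input here, and a single value is used for the whole made-up sample.

```python
import math
from collections import Counter
from datetime import datetime


def sunrise_offset_bin(recorded_at, sunrise):
    """Return k such that the recording falls in [k, k + 1) hours after sunrise.

    k = -1 is the hour just before sunrise.
    """
    offset_hours = (recorded_at - sunrise).total_seconds() / 3600
    return math.floor(offset_hours)


# Hypothetical sunrise time and made-up recordings for a single day.
sunrise = datetime(2020, 5, 1, 6, 30)
recordings = [
    (datetime(2020, 5, 1, 5, 40), ("animals",)),
    (datetime(2020, 5, 1, 5, 55), ("animals", "wind")),
    (datetime(2020, 5, 1, 6, 45), ("roads",)),
    (datetime(2020, 5, 1, 8, 0), ("animals", "roads")),
]

# Count recordings tagged "animals" in each one-hour bin around sunrise.
animals_per_bin = Counter(
    sunrise_offset_bin(ts, sunrise) for ts, tags in recordings if "animals" in tags
)
```

In a real analysis, the sunrise time would vary with the date and location of each track.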

The next temporal axis was seasonality, especially that of animal noises,
which show more intense activity in the European spring and summer.
This phenomenon could also be observed in the recordings.
We also noticed that music was less present in autumn than in other seasons, and
that it occurs mostly at late hours.
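
A seasonal grouping needs a rule for assigning each recording date to a season. The Python sketch below assumes meteorological seasons for the northern hemisphere (winter is December to February, and so on). The abstract does not state which definition was used, so this choice and the sample dates are assumptions.

```python
from collections import Counter
from datetime import date

SEASONS = ("winter", "spring", "summer", "autumn")


def season_of(day):
    """Meteorological season in the northern hemisphere (assumed definition)."""
    # December maps to index 0, so December to February all fall in winter.
    return SEASONS[(day.month % 12) // 3]


# Made-up recording dates.
recording_dates = [
    date(2019, 12, 15),
    date(2020, 1, 10),
    date(2020, 4, 2),
    date(2020, 7, 21),
    date(2020, 8, 3),
    date(2020, 10, 30),
]
tracks_per_season = Counter(season_of(d) for d in recording_dates)
```

The resulting season label can then serve as the grouping key when computing the share of a tag such as animals or music.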

Conclusion

The first results are encouraging: dynamics related to commuting road traffic
and to animal activity can be observed.
The main question was whether these known dynamics in environmental acoustics
can be observed in a crowdsourced dataset.
The first results suggest that they can.

Some questions still need to be explored, notably the representativeness of the
samples, which are sometimes small for certain time periods.

The systematic use of open source software, the provision of documented code files,
and a document mixing narrative, figures and code have made it possible to resume
and continue the analyses shown here.
This work is still in progress and will be completed for the final article.

Room Modulo 3