BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.osgeo.org//foss4g-2022-academic-track//SMRLLD
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-2022-academic-track-VMNCM3@talks.osgeo.org
DTSTART;TZID=CET:20220825T141500
DTEND;TZID=CET:20220825T144500
DESCRIPTION:Motivation:\n\nBecause of technological advancements\, public p
 articipation in scientific projects\, known as citizen science\, has grown
  significantly in recent years (Schade and Tsinaraki 2016\; Land-Zandstra 
 et al. 2016). Contributors to citizen science projects are very diverse
  in expertise\, age\, and cultural background\, so the data they
  contribute should be validated before being used in any scientific
  analysis. Experts typically validate data in citizen science\, but this
  is a time-consuming process. One consequence is that volunteers do not
  receive feedback on their contributions and may lose motivation to
  continue contributing. A method for (semi-)automated validation of
  citizen science data is therefore critical. One approach researchers
  are now exploring is the use of machine learning (ML) algorithms to
  validate citizen science data.\n\nMethodology:\n\nWe
  developed a citizen science project with the goal of collecting and autom
 atically validating biodiversity observations while also providing partici
 pants with real-time feedback. We implemented the application with the Dja
 ngo framework and a PostgreSQL/PostGIS database for data preservation. In 
 general\, the focus of biodiversity citizen science applications is on aut
 omatically identifying or validating species images\, with less emphasis o
 n automatically validating the location of observations. Our application's
  focus\, aside from image and date validation (Lotfian et al. July 15-20\,
  2019)\, is on automatically validating the location of biodiversity obser
 vations based on the environmental variables surrounding the observation p
 oint. In this project\, we generated species distribution models using var
 ious machine learning algorithms (Random Forest\, Balanced Random Forest\,
  Deep Neural Network\, and Naive Bayesian) and used the models to validate
  the location of a newly added observation. After comparing the performanc
 e of the various algorithms\, we chose the one with the best performance t
 o use in our real-time location validation application.\n\nWe developed an
  API that validates new observations using the trained models of the chose
 n algorithm. The Flask framework was used to create the API. The API uses 
 the location and species name as parameters to predict the likelihood of
  observing a species (currently only bird species) in a given
  neighborhood. The model prediction\, together with information on the
  species' habitat characteristics\, is then communicated to participants
  as real-time feedback. The API has three endpoints: a POST request that
  takes the species name and observation location and returns the
  predicted probability of observing the species in a 1 km neighborhood
  around that location\; a GET request that takes an observation location
  and returns the five species most likely to be observed in a 1 km
  neighborhood around it\; and a GET request that returns the species'
  common names in English.\n\nUser experiment
 :\n\nA user experiment was carried out to investigate the impact of automa
 tic feedback on simplifying the validation task and improving data quality
 \, as well as the impact of real-time feedback on sustaining participation
 . A questionnaire also asked volunteers for feedback on the application
  interface and about the effect of real-time feedback on their
  motivation to continue contributing.\n\nResults:\n\nThe results fall
  into two parts: first\, the comparative performance of the machine
  learning algorithms\, and second\, the outcome of testing the
  application in the user
  experiment.\n\nWe compared the machine learning algorithms using the
  AUC metric. The Deep Neural Network (DNN) had the highest median AUC
  (0.86)\, but its performance was very poor for some species (AUC below
  0.6). Balanced Random Forest (median AUC 0.82) performed relatively
  well across all species compared with the other three algorithms\, and
  it outperformed them for species on which they performed poorly (AUC
  below 0.7).\n\nThe user experiment yielded preliminary findings that
  support combining citizen science and machine learning. Participants
  with more contributions found the real-time feedback more useful for
  learning about biodiversity and stated that it increased their
  motivation to contribute to the project. In addition\, because of
  automatic data validation\, only 10% of observations were flagged for
  expert verification\, which sped up validation and improved data
  quality by combining human and machine effort.\n\nWhy it should be
  considered:\n\nData validation and long-term participation have always
  been two of
  the most difficult challenges in citizen science and volunteered
  geographic information (VGI) projects. Various studies have addressed
  biodiversity data validation\, focusing primarily on automatic species
  identification from observation images\; however\, too little
  attention has been paid to validating observation locations\,
  particularly automatic location validation that takes species habitat
  characteristics into account. Furthermore\, to the best of our
  knowledge\, combining machine learning and citizen science to sustain
  participation through real-time\, user-centered\, machine-generated
  feedback has so far received little attention. Our work is therefore
  new and original\, and it is consistent with the vision of community
  citizen science\, in which scientists and citizen scientists learn from
  each other.\n\nBibliography:\n\nLand-Zandstra\, Anne M.\, Jeroen L. A.
  Devilee\, Frans Snik
 \, Franka Buurmeijer\, and Jos M. van den Broek. 2016. “Citizen Science
  on a Smartphone: Participants’ Motivations and Learning.” Public
  Understanding of Science 25 (1): 45–60.\n\nLotfian\, Maryam\, Jens
  Ingensand\, Olivier Ertz\, Simon Oulevay\, and Thibaud Chassin. July
  15-20\, 2019. “Auto-Filtering Validation in Citizen Science
  Biodiversity Monitoring: A Case Study.” In Proceedings of the 29th ICA
  Conference. Vol. 2. https:/
 /doi.org/10.5194/ica-proc-2-78-2019.\n\nSchade\, S.\, and C. Tsinaraki.
  2016. Survey Report: Data Management in Citizen Science Projects. EUR
  27920 EN. Luxembourg: Publications Office of the European Union.
  https://doi.org/10.2788/539115
DTSTAMP:20260422T234410Z
LOCATION:Room Hall 3A
SUMMARY:An approach for real-time validation of the location of biodiversit
 y observations contributed in a citizen science project - Maryam Lotfian
URL:https://talks.osgeo.org/foss4g-2022-academic-track/talk/VMNCM3/
END:VEVENT
END:VCALENDAR
