Geodaysit 2023

Luca Candeloro


Earth Observation Data and Extreme Gradient Boosting Model: innovative methods predicting West Nile Virus Circulation in Italy
Carla Ippoliti, Luca Candeloro, Susanna Tora, Federica Iapaolo, Federica Monaco, Daniela Morelli, Annamaria Conte

West Nile Disease (WND), caused by a vector-borne virus, is one of the most widespread zoonoses in Italy and Europe. In Italy, surveillance for West Nile (WN) and Usutu viruses aims at early detection of virus circulation in a territory and involves equids, wild and resident birds, and mosquitoes.
In the Italian ecosystem, peak transmission of West Nile virus (WNV) to humans typically occurs between July and September, during the summer season when mosquitoes are most active and temperatures are highest. To detect WNV circulation early and thereby reduce the risk of transmission to humans, wild birds, corvids, poultry, horses, and mosquitoes are sampled according to a risk-based ranking of the Italian provinces, and WNV infections are confirmed. Alongside field activities, it is important to identify the climatic and environmental conditions that favour the spread of vectors and virus. The recent massive availability of Earth Observation (EO) data and the continuous development of innovative Machine Learning methods can help to automatically identify patterns in big datasets and to identify areas at risk with high accuracy.
In this study, the veterinary cases notified during the 2017-2020 epidemics were collected from the National Information System for Animal Disease Notification (SIMAN) and associated with climatic and environmental variables. EO data were derived from different sources, downloaded, mosaicked, converted to degrees (for temperature), pre-processed and harmonised: Land Surface Temperature (LST) Daytime (LSTD) and LST Night-time (LSTN) were derived from the NASA-MODIS MOD11A2 product (8-day temporal resolution, 250 m spatial resolution); the Normalized Difference Vegetation Index (NDVI) dataset was derived from the NASA-MODIS MOD13Q1 product (MODIS/Terra Vegetation Indices 16-Day L3 Global 250 m); and Surface Soil Moisture (SSM) was derived from the Copernicus Daily SSM 1-km V1 product. Every eight consecutive SSM images were merged into a single raster covering the whole of Italy, for a total of 46 images per year. We then applied a gap-filling procedure to replace the empty pixels in the datasets, as missing values can prevent a prediction that is accurate and homogeneous in space and time. The three EO datasets were resampled to the highest available spatial resolution (250 m) using bilinear interpolation, and each dataset kept its own temporal scale (NDVI: 16 days; LSTD, LSTN and SSM: 8 days).
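
The merging and gap-filling steps can be illustrated for a single pixel's time series. This is only a minimal sketch: the abstract does not state how the eight daily SSM images were combined or which gap-filling method was used, so the mean composite and the linear interpolation in time below are assumptions, and the function names `composite_8day` and `fill_gaps` are hypothetical.

```python
from statistics import fmean
from typing import List, Optional


def composite_8day(daily: List[Optional[float]], step: int = 8) -> List[Optional[float]]:
    """Merge each run of `step` daily values into one composite value.

    Assumption: the composite is the mean of the valid days; a run with
    no valid day stays missing (None).
    """
    composites = []
    for start in range(0, len(daily), step):
        valid = [v for v in daily[start:start + step] if v is not None]
        composites.append(fmean(valid) if valid else None)
    return composites


def fill_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing values by linear interpolation in time.

    Assumption: gaps at the start or end take the nearest valid value.
    A series with no valid value at all is returned unchanged.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return list(series)
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            filled[i] = series[nxt]
        elif nxt is None:
            filled[i] = series[prev]
        else:
            weight = (i - prev) / (nxt - prev)
            filled[i] = series[prev] + weight * (series[nxt] - series[prev])
    return filled
```

With 365 daily values and a step of 8, `composite_8day` returns 46 composites (the last one covers the remaining 5 days), which matches the 46 images per year mentioned above. In a real raster workflow the same logic would be applied to every pixel, typically with an array library rather than Python lists.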
Applying a raster-based approach with a time window of 16 days, we investigated WN virus circulation in relation to the EO variables collected during the 160 days before infection, with the aim of evaluating the capacity of lagged remotely sensed variables to identify areas at risk of WNV circulation in Italy.
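
To make the lag structure concrete, the sketch below lists the composite dates that fall in the 160 days before a target date, for a 16-day variable (NDVI) and an 8-day variable (LSTD, LSTN, SSM). It is an illustration under assumptions: the helper names and the feature naming scheme are hypothetical, the target date is arbitrary, and real MODIS composites start on fixed days of the year, which this simple date arithmetic ignores.

```python
from datetime import date, timedelta
from typing import List

LOOKBACK_DAYS = 160  # length of the look-back period stated in the text


def lagged_dates(target: date, step_days: int,
                 lookback_days: int = LOOKBACK_DAYS) -> List[date]:
    """Start dates of the composites inside the look-back period, oldest first."""
    n_lags = lookback_days // step_days
    return [target - timedelta(days=step_days * k) for k in range(n_lags, 0, -1)]


def feature_names(variable: str, step_days: int,
                  lookback_days: int = LOOKBACK_DAYS) -> List[str]:
    """Hypothetical column names, one per lagged composite, oldest first."""
    n_lags = lookback_days // step_days
    return [f"{variable}_lag{step_days * k}d" for k in range(n_lags, 0, -1)]


if __name__ == "__main__":
    target = date(2020, 8, 1)  # arbitrary example date
    print(len(lagged_dates(target, 16)), "NDVI composites")
    print(len(lagged_dates(target, 8)), "composites per 8-day variable")
    print(feature_names("NDVI", 16)[:2])
```

Under these assumptions a 16-day variable contributes 10 lagged values and each 8-day variable contributes 20. Whether the study used every lag as a separate predictor is not stated in the abstract.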

An Extreme Gradient Boosting model was trained on data from 2017, 2018 and 2019 and tested on the 2020 epidemic, predicting the spatio-temporal WNV circulation two weeks in advance with an overall accuracy of 0.86 (sensitivity = 0.79, specificity = 0.91, AUC = 0.94).
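
For readers less familiar with the reported metrics, the sketch below shows how a split by year and the four scores can be computed from true labels and predicted probabilities. It uses invented toy values, not the study's data, and does not reproduce the model itself. The 0.5 decision threshold, the `year` field and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def temporal_split(rows: List[dict], test_year: int = 2020) -> Tuple[List[dict], List[dict]]:
    """Train on every year before `test_year`, test on `test_year` itself."""
    train = [r for r in rows if r["year"] < test_year]
    test = [r for r in rows if r["year"] == test_year]
    return train, test


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and AUC; assumes both classes are present."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true cases that were detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "auc": auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    print(classification_metrics(y_true, scores))
```

Splitting by year rather than at random keeps the test epidemic entirely unseen during training, which is what makes the 2020 scores a measure of forecasting ability.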
This work lays the basis for an early warning system (16 days ahead) that alerts public authorities when climatic and environmental conditions become favourable to the onset and spread of WNV. This knowledge can be used to define intervention priorities within national surveillance plans.

AIT Contribution
Sala Videoconferenza @ PoliBa