Democratising earth observation data: co-creating localised national-scale machine learning classification models through country-driven field surveys and Digital Earth Pacific.
11-20, 09:00–09:25 (Pacific/Auckland), WA220

This paper highlights the datasets, analytical tools, computational capacity and insights made possible through Digital Earth Pacific (DE Pacific). The paper focuses on the use case of participatory land use land cover model calibration and validation using QField and the Digital Earth Pacific Jupyter Analytics Hub.


Introduction

Earth and ocean observation technologies have advanced rapidly over the past decades. They have become more detailed in their spectral and temporal coverage and increasingly accessible to a wide range of users.
Yet ongoing barriers have limited the uptake and adoption of the Earth observation analytics needed to inform policy makers. These barriers have often included complex, overly technocratic language and workflows, which obstruct access to satellite data and obscure the insights it offers. As a result, most analysis of satellite data has been limited to academic and research-oriented groups.
Other options have recently emerged that could help a wider range of users gain insights from Earth observations. However, many of these workflows remain out of reach because of technical barriers and emerging costs. Others rely on specific packages and libraries that can become deprecated over time, reducing the long-term replicability of these workflows for wider users.
Recent advances in cloud computing infrastructure in the Pacific region could give a wider range of users long-term, no-cost access to standardised and customised replicable workflows.
This paper highlights the datasets, analytical tools, computational capacity and insights made possible through Digital Earth Pacific (DE Pacific). DE Pacific is a public technology infrastructure that builds on lessons from Digital Earth Africa and Digital Earth Australia.

Land cover monitoring:
Land Use Land Cover (LULC) models shed light on the proportions and distributions of different natural and man-made environments across landscapes at given points in time. When LULC maps are generated for several points in time, the results can be compared to detect changes in land use and land cover classes over time. This analysis is commonly applied to the monitoring and management of a wide range of sectors including, but not limited to, forestry, agriculture, urban planning, infrastructure, water management and mining (Topuz and Deniz, 2023). Yet machine learning approaches still struggle to meet accuracy thresholds when generating LULC classification models at scale. Some of these challenges have included:
1) Generating LULC models that accurately predict land use and land cover distributions at the local scale (meeting accuracy assessment thresholds of X).
2) Scaling LULC models across diverse ecosystems and geographies while still meeting accuracy assessment thresholds.
3) Reconciling local needs and inputs, including ground-cover observation points, with globally standardised LULC classes and models (for reporting purposes).
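Change detection between two LULC maps is commonly done by cross-tabulating them into a "from-to" transition matrix. The sketch below is a minimal illustration, assuming both maps are already aligned on the same grid and store integer class codes 0 to n-1; the function name and toy arrays are hypothetical and are not taken from DE Pacific.

```python
import numpy as np


def transition_matrix(before, after, n_classes):
    """Count pixels moving from each class in `before` to each class in `after`.

    Rows index the earlier class and columns the later class, so the
    diagonal holds pixels whose class did not change. Class codes are
    assumed to be integers in the range 0..n_classes-1.
    """
    before = np.asarray(before)
    after = np.asarray(after)
    if before.shape != after.shape:
        raise ValueError("both maps must share the same grid shape")
    # Encode each (from, to) pair as one integer, then count occurrences.
    pair_codes = before.ravel() * n_classes + after.ravel()
    counts = np.bincount(pair_codes, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)
```

For two toy 2 x 3 maps, `transition_matrix([[0, 0, 1], [1, 2, 2]], [[0, 1, 1], [1, 2, 0]], 3)` gives `[[1, 1, 0], [0, 2, 0], [1, 0, 1]]`: four pixels are unchanged (the diagonal) and two changed class.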
There is no one ideal classification of land use and land cover, and it is unlikely that one could ever be developed. There are different perspectives in the classification process, and the process itself tends to be subjective, even when an objective numerical approach is used (USGS, 1976).

Aims
The aim of this paper is to provide an overview of Digital Earth Pacific and some of its recent scientific benchmarking.
This paper focuses on the use case of nationally driven field surveys for land cover machine learning classification models. This participatory workflow may interest other stakeholders, in the Pacific and beyond, who wish to replicate it for other use cases and sectors. In doing so, the paper may shed light on best practices within the Pacific region for land use land cover model calibration and validation. The paper also highlights the long-term replicability of these open-source workflows, which are not subject to paywalls or commercial platforms.
The paper also aims to raise awareness of the current datasets and regional products, as well as the methods and workflows used in Digital Earth Pacific.

2. Materials and methods
2.1. Study area(s)
Study areas included Pacific Island Countries and Territories (PICTs) that have participated in past DE Pacific Land Cover Assessment Skills Transfer (LCAST) workshops: Tonga, Fiji, the Republic of the Marshall Islands (RMI), Palau, Tuvalu and the Cook Islands.

2.2. Data and processing
2.2.1. Satellite imagery and the Digital Earth Pacific GeoMAD
Many of the analysis-ready data products are made possible through the DE Pacific GeoMAD (Leith, forthcoming).
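GeoMAD products, as in Digital Earth Africa, combine a per-pixel geometric median composite with median absolute deviation layers. The geometric median is the multi-band point that minimises the summed Euclidean distance to every clear observation of a pixel, which makes it robust to residual cloud and outliers while keeping band relationships consistent. The sketch below illustrates only that idea using the classic Weiszfeld iteration for a single pixel; it is not the DE Pacific implementation, which presumably relies on an optimised library, and the function name and tolerances are hypothetical.

```python
import numpy as np


def geometric_median(observations, tol=1e-7, max_iter=500):
    """Weiszfeld iteration for the geometric median of one pixel's time series.

    observations: array of shape (n_times, n_bands); rows containing NaN
    (e.g. cloud-masked dates) are ignored.
    """
    x = np.asarray(observations, dtype=float)
    x = x[~np.isnan(x).any(axis=1)]
    if x.shape[0] == 0:
        # No clear observations: return an all-NaN composite for this pixel.
        return np.full(x.shape[1], np.nan)
    estimate = x.mean(axis=0)  # start from the band-wise mean
    for _ in range(max_iter):
        dist = np.linalg.norm(x - estimate, axis=1)
        # Floor distances to avoid division by zero near an observation.
        weights = 1.0 / np.maximum(dist, 1e-12)
        new_estimate = (weights[:, None] * x).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate
```

With three identical dates and one outlier, the result stays at the repeated observation rather than being pulled towards the outlier, as a band-wise mean would be.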

2.2.2. Participatory field data calibration and validation using QField
QField is an open-access, open-source mobile application that works with projects prepared in the QGIS (formerly Quantum GIS) desktop software. The app allows users to collect geotagged data points, transects, polygons and other field data features. In this use case, these datasets are crucial for calibrating and validating machine learning land cover classification models.
The participatory elements of these workflows are supported through open-source approaches. The country-driven surveys used QField to collect GPS points covering the range of land cover classes. These include the six standardised land cover classes defined by the IPCC (forest land, cropland, grassland, wetlands, settlements and other land), as set out in Chapter 2: Generic Methodologies Applicable to Multiple Land-Use Categories.
Through country-driven workshops and surveys, local participants contribute to a spectral database used to train machine learning land cover models. Participants can also collect more detailed classes, including land uses and land cover types beyond the six standardised classes. Including Traditional Ecological Knowledge (TEK) creates greater room for localisation and customisation, allowing local inputs to capture more of the complexity of land use and land cover (LULC) change. A longer list of land cover classes can also be aggregated into a simpler list, for example for compatibility with the IPCC classes.
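A first check on survey data is usually that each point has valid coordinates and a label from the agreed class list. The sketch below assumes the QField survey layer has been exported from QGIS as GeoJSON (longitude first, WGS84); the attribute name `lc_class`, the function name and the default list of the six IPCC classes are hypothetical choices for illustration.

```python
import json

# Six IPCC land-use categories, used here as the default list of valid labels.
IPCC_CLASSES = frozenset(
    {"Forest land", "Cropland", "Grassland", "Wetlands", "Settlements", "Other land"}
)


def load_field_points(geojson_text, class_field="lc_class", allowed=IPCC_CLASSES):
    """Split exported survey features into valid observations and rejects.

    `class_field` is a hypothetical attribute name; substitute the field
    actually defined in the QField survey form.
    """
    collection = json.loads(geojson_text)
    valid, rejected = [], []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        label = (feature.get("properties") or {}).get(class_field)
        # Keep only point features whose label is on the agreed class list.
        if geometry.get("type") != "Point" or label not in allowed:
            rejected.append(feature)
            continue
        # GeoJSON stores longitude first; any altitude value is ignored.
        lon, lat = geometry["coordinates"][:2]
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            rejected.append(feature)
            continue
        valid.append({"lon": lon, "lat": lat, "label": label})
    return valid, rejected
```

Passing a longer `allowed` set lets the same check accept the more detailed local classes collected alongside the IPCC ones.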
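Aggregating a detailed class list into the IPCC classes can be expressed as a lookup table. The sketch below uses an entirely hypothetical set of local class names; in practice the mapping would be agreed with each country's participants, and some local classes may need case-by-case decisions.

```python
from collections import Counter

# Hypothetical local survey classes mapped to IPCC categories; real lists
# would be agreed with each country's participants.
LOCAL_TO_IPCC = {
    "Secondary forest": "Forest land",
    "Root crop garden": "Cropland",
    "Pasture": "Grassland",
    "Freshwater swamp": "Wetlands",
    "Village": "Settlements",
    "Bare sand": "Other land",
}


def aggregate_labels(labels, mapping=LOCAL_TO_IPCC):
    """Translate detailed labels to IPCC classes, failing loudly on gaps."""
    unmapped = sorted({label for label in labels if label not in mapping})
    if unmapped:
        raise KeyError(f"no IPCC mapping for: {unmapped}")
    return [mapping[label] for label in labels]


def class_counts(labels):
    """Count observations per class, e.g. to spot under-sampled classes."""
    return Counter(labels)
```

Raising an error on unmapped labels, rather than silently dropping them, keeps locally collected classes from disappearing from the training data unnoticed.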

2.2.3. Participatory post-processing
This process involved collating, cleaning and validating all of the datasets collected through the field surveys. Participants then ingested these datasets into Digital Earth Pacific through a Jupyter Notebooks environment and ran random forest and other classifier models using the scikit-learn library.
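The classification step can be sketched with scikit-learn as follows. This is a minimal illustration, not the workshop notebooks: it assumes each field point has already been paired with band values sampled from the imagery, and the function name, 70/30 stratified split and hyperparameters are illustrative choices. A random split of spatially clustered points can overstate accuracy, so spatially separated validation points may be preferable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split


def train_and_validate(features, labels, test_size=0.3, seed=42):
    """Fit a random forest on field-point spectra and report hold-out accuracy.

    features: (n_points, n_bands) band values sampled at each field point.
    labels: (n_points,) land cover labels from the surveys.
    """
    # Stratify so every class appears in both calibration and validation sets.
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    report = {
        "overall_accuracy": accuracy_score(y_test, predicted),
        "kappa": cohen_kappa_score(y_test, predicted),
        "confusion_matrix": confusion_matrix(y_test, predicted),
    }
    return model, report
```

The returned overall accuracy and kappa are the kind of figures that would be compared against an agreed accuracy assessment threshold.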

The results shared in this paper include:

1) LULC model results maps and tables
2) Models trained at a national-scale
3) Skills transfer for ongoing replication

The paper also discusses areas of intrinsic value and ongoing capacity building in the region, and presents feedback on the pre- and post-survey and workshop capacity building, as shared by the workshop and survey participants.

Nick is currently completing his PhD through a joint cotutelle program between the University of the South Pacific and the Australian National University. His research focuses on ridge-to-reef environmental monitoring as well as GIS environmental modelling and remote-sensing land-sea frameworks through riparian corridors. He completed his MSc in the water science specialisation through courses in both the Fenner School and the Research School of Earth Sciences at the ANU. His MSc thesis research quantified the impacts of in-river gravel extraction on sediment transport in Fiji.

Nick's research areas and skills include GIS and remote sensing; hydrological and environmental modelling; Python; FullCAM carbon accounting; field sampling and measurement of surface water and groundwater chemical, geophysical and hydrological parameters; and some ecological fieldwork experience, including forestry biomass carbon assessments and sampling of benthic invertebrates and ichthyofauna.

He has worked in a range of government departments, including the federal Departments of Agriculture, Water and Environment, the Climate Change Division of the Department of Environment and Energy and the Australian Trade Commission. During this time, Nick also worked in environmental monitoring of the impacts of the Ranger Uranium Mine on the Magela floodplains and creeks close to Jabiru and Kakadu in the Northern Territory. Nick led a team of volunteers to second place in the regional category of the MAXAR Spatial Challenge with a project that combined DigitalGlobe sub-metre high-resolution imagery with FullCAM modelling to assess the regeneration of biomass carbon after the 2019-20 bushfires, through a case study in Cann River, Gippsland. Nick was also the team lead for the Yadrava na Vanua team, which won first place in the Space for Planet Earth Competition for using satellite data to estimate carbon sequestration. The team comprised students and staff from the University of the South Pacific, the University of Fiji and Fiji National University.
