Managing Heterogeneous Spatial Data with Analysis Ready Cloud Optimized formats
10-18, 11:35– (Pacific/Auckland), Te Iringa

Presenting a project using ARCO formats (Zarr and GeoParquet) to manage varied data from different organisations


This presentation describes the use of Analysis Ready Cloud Optimised (ARCO) formats for efficiently managing spatial data from different sources with varying formats, structures, and sizes. The techniques presented here were developed while building a fit-for-purpose Data Management System for the Great Barrier Reef as part of the Australian Government’s Reef 2050 Integrated Monitoring and Reporting Program (RIMReP). The project needed to integrate several datasets managed independently by different organisations and make them available with a common structure, while optimising for the most common data access patterns.

The presentation will focus on our design of two general structures, one for raster data based on Zarr and one for tabular data based on GeoParquet, and discuss how these structures were designed to optimise access to the data while maintaining a consistent framework. We will provide case-by-case examples of how these structures were adapted to the needs of each particular dataset. We will also cover our plans for the future and our experience of what did and didn't work, as a way of sharing knowledge about the use of ARCO formats for efficiently managing spatial data.
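
As a rough illustration of what optimising a raster structure for an access pattern can mean, the sketch below counts how many chunks a read has to touch under different chunk shapes. Zarr stores an array as a grid of chunks, so a read that touches fewer chunks generally needs fewer requests. The array dimensions, chunk shapes, and the `chunks_touched` helper are hypothetical and chosen only for illustration. They are not the layouts used in the project.

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks intersected by a rectangular read.

    selection   -- one half-open (start, stop) index range per dimension
    chunk_shape -- chunk length along each dimension
    """
    total = 1
    for (start, stop), size in zip(selection, chunk_shape):
        if stop <= start:
            return 0  # empty selection touches no chunks
        first_chunk = start // size
        last_chunk = (stop - 1) // size
        total *= last_chunk - first_chunk + 1
    return total


# Hypothetical daily raster with dimensions (time, lat, lon) = (3650, 1000, 1000).
time_series_read = [(0, 3650), (500, 501), (500, 501)]  # one pixel, every time step
map_read = [(100, 101), (0, 1000), (0, 1000)]           # one time step, full extent

# Three hypothetical chunk shapes, each favouring a different access pattern.
layouts = {
    "map_friendly": (1, 1000, 1000),
    "series_friendly": (3650, 10, 10),
    "balanced": (365, 100, 100),
}

results = {
    name: (chunks_touched(time_series_read, shape), chunks_touched(map_read, shape))
    for name, shape in layouts.items()
}

for name, (series_chunks, map_chunks) in results.items():
    print(f"{name}: time series read = {series_chunks} chunks, map read = {map_chunks} chunks")
```

In this toy setup, a chunk shape that makes one pattern cheap makes the other expensive, which is why a layout has to be chosen with the most common access pattern of each dataset in mind.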

Computer scientist with a PhD in machine learning. I've now put that aside for a while to dabble in the field of geospatial data. New to most concepts here, but enthusiastic to learn and to contribute!