FOSS4G 2022 general tracks

Geostack: a high performance geospatial processing, modelling and analysis framework
2022-08-24, 11:30–12:00 (Europe/Rome), General online

Large geospatial data sets generated by modern remote sensing and environmental modelling provide new opportunities for analysts, scientists, and researchers. However, the size of these data sets can present challenges due to the computation and resource management required for analytics and processing. Current solutions for processing such data sets largely focus on horizontal scaling on distributed systems such as the cloud, without fully exploiting the opportunities offered by modern computing architecture. Furthermore, the variety of formats and types of geospatial data often results in complex processing workflows composed of multiple tools for reading and writing, transformation, processing, and resource management. We present an introduction, overview, and demonstration of the open-source Geostack framework (gitlab.com/geostack/library). It has been developed to simplify many common operations, provide economy of code, and transparently take advantage of modern CPU/GPU hardware. We have aimed to provide three main routes to simplify and accelerate geospatial processing: 1) a unified interface to read vector and raster data and interoperate between them, with no software dependencies for common geospatial data formats; 2) treatment of all data as objects independent of geospatial transforms, with transparent resource management through an underlying tile-based caching system, and with reprojection and interpolation carried out where needed; and 3) extensive use of OpenCL to provide computational acceleration and automatic vectorisation of processing on GPUs and multi-core CPUs, and to allow user-defined scripts to be executed over these objects. The framework also includes many common geospatial operations as well as several base geospatial solvers (including moving fronts, flow networks, and particle modelling) accelerated using OpenCL.
Geostack is a C++ API with Python bindings; the code examples and demonstrations are presented in Python. The Python bindings are available through conda and are fully interoperable with common Python libraries including numpy, gdal, xarray, netcdf, geopandas and sqlite, allowing users to use as much or as little of the Geostack functionality as required. We present demonstrations of several common geospatial tasks with benchmark comparisons to alternative workflows.
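
As a rough illustration of the tile-based idea mentioned in the abstract, the following minimal sketch applies a per-cell operation to a raster one tile at a time using plain numpy. It does not use the Geostack API: the names `iter_tiles` and `apply_tiled`, the tile size, and the sample data are all hypothetical, and Geostack's own caching and OpenCL execution are not reproduced here.

```python
import numpy as np


def iter_tiles(shape, tile_size):
    """Yield (row_slice, col_slice) pairs that cover a 2D array of the given shape."""
    rows, cols = shape
    for r in range(0, rows, tile_size):
        for c in range(0, cols, tile_size):
            yield (slice(r, min(r + tile_size, rows)),
                   slice(c, min(c + tile_size, cols)))


def apply_tiled(raster, func, tile_size=256):
    """Apply a per-cell function to a raster tile by tile (hypothetical helper)."""
    out = np.empty_like(raster)
    for row_slice, col_slice in iter_tiles(raster.shape, tile_size):
        # Only one tile is passed to the function at a time, which is what
        # lets a tile-based system bound its working memory.
        out[row_slice, col_slice] = func(raster[row_slice, col_slice])
    return out


# Made-up sample raster; edge tiles are smaller than tile_size.
elevation = np.arange(1000 * 700, dtype=np.float32).reshape(1000, 700)
scaled = apply_tiled(elevation, lambda tile: tile * 0.5 + 10.0)
```

In this sketch every tile is already resident in memory; a caching system of the kind the abstract describes would instead load and evict tiles on demand, so the same loop could run over rasters larger than available memory.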

James Hilton is a principal research scientist in Data61, CSIRO. His current role involves the development of methods and analytical frameworks for geospatial analysis as well as modelling of natural hazards such as wildfires and floods.

Nikhil is a research scientist within the Natural Hazard and Infrastructure team in Data61, CSIRO, which he joined in 2017 as a postdoctoral fellow. Prior to joining Data61, he was a PhD student at the School of Mechanical and Aerospace Engineering, Nanyang Technological University, where he studied the effects of air-sea interaction on hurricanes using a coupled atmosphere-ocean-wave model. His current interests include the application of computational methods for modelling natural hazards such as flooding and bushfires, and the coupling of the Conformal Cubic Atmosphere model with the Spark and Swift frameworks.