11-04, 11:00–11:30 (America/New_York), Regency Ballroom B
While DuckDB has no direct support for raster data, we have successfully used H3 to query and aggregate it. Learn about our experience conducting fast raster analytics with DuckDB and H3.
DuckDB is rapidly becoming the FOSS tool of choice for fast, local analytics. While its extension ecosystem has added geospatial support, there isn't a clear way to perform raster analytics inside DuckDB. In this talk, we will discuss our experience at Fused using DuckDB and open source extensions like h3-duckdb to perform fast raster analytics.
Raster datasets present unique challenges for analytics platforms. Vector-friendly databases can easily represent points, lines, and polygons, but the gridded nature of rasters does not map naturally onto traditional database structures. Some have tried converting rasters to vector geometries, but these approaches lack the performance that makes DuckDB attractive in the first place.
Our approach takes a different route: we use the H3 hexagonal hierarchical spatial index to re-aggregate raster space into manageable analytical units.
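The re-aggregation step can be illustrated with a minimal sketch. It assumes a north-up raster in EPSG:4326 described by a GDAL-style affine geotransform, assigns each pixel to a cell by its center point, and averages the values per cell. The function names, the pixel-center assignment rule, and the plain mean are illustrative assumptions, not necessarily how the talk's pipeline works. The cell function is passed in so the sketch stays dependency-free; in practice it could be H3's `latlng_to_cell` from the h3-py v4 bindings.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

# GDAL-style affine geotransform (assumption for this sketch):
# (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pixel_center(gt: GeoTransform, row: int, col: int) -> Tuple[float, float]:
    """Return the (lng, lat) of a pixel's center using the affine geotransform."""
    x0, dx, rx, y0, ry, dy = gt
    lng = x0 + (col + 0.5) * dx + (row + 0.5) * rx
    lat = y0 + (col + 0.5) * ry + (row + 0.5) * dy
    return lng, lat


def aggregate_to_cells(
    grid: Sequence[Sequence[float]],
    gt: GeoTransform,
    cell_of: Callable[[float, float], Hashable],
    nodata: Optional[float] = None,
) -> Dict[Hashable, float]:
    """Average every pixel whose center falls in the same cell.

    cell_of(lat, lng) maps a point to a cell ID, e.g. an H3 index
    (hypothetical wiring: lambda lat, lng: h3.latlng_to_cell(lat, lng, 7)).
    """
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Skip pixels flagged as missing data.
            if nodata is not None and value == nodata:
                continue
            lng, lat = pixel_center(gt, r, c)
            cell = cell_of(lat, lng)
            sums[cell] += value
            counts[cell] += 1
    # Mean value per cell; cells with no valid pixels never appear.
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

The resulting (cell, value) pairs form an ordinary table, which is what makes the data easy to load into DuckDB and query with SQL, for example rolling results up to coarser resolutions with h3-duckdb's `h3_cell_to_parent`. Assigning pixels by center point is the simplest choice; area-weighted assignment would be more precise where pixels straddle cell boundaries.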
Hexagonal grids offer advantages over traditional square pixels for analytical purposes, including more uniform adjacency relationships (all neighbors are equidistant) and a closer approximation of circles, which minimizes sampling bias.
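Both advantages can be checked with elementary geometry on idealized planar grids. On the sphere, H3 cells are slightly distorted and each resolution includes 12 pentagons, so the hexagonal properties hold only approximately. For a square grid with side $s$ and a regular hexagonal grid with edge length $a$:

```latex
\begin{aligned}
\text{Square, edge neighbors (4):}\quad & d = s \\
\text{Square, corner neighbors (4):}\quad & d = \sqrt{2}\,s \\
\text{Hexagon, all neighbors (6):}\quad & d = 2 \cdot \tfrac{\sqrt{3}}{2}\,a = \sqrt{3}\,a \\[6pt]
\frac{A_{\text{square}}}{A_{\text{circumcircle}}} &= \frac{2R^2}{\pi R^2} = \frac{2}{\pi} \approx 0.637 \\
\frac{A_{\text{hexagon}}}{A_{\text{circumcircle}}} &= \frac{\tfrac{3\sqrt{3}}{2}R^2}{\pi R^2} = \frac{3\sqrt{3}}{2\pi} \approx 0.827
\end{aligned}
```

Here $R$ is the circumradius of each shape (for the hexagon, $R = a$). The hexagon fills a larger share of its circumscribing circle, which is the sense in which it better approximates a circle.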
The H3 library, originally developed by Uber and now an open-source project, provides an ideal framework for this approach with its global hierarchical hexagonal grid system.
In this talk, we describe how we preprocess raster data, integrate analytic tooling with DuckDB, run SQL analytics in DuckDB, and return and visualize the results.
We'll demonstrate case studies where this approach has proven effective, including:
- USDA Cropland Data Layer (CDL)
- Weather data (ERA5)
- Digital elevation models (DEM)
While powerful, our approach does have limitations. Operations requiring pixel-perfect precision or very high-resolution outputs may still require traditional raster tools. Additionally, the initial conversion of large raster datasets to H3 cells introduces overhead.
The combination of DuckDB's analytical power with H3's spatial indexing provides a remarkably effective approach to raster analytics. This talk will provide attendees with practical knowledge about implementing similar systems, code examples for common analytical tasks, and insights into performance optimization.
Isaac Brodsky is co-founder and CTO of Fused, which builds serverless compute for Python. He was previously a co-founder of Unfolded and a software engineer on Marketplace Data at Uber. He maintains the H3 open source library and the DuckDB extensions h3-duckdb and duckdb-zipfs.