Moving from Science to Product: Making Open GIS Software Work for Us
11-19, 14:00–14:25 (Pacific/Auckland), WG404

How CTrees wrote open-source code for zonal statistics to move from science results (rasters) to interactive mapping apps.


At CTrees, we create machine learning models that integrate multiple data sources to produce high-resolution, time-series datasets on forest carbon and activity. Our outputs (estimates of carbon stocks, emissions, removals, and forest change) support projects ranging from jurisdictional analysis to deforestation monitoring. However, scaling from research-driven models to a production-ready platform required solving two major challenges: (1) data management and (2) enabling fast, on-demand analysis that serves both internal researchers and external users.
On the data side, we faced the all-too-familiar chaos of manual, ad-hoc dataset versioning (v2, v2_final, v2_final_final), inconsistent folder structures in which important dates might be recorded in either the filename or the prefix, and directories packed with thousands of tiny GeoTIFFs meant to be mosaicked together. Instead of chasing S3 paths and manually stitching ad-hoc datacube files, we adopted Icechunk (open source) by Earthmover, which builds on Zarr to enable structured, versioned, cloud-native datacube access. As a bonus, Arraylake (built on Icechunk) makes it easy to view the data via WMS, streamlining visualization for both internal users and external stakeholders.
The second challenge was transitioning from scientific scripts, often written in R or relying on heavy GDAL operations, to a streamlined, scalable approach. To enable on-demand analysis, we refactored these scripts into ctreeskit (https://github.com/ctrees-products/ctreeskit), our open-source Python package that consolidates spatial processing and zonal statistics using Xarray. Currently in pre-release (beta), ctreeskit is designed for flexibility: it supports researchers across CTrees in papers, reports, and internal workflows. More broadly, ctreeskit is available to anyone performing similar calculations. By standardizing these processes and ensuring transparency, we enhance reproducibility, efficiency, and accessibility, allowing both internal teams and external users to understand and verify our methodologies.
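For readers unfamiliar with zonal statistics, the idea can be sketched with plain Xarray. This is a minimal illustration, not ctreeskit's actual API: the function name `zonal_stats`, the variable names, and all pixel values are hypothetical, and it assumes the value raster and the zone raster already share the same grid.

```python
import numpy as np
import xarray as xr

# Hypothetical 4x4 raster of per-pixel values (made-up numbers, one NaN gap).
carbon = xr.DataArray(
    np.array([
        [10.0, 12.0, 30.0, 32.0],
        [11.0, 13.0, 31.0, 33.0],
        [50.0, 52.0, np.nan, 72.0],
        [51.0, 53.0, 71.0, 73.0],
    ]),
    dims=("y", "x"),
    name="carbon",
)

# Zone raster on the same grid: each pixel holds an integer zone id.
zones = xr.DataArray(
    np.array([
        [1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4],
    ]),
    dims=("y", "x"),
    name="zone",
)


def zonal_stats(values: xr.DataArray, zone_ids: xr.DataArray) -> xr.Dataset:
    """Summarize `values` per zone (hypothetical helper, not ctreeskit's API).

    Pixels are grouped by zone id; NaN pixels are skipped by the mean and sum
    and excluded from the count.
    """
    grouped = values.groupby(zone_ids)
    return xr.Dataset(
        {
            "mean": grouped.mean(),
            "sum": grouped.sum(),
            "count": grouped.count(),
        }
    )


stats = zonal_stats(carbon, zones)
print(stats.to_dataframe())
```

The result is one row per zone id, which is the kind of compact table an interactive map can request on demand instead of reading the full raster.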
With ctreeskit and Arraylake, we now have three key things: (1) a reliable way to save and access data, (2) a simple way to slice and dice high-dimensional data in the time and spatial domains, and (3) lightweight tools for running reproducible calculations. Moving from raw GeoTIFFs to an array-based Zarr model gives us fast, structured access to data without the complexity of STAC catalogs. By shifting from scattered S3 paths and heavyweight processing to a structured, cloud-native approach, we’ve made geospatial data easier to access, faster to analyze, and more transparent—helping both internal teams and external users work more efficiently.
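As a rough illustration of slicing a datacube in the time and spatial domains, the sketch below uses label-based selection in Xarray. The cube is built in memory with made-up values and hypothetical dimension names (`year`, `lat`, `lon`); in practice a Zarr-backed array opened through Xarray could be sliced the same way.

```python
import numpy as np
import xarray as xr

# Hypothetical annual datacube with dimensions (year, lat, lon).
years = np.arange(2000, 2005)
lat = np.array([2.5, 1.5, 0.5, -0.5])        # descending, as in north-up rasters
lon = np.array([100.5, 101.5, 102.5, 103.5])
values = np.arange(5 * 4 * 4, dtype=float).reshape(5, 4, 4)  # placeholder data

cube = xr.DataArray(
    values,
    dims=("year", "lat", "lon"),
    coords={"year": years, "lat": lat, "lon": lon},
    name="carbon",
)

# Select a time window and a bounding box by coordinate labels.
# The lat slice runs high-to-low because the lat coordinate is descending.
subset = cube.sel(
    year=slice(2001, 2003),
    lat=slice(2.0, 0.0),
    lon=slice(101.0, 103.0),
)

# Collapse the spatial dimensions to get one value per year for the box.
series = subset.mean(dim=("lat", "lon"))
print(subset.shape)
print(series.to_series())
```

Because selection is by coordinate value rather than by file path or tile index, the same few lines work regardless of how the underlying chunks are stored.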

Naomi is the Head of Engineering at CTrees, where she leads a growing team of engineers dedicated to building scalable technology for global carbon and forest monitoring. Since joining CTrees in 2023, she has worked across teams to shape product direction and deliver impactful, data-driven tools, playing a pivotal role in launching major technical products such as JMRV and REDD+AI. With over seven years of experience in GIS and software engineering, Naomi previously served as a founding engineer at a startup where she helped bring a social app from concept to launch, and earlier worked on large-scale geospatial data systems with the Bing Maps team at Microsoft. She holds a bachelor’s degree in geography with a focus on GIS Systems and a minor in mathematics from the University of Washington. Having grown up across Thailand and Ecuador, Naomi developed a lifelong appreciation for forests and biodiversity, which continues to inspire her work. Outside of work, she can often be found backpacking or planning her next big hiking adventure, and she enjoys experimenting with new data tools and geospatial technologies at the intersection of nature, data, and software.