FOSS4G 2022 general tracks

Spatio-temporal Database - Creating a high availability easily scalable Spatio-temporal database cluster with Postgres, PostGIS, and timescaleDB!
2022-08-25, 12:30–13:00 (Europe/Rome), Room Limonaia

I am working as the Technical Lead at Blue Sky Analytics, a climate-tech startup empowering the world’s decision-makers with accurate, real-time, and standardized climate data.

All datasets that we are building here at Blue Sky Analytics, technically have one similarity - they all have a space and time component. We tried to build solutions like filling empty values in inconsistent temporal data, and dividing the data in specified time period chunks for faster queries, while these worked as POC, they were not easy to scale up. Working with structured data was much easier to understand, working on postgres with the addition of timescale and PostGIS gave us exactly what was needed. Building the solution at the database level with the existing open-source technologies has been an exhilarating experience.

Imagine a dataset with hourly frequency going back years on a global level, with frequent inconsistencies, that not only you have to efficiently store but that should also be highly accessible in combination with other such datasets. If not for the open-source, we would not have been able to answer questions like:
- How much have the lakes shrunk between the years 2010-2020 on a yearly basis?
- Finding GHG emissions from biomass burning of "all US states, for the last 10 years on a monthly, weekly, daily basis".
Leveraging other open source solutions like h3-pg indexing also helped us to reduce the query time by an exponential factor for global level queries!

While the database sounds pretty amazing, another challenge was putting it all together and deploying it on the cloud, which was a whole another challenge. The most intuitive solution was to deploy a bunch of Postgres instances. While it was not so hard to implement the basics, it became almost impossible to scale up or down, install rolling updates, account for failures.

Keeping up with the tech, Kubernetes seemed like a great solution for building a high availability cluster service, and finding the postgres-operator (PGO) by crunchydata was exactly what we needed. It combined all the right tools like pgBouncer, pgBackRest, and monitoring solution using grafana and Prometheus all in one packaged easily to deploy service. While the learning curve with Kubernetes was a little steep, it lead to building a highly scalable and resilient database cluster.

The PostgreSQL + PostGIS + Timescale + H3 stack helped us simulate the temporal and spatial nature of the world at the database level and gave a universal approach to store and query all our datasets. It can handle textual data like fires with time (recorded time) and spatial information (lat -long) or shapes of counties, water bodies, etc., and combine them with each other using few joins giving us a very powerful geospatial-temporal query engine.

Without FOSS it would have been impossible to even imagine any of this but as of now, we are quantifying climate change!

Technical lead at BlueSky Analytics (building a catalogue of environmental datasets) I work with the entire stack from devops, APIs and frontend all in day's work!