FOSS4G NA 2024

Embed all the things: the promise of geospatial vector embeddings
09-10, 15:00–15:30 (America/Chicago), Grand G

Geospatial foundation models have arrived and are getting better. We examine how the vector embeddings they provide can be used for clustering, change detection, and natural language search, and discuss the challenge of scaling these solutions across time and space.


In this talk, we explore the immense potential of vector embeddings from open-source geospatial foundation models to revolutionize Earth observation. In particular, we show how geospatial vector embeddings enable (i) clustering and similarity search, (ii) a more robust and holistic method of change detection, and (iii) searching huge catalogs of imagery with natural language. We emphasize how these capabilities can be built entirely with open-source solutions, including geospatial foundation models such as Clay and SkyCLIP, geospatial machine learning libraries such as Raster Vision, and geospatial databases such as PostGIS. Additionally, we discuss some interesting open problems that must be solved to scale such approaches to a global level, and opportunities for the open-source community to contribute.

Vector embeddings have emerged as one of the most important tools from the deep learning revolution. The remarkable ability of deep neural networks to turn complex data such as images and text into semantically meaningful vectors in a kind of abstract concept-space has unlocked all manner of interesting applications. Simultaneously, the proliferation of pre-trained open-source models (“foundation models”), especially those pre-trained on geospatial data, has made it trivial to compress Earth imagery into vector embeddings. The question now is: what can we do with them? In this talk, we present three potential use cases.

For our first use case, we show how, at a basic level, vector embeddings can be used to group images into semantically meaningful clusters and to find images similar to a given query image.
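As a rough sketch (illustrative, not code from the talk), both operations need nothing beyond off-the-shelf tools once the embeddings exist: k-means for clustering and, on L2-normalized vectors, a single dot product for cosine-similarity search. The embedding array and cluster count below are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embeddings: an (N, D) array of L2-normalized chip embeddings,
# standing in for the output of a geospatial foundation model such as Clay.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Clustering: group chips into semantically similar clusters.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(embeddings)

# Similarity search: on normalized vectors, cosine similarity is a dot
# product, so a whole-catalog search is one matrix-vector multiply.
def top_k_similar(query_idx: int, k: int = 5) -> np.ndarray:
    scores = embeddings @ embeddings[query_idx]
    return np.argsort(scores)[::-1][1 : k + 1]  # skip the query chip itself

print(top_k_similar(42))
```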

For our second use case, we demonstrate a more advanced analysis. We show how, instead of detecting change at a location by comparing images from two discrete timestamps, we can model its entire history using vector embeddings and detect change and anomalies by measuring how much new observations deviate from that model. This approach has the added benefit of being robust to seasonal variations throughout the year.
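The talk does not pin down the model of a location's history; one minimal sketch (an assumption for illustration, including the hypothetical `anomaly_score` helper) is to summarize the history by its mean embedding and typical spread, then score a new observation by how far outside that baseline it falls:

```python
import numpy as np

def anomaly_score(history: np.ndarray, current: np.ndarray) -> float:
    """history: (T, D) embeddings of one location over time; current: (D,)."""
    center = history.mean(axis=0)
    # Distances of past observations from the center: seasonal variation is
    # absorbed into this baseline rather than flagged as change.
    baseline = np.linalg.norm(history - center, axis=1)
    deviation = np.linalg.norm(current - center)
    return (deviation - baseline.mean()) / (baseline.std() + 1e-8)

rng = np.random.default_rng(0)
history = rng.normal(size=(24, 768))  # e.g. two years of monthly embeddings
center = history.mean(axis=0)
print(anomaly_score(history, center + rng.normal(scale=0.1, size=768)))  # low
print(anomaly_score(history, center + rng.normal(scale=2.0, size=768)))  # high
```

A richer history model (per-season statistics, or a fitted time-series model) would slot into the same scoring interface.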

For our third use case, we demonstrate the use of a slightly different kind of geospatial foundation model: a vision-language model. We show how vector embeddings from such a model can be used to search images based on their semantic content. This enables searching using natural language queries such as “houses with swimming pools” over large geographical areas such as entire cities or states.
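As a hedged sketch of the query path: with a CLIP-style model, the text encoder maps the query into the same embedding space as the imagery, and ranking is again cosine similarity. The OpenCLIP checkpoint and the `chip_embeddings.npy` file below are placeholders; a remote-sensing vision-language model such as SkyCLIP could be substituted if distributed in the same format.

```python
import numpy as np
import torch
import open_clip

# Load a CLIP-style model and its tokenizer (generic placeholder checkpoint).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Assumed input: an (N, D) array of precomputed, normalized chip embeddings
# produced by the matching image encoder.
image_embeddings = np.load("chip_embeddings.npy")

with torch.no_grad():
    tokens = tokenizer(["houses with swimming pools"])
    q = model.encode_text(tokens)
    q = (q / q.norm(dim=-1, keepdim=True)).squeeze(0).numpy()

# Rank chips by cosine similarity to the text query.
top10 = np.argsort(image_embeddings @ q)[::-1][:10]
print(top10)  # indices of the ten best-matching chips
```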

Finally, we discuss some engineering challenges involved in deploying such solutions at scale: choosing the chipping grid, efficiently storing and searching tens of millions to billions of vectors, versioning the embeddings, and more.
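To make the storage question concrete, one common open-source arrangement (an assumption on our part; the talk names only PostGIS) is to pair PostGIS with the pgvector extension, keeping chip footprints and embeddings in one table behind an approximate nearest-neighbor index. Connection string, table name, and embedding dimension below are placeholders.

```python
import psycopg2

conn = psycopg2.connect("dbname=embeddings")  # placeholder connection string
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS postgis")
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # pgvector
cur.execute("""
    CREATE TABLE IF NOT EXISTS chips (
        id bigserial PRIMARY KEY,
        captured_at date,
        geom geometry(Polygon, 4326),  -- chip footprint on the chipping grid
        embedding vector(768)          -- model output dimension (assumed)
    )
""")
# Approximate nearest-neighbor index: HNSW trades build time and memory for
# fast queries over tens of millions of vectors.
cur.execute(
    "CREATE INDEX IF NOT EXISTS chips_embedding_idx "
    "ON chips USING hnsw (embedding vector_cosine_ops)")
conn.commit()

# k-nearest-neighbor query: <=> is pgvector's cosine-distance operator.
query_vec = "[" + ",".join(["0.0"] * 768) + "]"  # placeholder query embedding
cur.execute(
    "SELECT id FROM chips ORDER BY embedding <=> %s::vector LIMIT 10",
    (query_vec,))
print(cur.fetchall())
```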