AI Wrangling in the Early 21st Century FOSS4G NA 2024

09-10, 11:00–11:30 (America/Chicago), Grand G

At Impact Observatory we use convolutional neural networks (CNNs) to produce global 10m Land Use/Land Cover (LULC) maps, updated annually. We detail using large language models (LLMs) for geospatial applications, emphasizing open standards: STAC, WPS, and OGC API - Processes.

Impact Observatory was founded 4 years ago to apply convolutional neural networks (CNNs) to producing Land Use/Land Cover (LULC) maps at unprecedented speed and scale. Using 1 billion hand-labeled pixels, we trained a computer vision model, which we then combined with some old-fashioned remote sensing to produce the world’s first global 10m LULC map. That map has since been updated annually, yielding a time series for the last 7 years (available as open data). Building on this foundation, we can now produce LULC maps (and some related derived data products) on demand for any location and time.

Large language models (LLMs), or so-called Generative AI, have captured our collective imagination. Whereas CNNs excel at capturing spatial patterns and hierarchical features (which is ideal for image recognition tasks), LLMs are neural networks that are good at extracting context from text. Further, unlike with CNNs, where we spend the time and effort to train our own model, with LLMs we don’t bother with training (leaving this to the big tech companies) and instead adapt pre-trained models to our domain.

To do so effectively, it is necessary to tailor one’s speech to elicit the desired behavior from the LLM (i.e., prompt engineering). But more than that, LLMs excel at writing code (in general) and at producing valid inputs for functions (in particular). The latter is particularly useful for geospatial applications, where we can create context-specific tools that the LLM then interacts with on behalf of the user. This is especially important for preventing the LLM from simply making up answers (which it is all too happy to do).
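As a rough illustration of "producing valid inputs for functions", the sketch below checks the JSON arguments an LLM might emit for a geospatial tool before anything is run. The tool name `get_lulc_map`, its `bbox` and `year` parameters, and the stand-in lookup are all hypothetical and are not taken from the talk; a real tool would query an actual data service.

```python
import json


def parse_tool_call(raw: str) -> dict:
    """Parse and validate the JSON arguments an LLM produced for the hypothetical tool."""
    args = json.loads(raw)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    bbox = args.get("bbox")
    year = args.get("year")
    if not (isinstance(bbox, list) and len(bbox) == 4):
        raise ValueError("bbox must be a list of four numbers")
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
        raise ValueError("bbox values must be numbers")
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
        raise ValueError("bbox is outside valid longitude/latitude ranges")
    if not isinstance(year, int) or isinstance(year, bool):
        raise ValueError("year must be an integer")
    return {"bbox": bbox, "year": year}


def get_lulc_map(bbox, year):
    """Hypothetical stand-in for a real lookup; returns a description, not imagery."""
    return f"LULC map for bbox={bbox}, year={year}"


def run_tool(raw: str) -> str:
    """Run the tool on validated arguments, or hand the error back to the LLM."""
    try:
        return get_lulc_map(**parse_tool_call(raw))
    except ValueError as err:  # also covers json.JSONDecodeError
        return f"error: {err}"


print(run_tool('{"bbox": [-97.8, 30.2, -97.6, 30.4], "year": 2023}'))
print(run_tool('{"bbox": [200, 0, 10, 5], "year": 2023}'))
```

Returning the validation error as text, rather than raising, gives the LLM a chance to correct its arguments and try again instead of inventing an answer.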

Open standards are particularly important here, as they give the LLM rich and stable interfaces with which to interact. STAC, of course, has been discussed in great detail within the community; WPS and OGC API - Processes less so, and these are our focus here. In particular, we discuss a LangChain toolkit for interacting with Planetary Computer (and by extension any other STAC catalog) and detail the development of a tool for accessing WPS processes (using GeoServer). Finally, we focus on a path forward for general geospatial enablement of LLMs through OGC API - Processes interactions (using pygeoapi as the reference implementation).
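To make the OGC API - Processes interaction concrete, here is a minimal sketch of the execute request a tool could assemble from LLM-supplied inputs. It follows the standard's `/processes/{processId}/execution` endpoint with a JSON `inputs` object. The base URL, the `hello-world` process id, and the input names are assumptions about a local pygeoapi demo setup, not details from the talk, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_execute_request(base_url: str, process_id: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) an OGC API - Processes execute request."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request(
    "http://localhost:5000",  # assumed address of a local pygeoapi instance
    "hello-world",            # assumed process id
    {"name": "FOSS4G NA", "message": "hello from an LLM tool"},  # assumed input names
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would submit it to a running server. Because the endpoint shape and request body are fixed by the standard, the same helper should work against any conforming implementation, which is the stability the paragraph above is pointing at.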

Mark Mathis
