11-04, 11:00–11:30 (America/New_York), Lake Anne
National spatial data infrastructures (NSDIs) contain a wealth of information that AI-based solutions can leverage to improve operational efficiency. We propose a collaborative strategy to bring geospatial awareness into LLMs at scale by constructing GraphRAGs for LLM Agents.
AI holds the promise of helping government agencies improve efficiency despite deep budget cuts. Although LLMs excel at natural language processing and general reasoning, they lack the cross-sectoral, domain-specific, and up-to-date geospatial knowledge needed for decision support in localized disaster and crisis response, public health, business and economic insights, and security and defense applications. LLMs can discover networks of interrelated features for these purposes by leveraging the wealth of information provided by spatial data infrastructures. However, because different systems represent the same locations in incompatible ways, integrating data by common geographies is prohibitively labor-intensive, especially for agencies facing staff reductions.
For many years, we have treated data quality as an analytics problem, delegating dirty data to the data team for cleanup in the data warehouse or lake. This approach is not suitable for AI applications. GenAI applications operate in real time, making decisions on the fly. If the data is incorrect, incomplete, or poorly structured, AI will not rectify it. Instead, it will make erroneous decisions more rapidly. Data quality cannot wait until the analytics layer when AI agents need to reason, plan, and act in real time.
At FOSS4G NA 2024, we presented the potential of spatial knowledge graphs (SKGs) to address the limitations of LLMs in keeping up with evolving domain-specific and spatial knowledge. These SKGs, which are interoperable by location through machine-readable interfaces, can effectively manage changes over time. We initially developed an approach for managing dependencies and propagating change between interlinked spatial knowledge graphs within the health sector. The approach has since been adapted to serve NSDIs for two governments and to support disaster resiliency efforts.
This approach has now been deployed to production for the U.S. Army Corps of Engineers through a Geospatial Knowledge Infrastructure (GKI), which enables the on-demand integration of SKGs so that LLMs can respond to ad-hoc queries. We will demonstrate how to implement an SKG as a graph-based retrieval-augmented generation (GraphRAG) source that sustainably captures the semantics of geoinformatics as text that an LLM Agent can leverage. Without requiring costly model training, the SKG as a RAG can represent semantic geospatial relationships across entire networks of features in multiple domains, in real time if necessary, so that the downstream impact or cumulative effect of events of interest can be understood.
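The idea of capturing graph semantics as text for an LLM Agent can be sketched in a few lines. This is a minimal illustration and not the implementation described in the talk: the triples, feature names, and the `downstream_facts` helper are all hypothetical, and a plain Python list stands in for a real graph store.

```python
from collections import deque

# Hypothetical SKG fragment as (subject, predicate, object) triples.
# All names and values are made up for illustration.
TRIPLES = [
    ("Levee A", "protects", "District 1"),
    ("District 1", "contains", "Hospital X"),
    ("District 1", "has population", "12000"),
    ("Hospital X", "serves", "District 2"),
]


def downstream_facts(start, triples, max_hops=2):
    """Walk outgoing edges breadth-first from `start` and return each
    traversed triple as a plain-text sentence for use as LLM context."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_hops:
            continue  # do not expand nodes beyond the hop limit
        for p, o in by_subject.get(node, []):
            facts.append(f"{node} {p} {o}.")
            if o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return facts


if __name__ == "__main__":
    # The joined sentences would be placed in the prompt as retrieved context.
    print("\n".join(downstream_facts("Levee A", TRIPLES)))
```

Limiting the traversal by hop count is one simple way to keep the retrieved context small; a production system would more likely select the subgraph with a graph query.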
We will also demonstrate how to write prompts that let an LLM Agent translate natural language questions into GeoSPARQL graph queries, an approach that minimizes hallucinations. Exposing the generated queries provides traceability, and the same queries can be used to display answers on a map that correspond to the text response from the LLM. We will show how to display on a web map the individual features that make up the answer to an aggregate question. For example, asking how many people would be affected by a flooding event can reveal the individual administrative boundaries that form the aggregate answer. The approach uses Apache Jena to implement the GraphRAG. The web map interface was developed entirely with open-source software components.
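As a rough sketch of what such a prompt might look like, the Python below assembles a few-shot prompt around one example GeoSPARQL query and then totals per-feature result rows while keeping the contributing features for map display. The `geo:` and `geof:` prefixes are the standard GeoSPARQL namespaces; the `ex:` namespace, its classes and properties, and the `build_prompt` and `summarize` helpers are assumptions made for illustration, not the prompts or schema used in the talk.

```python
# Standard GeoSPARQL namespaces; the ex: namespace and its terms are hypothetical.
PREFIXES = """\
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex: <http://example.org/skg#>
"""

EXAMPLE_QUESTION = "Which administrative areas intersect flood event flood1?"
EXAMPLE_QUERY = PREFIXES + """\
SELECT ?area ?population ?wkt WHERE {
  ex:flood1 geo:hasGeometry/geo:asWKT ?floodWkt .
  ?area a ex:AdminBoundary ;
        ex:population ?population ;
        geo:hasGeometry/geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt, ?floodWkt))
}
"""


def build_prompt(question, schema_terms):
    """Build a few-shot prompt asking the model for a query, not an answer."""
    schema = "\n".join(f"- {term}" for term in schema_terms)
    return (
        "Translate the question into one read-only GeoSPARQL SELECT query.\n"
        "Use only the classes and properties listed below. "
        "Return one row per feature, including its geometry, and no prose.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Example question: {EXAMPLE_QUESTION}\n"
        f"Example query:\n{EXAMPLE_QUERY}\n"
        f"Question: {question}\nQuery:\n"
    )


def summarize(rows):
    """Total the per-feature rows while keeping the features behind the total."""
    return {
        "total_population": sum(row["population"] for row in rows),
        "features": [row["area"] for row in rows],
    }


if __name__ == "__main__":
    print(build_prompt(
        "How many people would be affected by flood event flood1?",
        ["ex:AdminBoundary", "ex:FloodEvent", "ex:population"],
    ))
    # Made-up rows standing in for the query results from the graph store.
    rows = [
        {"area": "ex:district1", "population": 1200},
        {"area": "ex:district2", "population": 800},
    ]
    print(summarize(rows))
```

Asking for one row per feature, rather than a `SUM` in the query itself, is what allows the same result set to drive both the aggregate text answer and the individual boundaries drawn on the map.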
Finally, we will cover some of the standards being discussed in the Open Geospatial Consortium Geospatial Semantics Domain Working Group to address current standards gaps.