Awakening Dormant Geospatial Data: Structuring Large-Scale Government Documents with LLM
2026-09-02, Ran1

We use LLMs to extract and structure geospatial data buried in 100K–1M+ PDF and Office files held by Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT), enabling visualization, spatial analysis, and evidence-based policymaking. The approach is demonstrated through real-world use cases, with no coding required.


By combining LLM-based structuring with spatial joins, we achieve robust data integration that goes beyond simple text matching.

Key features:

  • Batch structuring and high-speed parallel processing of PDF, Excel, Word, and PowerPoint files using LLMs
  • Data cleansing, geocoding, and spatial joins to reconstruct documents as geospatial data
  • Spatial analysis and visualization leveraging the rich geographic density unique to MLIT datasets
  • End-to-end pipeline from data extraction through anonymization to open data publication
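
For readers who want to see why a spatial join is more robust than matching place names as text, the following is a minimal sketch in plain Python. It assumes each document record has already been geocoded to a coordinate pair, and it assigns the record to whichever area polygon contains that point. The record fields, area names, and coordinates are hypothetical placeholders, not MLIT data, and the ray-casting test stands in for whatever GIS tooling the actual application uses.

```python
from typing import Optional

Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going
    in the +x direction from the point crosses. Odd means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(records: list[dict], areas: dict[str, list[Point]]) -> list[dict]:
    """Attach the name of the containing area (or None) to each record."""
    joined = []
    for rec in records:
        match: Optional[str] = next(
            (name for name, poly in areas.items()
             if point_in_polygon(rec["xy"], poly)),
            None,
        )
        joined.append({**rec, "area": match})
    return joined


# Hypothetical areas (simple squares) and geocoded document records.
areas = {
    "Area A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "Area B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
records = [
    {"doc": "report_001.pdf", "xy": (1.0, 1.0)},
    {"doc": "survey_002.xlsx", "xy": (3.0, 0.5)},
    {"doc": "memo_003.docx", "xy": (5.0, 5.0)},
]

for row in spatial_join(records, areas):
    print(row["doc"], "->", row["area"])
```

Because the join depends only on coordinates, a record still lands in the right area when the document spells a place name differently or omits it, which is the case text matching cannot handle.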

Who Should Attend:

  • Government and municipal officials interested in digital transformation and data infrastructure
  • Researchers and think tank professionals involved in evidence-based policymaking (EBPM)
  • Data engineers, GIS developers, and no-code/low-code developers
  • Startups and corporate representatives working on projects that utilize open or public data

Level of technical complexity: 1 - beginner

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials, and the video recording and live transmission of the presentation.

I am leading the development of an application to be introduced in the talk.