Text2Transit: From SNS Posts to Timetable-Aware Multi-Stop Train Itineraries with Open Geospatial Tools
2026-09-02, Himawari

We present an open geospatial workflow that adapts LLMs to extract structured geospatial data from SNS posts and uses exact optimization to generate timetable-aware, multi-stop rail itineraries whose routes are guaranteed to be optimal under the defined travel-cost model.


Social networking service (SNS) posts increasingly influence urban travel choices, but they are difficult to use directly for multi-stop itinerary planning because their information is informal, unstructured, and often spread across mixed-language captions. In train-oriented cities, users may discover several places in a single post yet still bear the burden of identifying the locations, structuring them as geospatial data, and deciding on an efficient visiting order.

This talk presents an open and general workflow that addresses both problems. First, we adapt LLM-based extraction to convert unstructured SNS posts into structured geospatial point-of-interest (POI) data; in this talk we focus on extracting café spots from Japanese and English Instagram-style captions. We introduce Segmentation Aware Geospatial Extraction (SAGE), a segmentation-based prompting technique for identifying multiple café locations in noisy SNS text. SAGE targets structural failure modes that single-pass prompting cannot resolve, in particular the segmentation of long, mixed-language captions that mention several POIs.

Second, we compute the best visiting order over the extracted locations on a timetable-aware rail network. The routing stage maps each extracted POI to nearby train stations and builds a generalized travel-time matrix that combines walking access with timetable-based train travel. Because train services follow scheduled departures and arrivals, the underlying network is treated as a time-dependent graph in which edge availability and travel cost vary with the time of day. The itinerary is then formulated as an asymmetric traveling salesperson-type problem, where the asymmetric costs arise from transfer penalties, service frequencies, and direction-dependent travel times. Because the problem is solved with an exact method such as dynamic programming or mixed integer programming, the resulting route is guaranteed to be optimal for the extracted set of locations under the defined travel-cost assumptions.
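The segmentation idea behind the extraction step can be illustrated with a minimal sketch. This is not SAGE itself: the talk does not detail SAGE's prompts, so the boundary cues (blank lines, the 📍 pin, circled numbers, numbered-list markers), the names `segment_caption` and `build_prompts`, and the prompt template are all hypothetical. The sketch only shows the general pattern of splitting a long caption into venue-sized segments and issuing one extraction prompt per segment instead of one prompt for the whole caption.

```python
import re
from typing import List

# Hypothetical boundary cues: blank lines, the location-pin emoji, circled
# numbers (U+2460..U+2473, i.e. ① to ⑳), and line-initial "1." / "2)" markers.
_BOUNDARY = re.compile(r"\n\s*\n|📍|[\u2460-\u2473]|(?:^|\n)\s*\d+[.)]\s")

# Hypothetical per-segment prompt; a real system would tune this per model.
PROMPT_TEMPLATE = (
    "Extract the cafe mentioned in this Instagram caption segment.\n"
    "Answer in JSON with the keys name, area, and nearest_station;\n"
    "use null for any field that is not stated.\n"
    "Segment:\n{segment}"
)


def segment_caption(caption: str) -> List[str]:
    """Split a caption into segments that ideally mention at most one venue each."""
    parts = _BOUNDARY.split(caption)
    return [p.strip() for p in parts if p.strip()]


def build_prompts(caption: str) -> List[str]:
    """Build one extraction prompt per segment rather than one per caption."""
    return [PROMPT_TEMPLATE.format(segment=s) for s in segment_caption(caption)]


# Made-up mixed-language caption with two fictional cafes.
caption = (
    "週末カフェ巡り☕\n"
    "📍Cafe Alpha 渋谷\nラテが最高\n"
    "📍Cafe Beta 清澄白河\nGreat pour-over!"
)
for prompt in build_prompts(caption):
    print(prompt, end="\n---\n")
```

Segments without a venue (such as the introductory line above) are still sent to the model in this sketch, which would be expected to return null fields for them; filtering those results is left to the post-processing step.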
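Mapping an extracted POI to a nearby station and estimating its walking-access time could look like the minimal sketch below. It assumes straight-line (haversine) distance scaled by a detour factor of 1.3 and a walking speed of 80 m/min; both constants are illustrative assumptions, and a production workflow would more likely route over an OpenStreetMap walking network. The station coordinates are approximate and used for illustration only.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]

EARTH_RADIUS_M = 6_371_000.0
# Assumed walking speed (80 m/min is a common planning convention in Japan).
WALK_SPEED_M_PER_MIN = 80.0
# Assumed ratio of street-network distance to straight-line distance.
DETOUR_FACTOR = 1.3


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_station(poi: LatLon, stations: Dict[str, LatLon]) -> Tuple[str, float]:
    """Return the closest station and an estimated walking time in minutes."""
    name, coord = min(stations.items(), key=lambda kv: haversine_m(poi, kv[1]))
    walk_min = DETOUR_FACTOR * haversine_m(poi, coord) / WALK_SPEED_M_PER_MIN
    return name, walk_min


# Approximate station coordinates, for illustration only.
stations = {
    "Shibuya": (35.6580, 139.7016),
    "Shinjuku": (35.6896, 139.7006),
    "Tokyo": (35.6812, 139.7671),
}
print(nearest_station((35.6595, 139.7005), stations))
```

Keeping only the single nearest station is a simplification; the talk maps POIs to nearby stations, so a fuller version might keep the k closest candidates and let the optimizer choose among them.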
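The time-dependent train leg of the travel-time matrix can be sketched with the Connection Scan Algorithm (CSA), a standard earliest-arrival technique for timetables; the talk does not say which query method it uses, so CSA is a swapped-in choice here. The stop and trip IDs, the two-minute transfer buffer, and the `Connection` record are hypothetical stand-ins for timetable data such as ODPT's. The result depends on the departure time and is generally different in each direction, which is exactly what makes the routing problem time-dependent and asymmetric.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Connection:
    """One timetabled hop of a train between consecutive stops."""
    dep_stop: str
    arr_stop: str
    dep_time: int  # minutes after midnight
    arr_time: int  # minutes after midnight
    trip_id: str


def earliest_arrival(connections: Iterable[Connection], source: str, target: str,
                     start_time: int, transfer_min: int = 2) -> float:
    """Connection Scan: earliest arrival at target when ready at source at start_time."""
    earliest = defaultdict(lambda: math.inf)
    earliest[source] = start_time
    boarded = set()  # trips the traveller can already be sitting on
    for c in sorted(connections, key=lambda conn: conn.dep_time):
        if c.dep_time >= earliest[target]:
            break  # later departures cannot improve the answer
        if c.trip_id in boarded:
            reachable = True
        else:
            # No buffer when boarding at the origin; elsewhere a train change costs transfer_min.
            buffer = 0 if c.dep_stop == source else transfer_min
            reachable = earliest[c.dep_stop] + buffer <= c.dep_time
        if reachable:
            boarded.add(c.trip_id)
            earliest[c.arr_stop] = min(earliest[c.arr_stop], c.arr_time)
    return earliest[target]


# Hypothetical timetable: T1 runs A -> B -> C, T2 runs A -> C, T3 runs B -> D.
timetable = [
    Connection("A", "B", 600, 610, "T1"),
    Connection("B", "C", 611, 620, "T1"),
    Connection("A", "C", 605, 640, "T2"),
    Connection("B", "D", 612, 615, "T3"),
]
print(earliest_arrival(timetable, "A", "C", 600))  # 620 (stays on T1)
print(earliest_arrival(timetable, "C", "A", 600))  # inf (no service back)
```

A walking-plus-train entry of the generalized matrix would then be walk-to-station time, plus the CSA arrival minus the departure time, plus walk-from-station time, evaluated at the relevant departure time.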
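The exact dynamic-programming option can be illustrated with a Held-Karp sketch for the asymmetric, time-dependent case. It is not the talk's implementation and rests on stated assumptions: node 0 is the starting station, the route is an open path (no return to the start), the objective is the final arrival time, and the hypothetical `arrive(i, j, t)` callback returns the arrival time at `j` when leaving `i` at time `t`. Provided `arrive` is non-decreasing in `t` (the FIFO property, which earliest-arrival timetable queries satisfy), keeping only the earliest arrival per (visited set, last stop) state preserves exactness. The cost matrix and the 10-minute headway below are made-up examples.

```python
import math
from typing import Callable, Dict, List, Tuple

ArriveFn = Callable[[int, int, float], float]


def held_karp_td(n: int, arrive: ArriveFn, start_time: float,
                 dwell: float = 0.0) -> Tuple[float, List[int]]:
    """Exact open-path Held-Karp from node 0 through nodes 1..n-1 (FIFO arrive assumed)."""
    full = (1 << n) - 1
    # (visited_mask, last_node) -> (earliest arrival time, predecessor node)
    best: Dict[Tuple[int, int], Tuple[float, int]] = {(1, 0): (start_time, -1)}
    for mask in range(1, full + 1):  # a superset mask is always numerically larger
        if not mask & 1:
            continue  # every route starts at node 0
        for last in range(n):
            state = best.get((mask, last))
            if state is None:
                continue
            leave = state[0] + (dwell if last != 0 else 0.0)  # time spent at each stop
            for nxt in range(1, n):
                if mask & (1 << nxt):
                    continue
                t = arrive(last, nxt, leave)
                key = (mask | (1 << nxt), nxt)
                if key not in best or t < best[key][0]:
                    best[key] = (t, last)
    finish, last = min((best[(full, j)][0], j) for j in range(n) if (full, j) in best)
    order, mask = [], full
    while last != -1:  # walk predecessors back to the origin
        order.append(last)
        prev = best[(mask, last)][1]
        mask ^= 1 << last
        last = prev
    return finish, order[::-1]


# Hypothetical asymmetric generalized travel times in minutes; node 0 is the origin.
cost = [
    [0, 10, 15, 20],
    [12, 0, 9, 10],
    [14, 13, 0, 12],
    [25, 8, 9, 0],
]


def static_arrive(i: int, j: int, t: float) -> float:
    """Time-independent costs: depart immediately and add the matrix entry."""
    return t + cost[i][j]


def headway_arrive(i: int, j: int, t: float) -> float:
    """Toy FIFO timetable: trains leave every 10 minutes, then take cost[i][j]."""
    return math.ceil(t / 10) * 10 + cost[i][j]


print(held_karp_td(4, static_arrive, 0.0))   # (29.0, [0, 1, 3, 2])
print(held_karp_td(4, headway_arrive, 3.0))  # (39, [0, 1, 3, 2])
```

Held-Karp runs in O(n² 2ⁿ) time, so it is practical only for a modest number of stops, roughly the size of a single post's café list; for larger sets a mixed integer programming formulation (for example via OR-Tools) is the other exact option the talk mentions.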

The talk emphasizes how open-source LLMs can be used effectively not only for text understanding but also for practical geospatial data structuring, and how exact optimization provides a principled foundation for reliable itinerary generation. The overall contribution is a reproducible open geospatial workflow that connects SNS text, structured point-of-interest extraction, station mapping, and optimal rail itinerary planning.


Level of technical complexity: 2 - intermediate

Give indication of resources (video, web pages, papers, etc.) to read in advance, that will help get up to speed on advanced topics:

No advance reading is necessary to follow the main talk. For attendees interested in the more advanced technical aspects, useful background includes OpenStreetMap-based geospatial workflows, open-source LLMs for structured information extraction, time-dependent transportation networks, and exact optimization methods for traveling salesperson problems, especially dynamic programming and mixed integer programming.

Indicate what is (are) the open source project(s) essential in your talk:

Essential open-source components in this talk include open-source LLMs for geospatial extraction from SNS posts, such as Gemma, Llama, and gpt-oss; Google OR-Tools, an open-source optimization suite with Python bindings, for exact, timetable-aware rail routing; and open transit timetable data from the Public Transportation Open Data Center (ODPT, https://www.odpt.org/en/). Map-based outputs can also be presented using open tools such as QGIS or Leaflet.

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation:

Takuya is a Senior Researcher at Fujitsu Research of America. He received his PhD in Computer Science from the University of Chicago. His interests lie at the interdisciplinary intersection of machine learning/deep learning, satellite imagery, the physical sciences, and HPC.