Jelena Panagiotakou


Session

06-30
12:35
5min
Advancing Open Geospatial Data: Multi-Source Maritime Monitoring and Semantically Enriched Urban Mobility Datasets
Ioannis Kontopoulos, Jelena Panagiotakou

The increasing availability of geospatial data and the growing maturity of open-source technologies have created new opportunities for addressing complex challenges across domains such as maritime surveillance and urban mobility. However, despite significant progress, the geospatial community continues to face limitations in accessing high-quality, interoperable, multimodal, and semantically enriched open datasets. This work addresses this gap by presenting four open-access geospatial datasets developed within a unified vision of openness, interoperability, and reproducibility: two datasets targeting vessel monitoring and two focusing on urban mobility. These datasets are part of the MUltiSensor Inferred Trajectories (MUSIT) project, an international, interdisciplinary initiative funded by the European Union's Horizon Europe program. MUSIT aims to transform heterogeneous tracking sensor data into complete, semantically enriched trajectories, opening new perspectives in mobility monitoring and fostering collaboration among academia, industry, and innovators.
The first dataset, the Multimodal Maritime Dataset on the English Channel (MMDEC) [1], provides a comprehensive multi-source view of maritime activity within a defined Area of Interest covering the western Celtic Sea, the English Channel, and part of the North Sea. Spanning a three-month period from July to October 2023, MMDEC integrates heterogeneous data streams including Automatic Identification System (AIS) signals, satellite imagery, meteorological and oceanographic data, port locations, and marine protected areas. By combining these diverse sources into a single, harmonized dataset, MMDEC enables advanced analysis of maritime behavior, anomaly detection, and environmental monitoring. Its multi-layered structure reflects real-world operational complexity and supports a wide range of use cases, from maritime safety to ecological impact assessment. Within the MUSIT framework, MMDEC is a concrete realization of the project's data collection and integration pillar, contributing a rich, multi-sensor foundation for subsequent trajectory reconstruction and analysis.
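One common way to harmonize streams sampled at different rates is an as-of join, in which each vessel position is annotated with the most recent contextual observation recorded at or before its timestamp. The sketch below illustrates that idea only; the record layout (a `t` key, a `weather` key) is hypothetical and is not the MMDEC schema, and nothing here is claimed to be the pipeline used to build the dataset.

```python
from bisect import bisect_right


def attach_latest(positions, observations):
    """Annotate each position with the latest observation at or before its time.

    positions:    list of dicts, each with a numeric 't' (hypothetical layout).
    observations: list of (t, value) tuples sorted by ascending t.
    Returns new dicts; 'weather' is None when no earlier observation exists.
    """
    obs_times = [t for t, _ in observations]
    enriched = []
    for p in positions:
        # Index of the last observation whose time is <= p["t"], or -1 if none.
        k = bisect_right(obs_times, p["t"]) - 1
        value = observations[k][1] if k >= 0 else None
        enriched.append({**p, "weather": value})
    return enriched
```

A binary search per position keeps the join at O(n log m) for n positions and m observations, which matters when AIS messages vastly outnumber weather records.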
Complementing this dataset, AegeaNET [2] introduces a real-time dimension to maritime monitoring through an open sensor network deployed across the Aegean Sea. AegeaNET comprises strategically positioned AIS and ADS-B receivers that capture vessel (AIS) and aircraft (ADS-B) traffic, providing continuous streams of positioning data to facilitate real-time tracking and situational awareness. As an academic and open initiative, AegeaNET exemplifies how distributed, community-driven sensor networks can enhance transparency and data availability in critical domains such as navigation safety and border monitoring. In alignment with MUSIT's core vision, AegeaNET directly addresses the challenge of incomplete or fragmented tracking data by offering persistent, sensor-based observations that feed trajectory inference and fusion pipelines. Together, MMDEC and AegeaNET demonstrate complementary approaches to maritime data collection: one focused on multi-source historical integration, and the other on real-time, sensor-based observation.
In the domain of urban mobility, we present two semantically enriched trajectory datasets generated for the metropolitan areas of Paris and New York City [3]. The raw trajectory data underpinning both datasets consists of publicly available GPS traces voluntarily shared by users through OpenStreetMap (OSM), retrieved via the OSM API over geographic bounding boxes covering each city. This choice of source ensures full openness and compliance with the Open Database License, while avoiding the privacy issues that typically hinder the release of mobility data. These trajectories are then semantically enriched with multiple contextual layers drawn from heterogeneous open sources. Spatial context is provided through Points of Interest, also extracted from OSM, while weather conditions are integrated from meteorological data services. Additional inferred attributes - including detected stops, movement segments, and transportation modes - are derived through spatio-temporal analysis of the raw GPS signal. A particularly novel contribution is the inclusion of synthetic yet realistic social media posts, generated by a Large Language Model (LLM) carefully instructed to simulate user-generated content associated with observed movements. This multimodal enrichment opens new possibilities for research at the intersection of mobility analysis and natural language processing. Consistent with MUSIT's emphasis on cross-domain representation and information fusion, the datasets are released in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning, knowledge graph construction, and compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Together, these design choices make the datasets valuable resources for a wide range of tasks, including behavior modeling, mobility prediction, and LLM-based applications.
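To make the notion of stops derived from the raw GPS signal concrete, the following is a minimal sketch of one simple threshold-based approach: a stop is a run of consecutive fixes that stay within a given radius of the first fix of the run for at least a minimum duration. The abstract does not specify which algorithm the datasets use, so the `Fix` type, the function names, and the thresholds (50 m, 300 s) are illustrative assumptions only.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Fix:
    t: float    # timestamp in seconds (hypothetical field names)
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    earth_radius_m = 6371000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(h))


def detect_stops(fixes, max_radius_m=50.0, min_duration_s=300.0):
    """Return (start_index, end_index) pairs for detected stops.

    fixes must be ordered by time. Thresholds are assumed values,
    not parameters taken from the datasets.
    """
    stops = []
    i, n = 0, len(fixes)
    while i < n:
        # Extend the run while fixes stay within the radius of anchor fix i.
        j = i + 1
        while j < n and haversine_m(fixes[i], fixes[j]) <= max_radius_m:
            j += 1
        if fixes[j - 1].t - fixes[i].t >= min_duration_s:
            stops.append((i, j - 1))
            i = j        # continue after the stop
        else:
            i += 1       # not a stop; try the next fix as anchor
    return stops
```

The gaps between consecutive stops can then be treated as movement segments, which is the usual starting point for inferring transportation modes from speed and acceleration profiles.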
A key contribution of this work lies not only in the datasets themselves but also in the reproducible and extensible processes used to generate them. By openly sharing both the data and the underlying pipelines, we aim to empower the community to replicate, adapt, and extend our approach to other geographic regions and application domains. This is particularly important in the context of semantically enriched mobility data, where the combination of heterogeneous contextual information remains a significant barrier to entry for many researchers and practitioners. The MUSIT project, through its training and mobility programs and its commitment to open knowledge exchange, actively encourages reproducibility and community-driven engagement.
From a broader perspective, these four datasets illustrate the potential of open geospatial data to bridge domain gaps and foster cross-disciplinary innovation. The maritime datasets highlight the importance of integrating heterogeneous environmental and operational data sources, while the urban mobility datasets demonstrate how semantic enrichment can unlock insights into human movement patterns. Both cases emphasize the role of open standards, open-source tools, and collaborative infrastructures in advancing the state of the art - values that are central to MUSIT's mission of building a dynamic community capable of turning research into tangible societal value.
Finally, this work aligns closely with the principles of the open geospatial ecosystem by promoting transparency, accessibility, and reuse. By contributing these datasets to the community under the MUSIT project, we seek to support ongoing research, policy-making, and industry applications, while also encouraging further contributions and collaborations within and beyond the consortium.

Academic track
A01