09-11, 16:00–16:30 (America/Chicago), Grand C
DLG has developed the first iteration of a geospatial open-source intelligence (OSINT) visualization and analytics framework for situational awareness of global internet activity. This presentation will examine the open-source technologies and data used for workflow orchestration, data management, analytics, and visualization.
This presentation will focus on the development of the OSINT visualization and analytics framework, including an overview of its tech stack and data sources and a deep dive into selected technologies. It will begin with an overview of the issues involved in working with OSINT data and how the framework aims to address them. While all aspects of the framework will be covered, the data management and backend components will receive the most detailed treatment.
Integrating OSINT data sources can be challenging because they vary in interpretability and completeness. We will focus on how we overcame some of these challenges for the Open Observatory of Network Interference (OONI), OpenCelliD, and World Bank Open Data datasets. Extract, Transform, and Load (ETL) of these data sources is handled by the workflow orchestration tool Metaflow, and we will briefly cover the internal trade study that led to its adoption.
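To make the completeness problem concrete, here is a minimal sketch of one transform step of the kind such an ETL pipeline might contain: it cleans OONI-style measurement records and aggregates them into a daily anomaly rate per country. The field names (`probe_cc`, `measurement_start_time`, `anomaly`) are modeled on OONI measurement metadata but are assumptions here, as are the hypothetical helper names `normalize` and `daily_anomaly_rates`. It is plain Python rather than Metaflow code; in a Metaflow flow, logic like this would typically live inside a `@step` method.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date
from typing import Iterable, Optional


def normalize(record: dict) -> Optional[dict]:
    """Clean one OONI-style record, or return None if it is too incomplete to use.

    Field names are assumptions modeled on OONI measurement metadata.
    """
    cc = str(record.get("probe_cc") or "").strip().upper()
    # Drop records with no usable country code ("ZZ" is treated here as unknown).
    if len(cc) != 2 or cc == "ZZ":
        return None
    ts = record.get("measurement_start_time")
    if not isinstance(ts, str):
        return None
    # Accept "YYYY-MM-DD HH:MM:SS" and ISO 8601 timestamps by keeping the date part.
    try:
        day = date.fromisoformat(ts[:10]).isoformat()
    except ValueError:
        return None
    # A missing anomaly flag counts as "no anomaly" (an assumption of this sketch).
    return {"country": cc, "date": day, "anomaly": bool(record.get("anomaly", False))}


def daily_anomaly_rates(records: Iterable[dict]) -> dict[tuple[str, str], float]:
    """Aggregate cleaned records into an anomaly rate per (country, day)."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for raw in records:
        rec = normalize(raw)
        if rec is None:
            continue  # incomplete records are skipped rather than guessed at
        bucket = counts[(rec["country"], rec["date"])]
        bucket[0] += int(rec["anomaly"])  # anomalous measurements
        bucket[1] += 1                    # all usable measurements
    return {key: anomalies / total for key, (anomalies, total) in counts.items()}


sample = [
    {"probe_cc": "us", "measurement_start_time": "2024-09-11 10:00:00", "anomaly": True},
    {"probe_cc": "US", "measurement_start_time": "2024-09-11T12:30:00Z", "anomaly": False},
    {"probe_cc": "ZZ", "measurement_start_time": "2024-09-11 12:00:00", "anomaly": True},
    {"probe_cc": "IR", "measurement_start_time": None},
]
print(daily_anomaly_rates(sample))  # {('US', '2024-09-11'): 0.5}
```

Skipping incomplete records instead of imputing them keeps each rate honest about how much usable data sits behind it; a production pipeline might also store the number of dropped records alongside each rate.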
For a deeper look at how open-source software was used to develop the framework, we will turn to the data management and backend components of the application. Data management falls into two categories: a PostgreSQL + PostGIS database and AWS S3 object storage. The database stores data from the ETL process and is also used to further transform that data into mappable insights. Materialized views, refreshed by pg_cron, automatically generate these data aggregations, and Alembic is used for database version control. SQLAlchemy Object-Relational Mapper (ORM) models are used to enforce data validity and maintain reproducibility. S3 stores intermediate ETL files and analytics derived from the data in the database. We use DuckDB and Hive partitioning to save these analytics as Parquet files partitioned by date and country.
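Hive partitioning encodes partition values as `key=value` directory names, so a reader can skip whole directories when filtering by date or country without opening any files. The stdlib-only sketch below shows only the resulting object-key layout; the bucket prefix, the `data_0.parquet` file name, and the helper names are hypothetical. In DuckDB itself, the write is typically a single `COPY ... TO ... (FORMAT PARQUET, PARTITION_BY (date, country))` statement, and the files can be read back with `read_parquet(..., hive_partitioning = true)`.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def hive_key(prefix: str, day: str, country: str, filename: str = "data_0.parquet") -> str:
    """Build an object key in Hive layout: <prefix>/date=<day>/country=<cc>/<file>.

    Assumes values are already clean (ISO dates, two-letter country codes),
    so no escaping of '/' or '=' is attempted.
    """
    return f"{prefix.rstrip('/')}/date={day}/country={country}/{filename}"


def partition_rows(rows: Iterable[dict], prefix: str) -> dict[str, list[dict]]:
    """Group analytic rows by the partition each one belongs to."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[hive_key(prefix, row["date"], row["country"])].append(row)
    return dict(groups)


def parse_hive_key(key: str) -> dict[str, str]:
    """Recover partition columns from a key, as a Hive-aware reader would."""
    return dict(part.split("=", 1) for part in key.split("/") if "=" in part)


rows = [
    {"date": "2024-09-11", "country": "US", "anomaly_rate": 0.5},
    {"date": "2024-09-11", "country": "IR", "anomaly_rate": 0.8},
    {"date": "2024-09-12", "country": "US", "anomaly_rate": 0.4},
]
for key, group in partition_rows(rows, "s3://example-bucket/analytics").items():
    print(key, len(group), parse_hive_key(key))
```

Putting date before country in the path matches queries that usually filter on a time window first; reversing the order would favor country-first queries instead.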
The backend API was built with the Python web framework FastAPI, which offers developer-friendly features such as automatically generated API documentation. Pydantic is used for data validation in the API. The SQLAlchemy ORM models created for the Alembic migrations described above are also used by FastAPI to define data models and streamline the development of CRUD operations. FastAPI also provides a web interface for quickly testing endpoints to confirm they function as intended. Additionally, we will discuss the use of pg_tileserv and pg_featureserv to serve vector map tiles and features to the front-end web mapping application, including how database roles control tile generation.
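pg_tileserv serves vector tiles from an XYZ URL template, so a web map requests each tile by zoom level, column, and row. The sketch below computes which tile covers a given longitude/latitude using the standard Web Mercator ("slippy map") formula and fills in a tile URL. The host, port, and layer name `public.ooni_anomalies` are hypothetical placeholders; it assumes pg_tileserv's default port 7800 and its `schema.table` layer naming.

```python
from __future__ import annotations

import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) XYZ tile indices containing a point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or the projection's polar limit still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(base: str, layer: str, lon: float, lat: float, zoom: int) -> str:
    """Build a hypothetical pg_tileserv-style vector tile URL for a point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{base.rstrip('/')}/{layer}/{zoom}/{x}/{y}.pbf"


# Hypothetical host and layer name; the point is roughly downtown Chicago.
print(tile_url("http://localhost:7800", "public.ooni_anomalies", -87.6298, 41.8781, 10))
# http://localhost:7800/public.ooni_anomalies/10/262/380.pbf
```

A front-end map library issues one such request per visible tile as the user pans and zooms, and pg_tileserv renders each tile on demand from PostGIS.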
We will briefly touch on the technologies used for the front-end web interface, including React and Mapbox, before a short demo of the application.
Lastly, we will discuss work planned for future iterations of the OSINT visualization and analytics framework. This includes multitenancy and the protection of user-uploaded data, AI-powered chatbot search, additional data sources and natural language processing, and expanded analytic capabilities such as spatial modeling, dasymetric mapping, network analysis, and predictive analysis.