Kadri Arrak

Kadri Arrak has long-standing experience in communication, marketing and sales. At Positium, her role is to help countries use mobile positioning data to make smart, data-driven decisions. As Head of Operations, she oversees the operational activities of the business and is responsible for communication between Positium's project teams and clients, ensuring that work on any mobile data or GIS-based project stays on schedule and within budget. Kadri has managed projects on the expansion of the public transport network based on mobile data and on the mobility frequency analysis of road sections, as well as the pilot project and two continuation projects of Lithuanian foreign tourism statistics. She also leads Positium's longest-running project, the Estonian foreign tourism statistics project.


Sessions

07-04
16:30
30min
A Processing Pipeline For European Official Statistics: Towards Standardisation Of Mobile Network Operator Data Processing
Kadri Arrak, Marko Peterson

Disclaimer: The views in this abstract are those of the authors and do not necessarily reflect the position of the European Commission (EC) or national statistical institutes.

Abstract:
The European Statistical System (ESS) – the partnership between the EU statistical authority (Eurostat) and national statistical institutes (NSI), and other statistical authorities in the European member states – considers Mobile Network Operator (MNO) data as one of the most promising new data sources for future statistical production. The production of official statistics based on MNO data has the potential to provide considerable societal value. In this context, the ESS emphasises the need for standardised reference methods adhering to the principles of statistical production, such as quality, privacy protection, and transparency.

In line with the ESS Innovation Agenda, following an open call for tenders, in December 2022, Eurostat awarded the service contract “Development, implementation and demonstration of a reference processing pipeline for the future production of official statistics based on Multiple Mobile Network Operator data (TSS multi MNO)”*. The project is a significant milestone towards the future reuse of MNO data for the production of official statistics at EU level. The goal of the project is to develop a complete, open end-to-end processing pipeline that should serve as a starting point towards the regular production of future official statistics based on MNO data Europe-wide. This “processing pipeline” encompasses a combination of a fully documented open methodological and quality framework, plus the implementation of a reference open-source software pipeline compliant with the said framework. The processing pipeline will be demonstrated across data from multiple MNOs. If successful, the reference pipeline developed by the project will be proposed for adoption by the ESS as a methodological standard.
The project is being implemented by a consortium with extensive experience in both the business and the official statistics domains. The consortium is composed of GOPA Worldwide Consultants GmbH (DE, lead), Nommon Solutions and Technologies SL (ES), Positium OÜ (EE), Statistics Netherlands (NL) and the Italian Statistical Institute (IT). Additionally, five European MNOs from four distinct countries will be involved in the pipeline testing.
This collaborative endeavour aligns with the European Data Strategy's goal of providing comparable and reliable statistics across European countries. The project addresses the challenge of providing open and standardised methodologies for official statistics without hampering either the development of future private initiatives or the continuation of the range of analytic products based on MNO data that mobile operators or other third-party entities have developed and commercialised for purposes other than European official statistics.
While the project is financed by Eurostat (the EU statistical office), its ultimate success will depend on the potential endorsement of the project result by the larger ESS community (integrating all EU statistical offices and other national authorities). This is expected to have positive implications for future activities and may serve as a model that can be replicated in other domains. More generally, it may encourage closer collaboration with industry or business partners when initiating or strengthening co-development undertakings for the production of official statistics.
This contribution will present the overall pipeline architecture and describe an initial version of the processing pipeline. The architecture design will adhere to the highest technical requirements and standards of methodological soundness. The proposed pipeline distinguishes between data processing within the MNO environments and additional processing steps at the NSI or other parties.
The software will be divided into modules for (1) the processing of disaggregated data exclusively within each MNO’s secured environment, and (2) the post-processing of aggregated and anonymous data at national statistical offices. The latter is particularly relevant since the post-processing will be performed on aggregated data after the application of statistical procedures, such as Statistical Disclosure Control (SDC), which ensure that the data cannot be traced back to individuals. Comprehensive documentation, including functionality, implementation details, and usage instructions, will accompany the software. Reference test data, consisting of synthetic or semi-synthetic samples, will be created for each software module to ensure reproducibility and ease the development of alternative but fully compliant software implementations by independent entities.
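As an illustration only, the two-stage split described above can be sketched in a few lines of Python. This is not the project's reference implementation: the function names, the record layout (device identifier, region) and the suppression threshold are all hypothetical, and simple small-count suppression stands in for whatever SDC procedures the methodological framework finally specifies.

```python
from collections import Counter
from typing import Iterable

# Hypothetical threshold: cells with fewer devices than this are suppressed.
MIN_CELL_COUNT = 10


def aggregate_at_mno(records: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Stage 1, inside the MNO environment: count distinct devices per region."""
    devices_per_region: dict[str, set[str]] = {}
    for device_id, region in records:
        devices_per_region.setdefault(region, set()).add(device_id)
    return {region: len(devices) for region, devices in devices_per_region.items()}


def apply_sdc(counts: dict[str, int], threshold: int = MIN_CELL_COUNT) -> dict[str, int]:
    """Still inside the MNO: drop cells below the threshold before export."""
    return {region: n for region, n in counts.items() if n >= threshold}


def postprocess_at_nsi(per_mno_counts: list[dict[str, int]]) -> dict[str, int]:
    """Stage 2, at the statistical office: combine aggregated outputs of several MNOs."""
    total: Counter = Counter()
    for counts in per_mno_counts:
        total.update(counts)
    return dict(total)


# Synthetic records: 12 devices seen in region "A", 3 devices in region "B".
records = [(f"dev{i}", "A") for i in range(12)] + [(f"dev{i}", "B") for i in range(3)]
mno_1_export = apply_sdc(aggregate_at_mno(records))
mno_2_export = {"A": 15, "C": 10}  # pretend output of a second operator
combined = postprocess_at_nsi([mno_1_export, mno_2_export])
```

In this sketch only the thresholded counts leave the operator's environment, so the statistical office never handles device-level records; region "B" is suppressed because it holds fewer than ten devices.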
The entire open-source pipeline, including the code and related documentation, as well as the methodological framework, will be openly published. The software code will be published under an EUPL licence, promoting transparency and accessibility, facilitating the replication and adoption of the developed software solutions, and encouraging collaboration and further advancements in the field of statistical production. The reference implementation of the pipeline will be public, and results will be communicated to interested audiences through official public channels.

FOSS4G ‘Made in Europe’
Destination Earth (Van46 ring)