2026-09-01, Conference Management Room3
Apache Airflow orchestrates a data pipeline for multidimensional satellite imagery, from data ingestion and preprocessing to storage in Zarr format. Zarr supports efficient management of large-scale datasets and parallel processing, while Airflow automates and monitors the workflow's tasks.
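Zarr's support for parallel processing comes from storing each array as a regular grid of chunks, each saved as a separate object that can be read or written independently. The stdlib-only sketch below (not the zarr library's API) computes that chunk grid for a hypothetical image stack. The (time, band, y, x) shape and the chunk sizes are assumptions chosen for illustration.

```python
import itertools
import math

# Hypothetical satellite image stack: (time, band, y, x) and an assumed chunk shape.
SHAPE = (40, 6, 10_000, 10_000)
CHUNKS = (40, 1, 512, 512)


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (the last chunk may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_slices(index, shape, chunks):
    """Array region covered by the chunk at the given grid index."""
    return tuple(
        slice(i * c, min((i + 1) * c, s)) for i, s, c in zip(index, shape, chunks)
    )


def iter_chunks(shape, chunks):
    """Yield every chunk index with its region; each one can be processed independently."""
    for index in itertools.product(*(range(n) for n in chunk_grid(shape, chunks))):
        yield index, chunk_slices(index, shape, chunks)


if __name__ == "__main__":
    grid = chunk_grid(SHAPE, CHUNKS)
    print("chunks per dimension:", grid)     # (1, 6, 20, 20)
    print("total chunks:", math.prod(grid))  # 2400
```

Keeping the whole time axis inside one chunk (an assumed design choice) means each pixel's full time series comes from a single chunk, which suits per-pixel time-series methods, while the spatial chunks can be handed to separate workers.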
Satellite imagery is now widely used to monitor and analyze spatial change in domains such as land-use change detection, natural resource monitoring, and environmental surveillance. However, satellite data typically comes as large, multidimensional time series, for example several spectral bands observed repeatedly over the same area. As a result, storing, processing, and managing such data remain significant challenges for data infrastructure and geospatial data processing.
Apache Airflow is used to orchestrate the data processing workflow, controlling the sequence of tasks from data ingestion and preprocessing through to storage in Zarr, a format well suited to satellite imagery and other multidimensional geospatial data. Zarr improves data accessibility, simplifies the management of large datasets, and supports efficient parallel processing because arrays are stored as chunks that can be read and written independently. Airflow, in turn, automates workflow execution, exposes the processing status of each task for monitoring, and enforces the dependencies between tasks in the pipeline. This makes it well suited to large satellite datasets that require multi-stage processing.
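The dependency control described above can be illustrated without Airflow itself. This stdlib-only sketch (Python 3.9+ for `graphlib`) runs three hypothetical tasks, `ingest`, `preprocess` and `store_zarr`, in dependency order and records a status per task, loosely mirroring Airflow's `success`, `failed` and `upstream_failed` states. It is not Airflow's API, and the task bodies are placeholders.

```python
from graphlib import TopologicalSorter


# Placeholder task bodies (hypothetical); each reads and writes a shared context dict.
def ingest(context):
    context["scenes"] = ["scene_2020", "scene_2021", "scene_2022"]


def preprocess(context):
    context["clean"] = [f"{scene}_masked" for scene in context["scenes"]]


def store_zarr(context):
    context["stored"] = list(context["clean"])


TASKS = {"ingest": ingest, "preprocess": preprocess, "store_zarr": store_zarr}

# Each task maps to the set of upstream tasks it depends on.
DEPENDENCIES = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "store_zarr": {"preprocess"},
}


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order; skip a task if any upstream task did not succeed."""
    context, status = {}, {}
    for name in TopologicalSorter(dependencies).static_order():
        if any(status[up] != "success" for up in dependencies.get(name, ())):
            status[name] = "upstream_failed"
            continue
        try:
            tasks[name](context)
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status, context


if __name__ == "__main__":
    status, context = run_pipeline(TASKS, DEPENDENCIES)
    print(status)
```

In an actual Airflow DAG file, the same chain would be declared between task objects with the bitshift operator, for example `ingest >> preprocess >> store_zarr`, and the scheduler would take care of ordering, retries and status tracking.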
Satellite imagery stored in Zarr can then be used as input for forest change analysis with the LandTrendr method. LandTrendr is a time-series analysis technique that detects trends and long-term changes in land cover by fitting each pixel's annual time series with a sequence of straight-line segments. Because satellite datasets are large and multidimensional, using Airflow to manage the pipeline keeps data preparation and storage well organized, so the data can be used efficiently in LandTrendr analysis.
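To show the core idea of temporal segmentation, here is a greatly simplified, LandTrendr-inspired sketch, not the LandTrendr algorithm itself. It greedily places vertices where a pixel's series deviates most from a straight line, then reports the segment with the largest decrease as a candidate disturbance. The index values, `max_segments` and `min_deviation` are hypothetical; real LandTrendr adds spike removal, regression-based fitting, recovery constraints and selection among candidate models by fit statistics.

```python
# Hypothetical annual values of a vegetation index where a drop means vegetation loss
# (for example NBR): stable forest, a sharp loss in 2005, then gradual recovery.
YEARS = list(range(2000, 2011))
NBR = [0.80, 0.81, 0.79, 0.80, 0.80, 0.30, 0.36, 0.42, 0.48, 0.54, 0.60]


def max_deviation(years, values, start, end):
    """Return (index, deviation) of the point farthest from the line joining two vertices."""
    t0, t1 = years[start], years[end]
    v0, v1 = values[start], values[end]
    best_index, best_dev = None, 0.0
    for i in range(start + 1, end):
        fitted = v0 + (v1 - v0) * (years[i] - t0) / (t1 - t0)
        dev = abs(values[i] - fitted)
        if dev > best_dev:
            best_index, best_dev = i, dev
    return best_index, best_dev


def segment(years, values, max_segments=3, min_deviation=0.05):
    """Greedily add vertices at the largest deviation until max_segments is reached."""
    vertices = [0, len(values) - 1]
    while len(vertices) - 1 < max_segments:
        candidates = [max_deviation(years, values, a, b) for a, b in zip(vertices, vertices[1:])]
        index, dev = max(candidates, key=lambda c: c[1])
        if index is None or dev < min_deviation:
            break  # no point deviates enough to justify another segment
        vertices = sorted(vertices + [index])
    return [years[v] for v in vertices], [values[v] for v in vertices]


def largest_loss(vertex_years, vertex_values):
    """Return (start_year, end_year, change) for the segment with the biggest decrease."""
    segments = [
        (vertex_years[k], vertex_years[k + 1], vertex_values[k + 1] - vertex_values[k])
        for k in range(len(vertex_years) - 1)
    ]
    return min(segments, key=lambda s: s[2])


if __name__ == "__main__":
    vertex_years, vertex_values = segment(YEARS, NBR)
    print(vertex_years)  # [2000, 2004, 2005, 2010]
    print(largest_loss(vertex_years, vertex_values))
```

Because each pixel is segmented independently, a step like this can run chunk by chunk over a spatially chunked Zarr array.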
Overall, implementing the satellite data pipeline with Apache Airflow, from ingestion and processing to storage in Zarr, makes large-scale satellite datasets more efficient to handle. It also automates the processing workflow with systematic monitoring and prepares the data for forest change analysis with the LandTrendr method.
I'm Tanaporn Songprayad, and I'm interested in Geospatial Data Processing and Data Engineering, with a focus on developing skills in data management and analysis that support the use of data in environmental work and spatial analysis.