11-17, 13:30–16:30 (Pacific/Auckland), WF511
In this workshop, participants will learn to automate geospatial data processing using Apache Airflow, focusing on workflow design, geospatial data integration, and tools and libraries such as GeoPandas, GDAL, MinIO, and the debugpy Python debugger. The session emphasizes efficiency and reproducibility, enabling participants to build reliable, repeatable geospatial data pipelines.
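To give a flavour of what the workshop builds toward, here is a minimal sketch of an Airflow DAG with a single GeoPandas task that reprojects a vector layer; the DAG name, file paths, and target CRS are illustrative placeholders, not material taken from the workshop itself.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)  # schedule=None needs Airflow 2.4+
def reproject_parcels():
    """Hypothetical one-task pipeline: read a vector file, reproject it, write a GeoPackage."""

    @task
    def reproject(src: str, dst: str) -> str:
        import geopandas as gpd  # imported inside the task so the scheduler stays lightweight

        gdf = gpd.read_file(src)          # read the raw layer (any format GDAL/OGR understands)
        gdf = gdf.to_crs(epsg=2193)       # reproject to NZTM 2000 (placeholder target CRS)
        gdf.to_file(dst, driver="GPKG")   # write a standardized GeoPackage
        return dst

    reproject("/data/raw/parcels.geojson", "/data/processed/parcels.gpkg")


reproject_parcels()
```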
Tools & Technologies
- Apache Airflow (Workflow management)
- GeoPandas / GDAL (Geospatial data processing libraries)
- MinIO (S3-compatible object storage)
- debugpy (Debugger for Python)
- GeoNetwork (Geospatial data catalog)
Key Takeaways
- Deploy and manage MinIO and Apache Airflow services with Docker containers
- Automate geospatial data processing workflows using Apache Airflow
- Ensure reproducibility in geospatial data pipelines for consistent results
- Store and manage geospatial data efficiently with MinIO (see the sketch after this list)
- Debug and troubleshoot Python code in geospatial workflows using debugpy
- Organize and catalog geospatial datasets with GeoNetwork (if time permits)
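As a rough illustration of the MinIO and debugpy takeaways, the sketch below uploads a processed GeoPackage to a local MinIO container and opens a debugpy port so an IDE can attach; the endpoint, credentials, bucket, and file paths are assumed defaults for a local Docker setup, not values prescribed by the workshop.

```python
import debugpy
from minio import Minio

# Expose a debug port so a remote debugger (e.g. VS Code) can attach to this process.
debugpy.listen(("0.0.0.0", 5678))
# debugpy.wait_for_client()  # uncomment to pause execution until the debugger attaches

# Assumed local MinIO endpoint and default development credentials from a Docker setup.
client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,  # plain HTTP for a local container
)

bucket = "geodata"  # placeholder bucket name
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a processed GeoPackage, then fetch it back to verify the round trip.
client.fput_object(bucket, "processed/parcels.gpkg", "/data/processed/parcels.gpkg")
client.fget_object(bucket, "processed/parcels.gpkg", "/tmp/parcels_check.gpkg")
```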
Presenter
Data Engineer (Geospatial) at Meridia Land, where he uses satellite data and other geospatial data layers to verify field data for EUDR compliance assessments and to unlock valuable insights. He is responsible for processing raw data into standardized, production-ready datasets.