11-18, 09:00–12:00 (Pacific/Auckland), WF510
This workshop is built around guided computational analyses of two climate-risk scenarios: wildfires and floods. The goal is to support independent geospatial exploration using publicly available data products from NASA Earthdata Cloud and free and open-source software (FOSS) with Pythonic APIs in a cloud computing environment.
Predicting and managing the environmental risks of climate-related disasters (e.g., wildfires, droughts, and floods) is both challenging and critical worldwide. Part of the difficulty is that historical norms (e.g., from the last century) for the frequency of such extreme climate events are no longer sufficient to infer the frequency of future disasters. These natural risks are intrinsically linked to the dynamic distributions (varying both temporally and spatially) of surface water, precipitation, vegetation, and land use. These distributions can be modeled for forecasting and analysis (enabling quantification of these environmental risks) using the hundreds of petabytes of relevant Earth science data available through NASA's Earthdata Cloud. With the dramatic growth in the availability of such data, today's Earth scientists benefit from a strong understanding of open science practices and of cloud-based, data-intensive computing, both of which enable reproducible analysis and assessment of changing risk profiles.
This workshop provides hands-on examples of using cloud-based infrastructure and data products from NASA Earthdata Cloud to analyze environmental risk scenarios. This involves constructing quantitative estimates of changes in hydrological water mass balance over defined geographical regions of interest and time windows. The goal is to build enough familiarity with generic cloud-based Jupyter/Python workflows and with remote-sensing data to enable participants to adapt and remix the examples for other region-specific contexts. The workshop's design reinforces best practices of data-proximate computing and of reproducibility (as supported by NASA's Open Science and Transform to Open Science (TOPS) initiatives).
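To illustrate the kind of water mass balance estimate involved, the following minimal sketch computes the change in surface water area between the start and end of a time window from a (synthetic) time series of binary water masks. The array shapes, variable names, and 30 m pixel size are hypothetical stand-ins for an actual DSWx-derived dataset, not the workshop's exact workflow.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a DSWx-style time series: 1 = water, 0 = not water.
rng = np.random.default_rng(seed=0)
water_mask = xr.DataArray(
    rng.integers(0, 2, size=(6, 100, 100)),
    dims=("time", "y", "x"),
    coords={"time": pd.date_range("2023-01-01", periods=6, freq="MS")},
)

PIXEL_AREA_KM2 = (30 * 30) / 1e6  # assuming 30 m x 30 m pixels

# Total inundated area per time step, in square kilometres.
water_area = water_mask.sum(dim=("y", "x")) * PIXEL_AREA_KM2

# Change in surface water extent across the time window.
delta = water_area.isel(time=-1) - water_area.isel(time=0)
print(f"Change in surface water extent: {float(delta):+.2f} km^2")
```

In practice the binary mask would be derived from a real data product over a chosen region of interest, but the aggregation pattern (mask, sum over spatial dimensions, difference over time) stays the same.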
Participants are expected to be familiar with raster data and common geospatial data conventions. Ideally, they are comfortable using a shell or command-line interface to interact with data and programs. They should also be comfortable using common scientific Python libraries (e.g., NumPy, Pandas) and related Python data structures (e.g., tuples, dicts, lists, NumPy arrays, Pandas DataFrames). The workshop includes a brief overview of Xarray, hvPlot, and GeoViews; prior exposure to these Python libraries is useful but not mandatory. Prior experience using Jupyter notebooks and writing short snippets of Python code is helpful.
Approximate schedule:
- minute 0-19: Introduction & Setup (logging in, configuring NASA Earthdata credentials)
- minute 20-29: Reminders about GIS prerequisites: coordinate systems, data formats (if required)
- minute 30-49: Overview of PyData tools for geographic data: Rasterio & Xarray (if required)
- minute 50-59: Break
- minute 60-79: Overview of PyData visualization tools: hvPlot & GeoViews (if required)
- minute 80-99: Using NASA Earthdata Products (DIST, DSWx)
- minute 100-109: Using PySTAC for retrieving data
- minute 110-119: Break
- minute 120-144: Case study: wildfires
- minute 145-169: Case study: flooding
- minute 170-179: Wrap-up
The workshop starts by getting participants logged into the cloud infrastructure and verifying their NASA Earthdata Cloud credentials. This is followed by a quick, non-comprehensive overview of GIS prerequisites and of Python approaches to manipulating and visualizing geospatial data. The schedule above will be adapted to suit the audience's needs (i.e., by increasing or decreasing the time allocated to each section as appropriate).
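To give a flavor of the credential and visualization steps, here is a minimal sketch of such a workflow. It assumes the earthaccess library for Earthdata authentication and the rioxarray bridge between Rasterio and Xarray; the GeoTIFF filename is a hypothetical placeholder, and the actual workshop materials may use different tooling.

```python
import earthaccess          # handles NASA Earthdata login/credentials
import rioxarray            # Rasterio-backed reader returning Xarray objects
import hvplot.xarray        # registers the .hvplot accessor on Xarray objects

# Prompts for (or reads cached) NASA Earthdata credentials.
auth = earthaccess.login()

# Hypothetical cloud-optimized GeoTIFF standing in for a real granule.
url = "OPERA_L3_DSWx_example.tif"
raster = rioxarray.open_rasterio(url).squeeze("band", drop=True)

# Interactive plot of the raster in its native coordinate system.
plot = raster.hvplot.image(x="x", y="y", rasterize=True, cmap="viridis")
```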
The hands-on case studies rely on the OPERA (Observational Products for End-Users from Remote Sensing Analysis) suite of data products; in particular, they use two categories of products: DSWx (Dynamic Surface Water Extent) and DIST (Land Surface Disturbance). The workflows presented extend notebook examples drawn from the extensive OPERA Applications repository.
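As a sketch of the data retrieval step, the following uses pystac-client (a companion to PySTAC) to search a STAC catalog for DSWx granules over a bounding box and date range. The CMR STAC endpoint and the collection identifier shown are assumptions based on current NASA Earthdata conventions and may differ from what the workshop materials use.

```python
from pystac_client import Client

# NASA's CMR STAC endpoint for the PO.DAAC provider (assumed; check the
# workshop materials for the endpoint actually used).
catalog = Client.open("https://cmr.earthdata.nasa.gov/stac/POCLOUD")

search = catalog.search(
    collections=["OPERA_L3_DSWX-HLS_V1"],   # assumed DSWx collection ID
    bbox=(-121.6, 39.6, -121.3, 39.9),      # example region of interest
    datetime="2023-01-01/2023-06-30",
    max_items=5,                             # keep the example search small
)

# Each item describes one granule; its assets can then be opened remotely.
for item in search.items():
    print(item.id, item.datetime)
```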
This workshop—co-developed by MetaDocencia & 2i2c—is part of NASA's Open Science and Transform to Open Science (TOPS) initiatives. An important goal is to reinforce principles of reproducibility and open science-based workflows (as exemplified in TOPS OpenCore, the introductory suite of open science curricula including Open Science 101).
Dhavide Aruliah has been teaching & mentoring both in academia and in industry for three decades. His career has grown around bringing learners from where they are to where they need to be mathematically & computationally. He was a university professor (Applied Mathematics & Computer Science) at Ontario Tech University before moving to industry where he oversaw training programs supporting the PyData stack at Anaconda Inc. and later at Quansight LLC. He has taught over 40 undergraduate- & graduate-level courses at five Canadian universities as well as numerous Software Carpentry & PyData tutorial workshops. Video examples of his teaching include: