FOSS4G NA 2024

Environment setup and predictive modeling
09-09, 09:00–12:00 (America/Chicago), Missouri S&T

In this workshop, we will explore how to set up a geospatial data science environment, use that environment to create a well-spread and balanced sample, and estimate canopy cover from data derived from STAC, OSM, and REST services.


Remotely sensed data, newer technologies, and modeling frameworks are fundamentally changing how we understand and manage resources. In this short course, we will discuss how to set up a data science processing environment using Conda and build that environment with basic user permissions. We will demonstrate how to acquire raster and vector data from STAC, OSM, and REST services, illustrate how to use those data to create a well-spread and balanced sample, create an ensemble of KNN models, and finally depict spatially explicit estimates of mean canopy cover and modeling error, both locally and within cloud services, using open-source Jupyter Notebooks and Python. Our notebook and use case walk analysts through the basic steps needed to create a well-spread and balanced sample and to integrate field data with remotely sensed data using machine learning and Raster Tools, and they further illustrate how analysts can standardize this type of workflow using open-source data streams and software. Workshop learning objectives are to: 1) learn about environments, 2) learn multiple sample design strategies, 3) learn how to access cloud-based data and services, 4) learn the advantages of a well-spread and balanced sample, 5) create an ensemble of predictive models, and 6) explore how to use those models to depict mean and standard error estimates in a spatially explicit manner.
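
For readers who want a feel for the ensemble step before the session, the idea of building several KNN models and summarizing them as a mean and a standard error can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workshop notebook or the Raster Tools API: it uses only the Python standard library, the "field plots" are synthetic, the ensemble members are formed by bootstrap resampling (the abstract does not say how members are built), and every name in it (knn_predict, fit_ensemble, ensemble_estimate, synthetic_cover) is hypothetical. The spread reported per grid cell is the standard deviation of the member predictions, which is the usual bootstrap estimate of standard error.

```python
import math
import random
import statistics


def knn_predict(train, query, k):
    """Average the response of the k training rows nearest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return statistics.fmean(y for _, y in nearest)


def fit_ensemble(train, n_models, rng):
    """KNN is a lazy learner, so each member is just a bootstrap resample."""
    return [rng.choices(train, k=len(train)) for _ in range(n_models)]


def ensemble_estimate(ensemble, query, k):
    """Return (mean, bootstrap standard error) of the member predictions."""
    preds = [knn_predict(sample, query, k) for sample in ensemble]
    return statistics.fmean(preds), statistics.stdev(preds)


def synthetic_cover(x1, x2):
    """Assumed relationship between two predictors and canopy cover (%)."""
    return 100 * (0.6 * x1 + 0.4 * x2)


rng = random.Random(42)

# Synthetic stand-in for field plots: two predictors in [0, 1] plus noisy cover.
train = []
for _ in range(200):
    x = (rng.random(), rng.random())
    y = min(100.0, max(0.0, synthetic_cover(*x) + rng.gauss(0, 5)))
    train.append((x, y))

ensemble = fit_ensemble(train, n_models=25, rng=rng)

# A coarse 5 x 5 grid of predictor values stands in for raster cells.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
estimates = {cell: ensemble_estimate(ensemble, cell, k=5) for cell in grid}

for cell in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    mean, se = estimates[cell]
    print(f"cell {cell}: mean cover {mean:.1f}%, standard error {se:.1f}")
```

In practice the predictors would come from raster layers and the responses from field plots, and a library KNN implementation would replace the brute-force neighbor search used here.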