Modern Geospatial Data Science in the Cloud with Nebari
12-05, 16:00–16:30 (America/Belem), Room II

The modern Python geospatial stack encompasses several tools and libraries that allow scientists and developers to write more efficient and scalable data science workflows, from data access and preparation to analysis and visualization. It provides a rich ecosystem for reading and writing cloud-optimized, chunked data formats, accessing data catalogs, handling labeled N-dimensional arrays, running parallel and distributed computations, performing statistical analysis and machine learning, and doing interactive computing and plotting.

As data scientists increasingly work in teams and tackle bigger and more complex problems, there is a growing need for collaborative platforms that can support sophisticated workflows and large-scale data processing. However, such platforms still face significant challenges, including deployment, configuration, graceful scaling, and environment and dependency management. Addressing these challenges is not trivial and often requires some DevOps expertise.

In this talk, we’ll introduce Nebari, a cloud-based open source data science platform built on top of Kubernetes, Dask, and the Jupyter ecosystem. Nebari enables organizations to quickly deploy a collaborative platform on any of the major cloud providers. Once deployed, teams can access single-user Jupyter Notebook and VS Code servers from their web browsers and start writing and running reproducible and scalable geospatial data science workflows. Because Nebari is integrated with conda-store and Dask, users can build, share, and access conda environments from their servers, and can also launch short-lived clusters to handle their compute-intensive tasks.
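To make the idea of a short-lived cluster working over chunked data more concrete, here is a minimal sketch. It is not Nebari or Dask code: it uses Python's standard-library `ThreadPoolExecutor` as a stand-in for a cluster that exists only for the duration of one task. The NDVI calculation, the sample pixel values, and all function names (`ndvi`, `split_into_chunks`, `ndvi_chunk`, `compute_ndvi`) are hypothetical and chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def ndvi(red, nir):
    """Normalized difference vegetation index for one (red, nir) pair."""
    total = nir + red
    # Guard against division by zero for empty or masked pixels.
    return (nir - red) / total if total else 0.0


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def ndvi_chunk(chunk):
    """Compute NDVI for every (red, nir) pair in one chunk."""
    return [ndvi(red, nir) for red, nir in chunk]


def compute_ndvi(pixels, chunk_size=2, workers=2):
    """Process chunks in parallel on a pool that lives only for this call."""
    chunks = split_into_chunks(pixels, chunk_size)
    # The pool is created here and shut down when the block exits,
    # loosely mirroring a cluster launched for a single task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(ndvi_chunk, chunks)  # map preserves chunk order
        return [value for chunk in results for value in chunk]


# Hypothetical (red, nir) reflectance pairs, assumed for illustration only.
pixels = [(0.1, 0.5), (0.2, 0.2), (0.0, 0.0), (0.3, 0.6), (0.25, 0.75)]
result = compute_ndvi(pixels)
print(result)
```

On a real deployment the same split, map, and gather pattern would more likely be written with Dask and chunked array libraries instead of a thread pool. The sketch is only meant to show the shape of the workflow.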

We’ll demonstrate how Nebari can be used to develop compute- and data-intensive applications in the cloud using packages from the modern Python geospatial stack. By the end, we hope to equip organizations with the tools and knowledge to promote better and more effective collaboration in geospatial data science. Organizations can choose to adopt Nebari as an out-of-the-box platform for their teams, or use it as a blueprint for developing a custom platform built on top of open source libraries.

See also: Slides (2.4 MB)

Marcelo is an ecologist with a background in remote sensing. He has used several tools from the scientific and geospatial Python ecosystems to develop Earth Observation applications in forest monitoring, burned area estimation and flood mapping. He is currently a cloud infrastructure engineer at Quansight, where he maintains Nebari and enables people to scale their data workflows in the cloud.