11-19, 11:00–11:25 (Pacific/Auckland), WA220
Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage. This talk will explain what Obstore is, how it differs from existing Python libraries for cloud data access, and how it's being used to speed up cloud-based geospatial workflows.
Obstore is a Python library that abstracts access to data on commercial cloud storage providers, like Amazon S3, Google Cloud Storage, and Azure Storage. Instead of writing code for each provider and building the abstractions yourself, use Obstore’s single API for data access.
On the surface, Obstore is similar to fsspec, since both provide abstracted interfaces to cloud storage, but Obstore offers several core improvements:
- Minimal API with native synchronous and asynchronous support.
- Fast with no Python dependencies: Obstore wraps the Rust object_store library, meaning that your Python environment stays small and you won’t face dependency conflicts.
- Streaming downloads, uploads, and listings without manual pagination.
- Full type hinting for easier use in Python IDE environments.
- Simple access to NASA Earthdata and Microsoft Planetary Computer data collections with automatic credential refreshing when short-lived tokens expire.
While Obstore is a foundational technology that can be used across many domains, this talk will focus on its use in geospatial-related projects:
- Zarr-Python introduced an Obstore-based backend that can be 3x faster than the default fsspec-based backend when reading Zarr datasets.
- VirtualiZarr, a library to present non-cloud-native file formats like netCDF as virtual Zarr datasets, is being rewritten to use Obstore by default.
- Async-tiff, a fast, asynchronous Python reader for TIFF, GeoTIFF, and Cloud-Optimized GeoTIFF files, uses Obstore under the hood to power its data fetching.
- A new Python GeoParquet library uses Obstore as well.
This talk will explain what Obstore is, how it differs from existing Python libraries, and how you might use it in your own projects to speed up data access.