Multistore: An S3-compliant data distribution API
2026-09-02, Conference Management Room2

Source Cooperative's data proxy lets users access open datasets through S3-compatible tools. We rebuilt it from scratch as Multistore, an open source S3 gateway designed to be reusable across the ecosystem. This talk covers why we rebuilt, what we learned, and how others can adopt it.


Source Cooperative is an open data platform where researchers and organizations publish and share datasets. Its data proxy lets users access hosted data through familiar S3-compatible tools like aws-cli, boto3, obstore, GDAL, and DuckDB, with backends spanning AWS S3 and Azure Blob Storage. The proxy works, but it was built specifically for Source Cooperative and is difficult for others to adopt or extend.

Multistore is our effort to rebuild the data proxy as a modular, open source project that any organization can use. Rather than extracting and refactoring the existing proxy, we started fresh with a focus on clean interfaces and pluggable components. The result is an S3-compliant gateway that resolves incoming requests to the correct storage backend. Its zero-copy passthrough approach means the proxy never buffers file contents, keeping resource usage low and throughput high regardless of file size.
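As a rough illustration of the two ideas in the paragraph above, the sketch below resolves a path-style request to a backend and forwards the body in fixed-size chunks. It is not Multistore's code or API: the `Backend` class, the `ROUTES` table, the account names, and both function names are hypothetical, and chunked streaming in Python only approximates the zero-copy passthrough described, since each chunk is still briefly held in memory.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Tuple


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of a storage backend."""
    kind: str       # e.g. "s3" or "azure"
    container: str  # bucket or container name
    prefix: str     # key prefix inside the bucket or container


# Hypothetical routing table: first path segment -> backend.
ROUTES = {
    "example-account": Backend("s3", "example-bucket", "data/"),
    "other-account": Backend("azure", "example-container", ""),
}


def resolve(path: str) -> Tuple[Backend, str]:
    """Map a path-style request such as /account/key to (backend, backend key)."""
    account, _, key = path.lstrip("/").partition("/")
    backend = ROUTES[account]  # raises KeyError for unknown accounts
    return backend, backend.prefix + key


def stream(body: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Forward a body chunk by chunk so only one chunk is in memory at a time."""
    while chunk := body.read(chunk_size):
        yield chunk


if __name__ == "__main__":
    backend, key = resolve("/example-account/roads/tile.parquet")
    print(backend.kind, backend.container, key)
    for piece in stream(io.BytesIO(b"abcdefghij"), chunk_size=4):
        print(piece)
```

Memory use in this sketch is bounded by `chunk_size` rather than by object size, which is the property the passthrough design is after.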

Beyond modularity, we used the rebuild as an opportunity to rethink deployment. The original runs on a small cluster of ECS Fargate nodes in a single AWS region. Multistore can compile to both native and WebAssembly targets, which lets us run a hybrid architecture: a Cloudflare Workers layer handles global edge routing to ensure fast data access from anywhere, while regional servers handle heavier workloads that benefit from proximity to storage backends. This combination improves latency for users worldwide while keeping operational costs predictable.

Authentication moves from long-lived access keys to OIDC token exchange, supporting both interactive and machine-to-machine flows with temporary credentials. The system is organized around a set of pluggable interfaces for routing, authorization, credential storage, and backend I/O, so adopters can customize behavior without forking the project.
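One way to picture the pluggable design is as a set of small interfaces that a gateway composes. The sketch below is an assumption-laden illustration, not Multistore's actual interfaces: `Authorizer`, `CredentialStore`, `Gateway`, and the example implementations are hypothetical names. The OIDC token validation step is left out entirely, and the sketch starts from an already-verified subject.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class TemporaryCredentials:
    """Short-lived credentials, in contrast to long-lived access keys."""
    access_key_id: str
    secret_access_key: str
    session_token: str
    expires_at: float  # Unix timestamp

    def is_expired(self, now: Optional[float] = None) -> bool:
        return (time.time() if now is None else now) >= self.expires_at


class Authorizer(Protocol):
    def allows(self, subject: str, action: str, resource: str) -> bool: ...


class CredentialStore(Protocol):
    def credentials_for(self, subject: str) -> TemporaryCredentials: ...


class PrefixAuthorizer:
    """Example policy: anyone may read; writes only under the subject's own prefix."""

    def allows(self, subject: str, action: str, resource: str) -> bool:
        if action == "read":
            return True
        return resource.startswith(subject + "/")


class InMemoryCredentialStore:
    """Stand-in for the step that would follow a verified OIDC token."""

    def __init__(self, ttl_seconds: float = 900.0) -> None:
        self.ttl_seconds = ttl_seconds

    def credentials_for(self, subject: str) -> TemporaryCredentials:
        return TemporaryCredentials(
            access_key_id=f"TEMP-{subject}",
            secret_access_key=secrets.token_hex(16),
            session_token=secrets.token_hex(16),
            expires_at=time.time() + self.ttl_seconds,
        )


class Gateway:
    """Composes the pluggable parts; swapping one does not touch the others."""

    def __init__(self, authorizer: Authorizer, store: CredentialStore) -> None:
        self.authorizer = authorizer
        self.store = store

    def handle(self, subject: str, action: str, resource: str) -> TemporaryCredentials:
        if not self.authorizer.allows(subject, action, resource):
            raise PermissionError(f"{subject} may not {action} {resource}")
        return self.store.credentials_for(subject)
```

Because `Gateway` depends only on the two protocols, an adopter could supply a different authorization policy or credential source without modifying the gateway itself, which is the kind of customization without forking that the paragraph describes.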

In this talk, we will walk through the limitations that motivated a ground-up rebuild, the architectural decisions we made along the way, and the tradeoffs involved in designing software that serves one platform's needs while remaining genuinely reusable. We will demo Multistore as it runs in Source Cooperative and discuss how other open data platforms, catalogs, and data repositories might integrate it into their own infrastructure.


Level of technical complexity: 3 - advanced

Give indication of resources (video, web pages, papers, etc.) to read in advance, that will help get up to speed on advanced topics:

https://developmentseed.org/multistore

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation:

Anthony Lukach is a software engineer at Development Seed, where he builds open-source tools for geospatial data infrastructure. His work spans the eoAPI ecosystem, STAC-based access control, and cloud-native data platforms.