09-11, 16:00–16:30 (America/Chicago), Grand F
Putting data in a bucket, making it public, generating metadata, and publishing an API is not enough. Come discuss the next steps in making data truly accessible and usable for analysts, scientists, and all users.
This is our story, the story of the open-source and open-data community. We’re trying to get the data into the hands of all the people who need it to solve whatever real-world issue they’re working on—climate change, agriculture, humanitarian missions, biodiversity loss, urban resilience, and more.
We’ve made progress over the years with ISO metadata standards, OGC service specifications, cloud-optimized formats, CSW, and STAC APIs, plus many FOSS4G projects that implement these solutions. However, there are still significant roadblocks between data providers and data users who are not software or web developers. The data has gotten too big to search and download by hand, or even to download at all. If you want to move to the cloud, you must become a DevOps specialist to deploy Pangeo+JupyterHub or Rocker+RStudio containers to the right place in the cloud, near the data. Then you have to interact with the data. We’ve got some great libraries (e.g. GDAL/OGR, the QGIS STAC plugin, pystac-client, rstac, etc.), but these only get you the basics of searching and sometimes opening data. What happens when it’s an authenticated data source, or when you must cache your cloud-optimized read to make your analysis repeatable? How about support for queryable STAC properties in extensions? You can see we’re at a point where more investment is needed in last-mile clients to make them easier to use.
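To make the “basics of searching” concrete, here is a minimal sketch using pystac-client against a public STAC API; the endpoint URL, collection name, and query values are illustrative assumptions, not specifics from this talk:

```python
# Minimal sketch (assumed endpoint and collection): searching a public STAC API
# with pystac-client. Searching is the easy part; authenticated sources and
# repeatable, cached cloud-optimized reads still need extra glue code.
from pystac_client import Client

catalog = Client.open("https://earth-search.aws.element84.com/v1")  # example endpoint

search = catalog.search(
    collections=["sentinel-2-l2a"],        # example collection name
    bbox=[-90.5, 38.5, -90.0, 39.0],       # lon/lat bounding box of interest
    datetime="2023-06-01/2023-06-30",
    query={"eo:cloud_cover": {"lt": 20}},  # only works if the server supports this queryable
    max_items=10,
)

for item in search.items():
    # Items carry asset HREFs; actually reading them (auth, caching,
    # cloud-optimized access) is left to the user and other tools.
    print(item.id, list(item.assets))
```

Even this small example leans on server support for the query extension; anything beyond it, such as tokens for authenticated sources, caching of cloud-optimized reads, or discovering queryable properties, currently falls to the user.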
There have been many achievements along this journey, but a few more remain. This talk is about exploring that history, learning lessons, identifying new challenges, and discussing ways forward for the community to enable better access to data for all.