11-05, 16:00–16:30 (America/New_York), Lake Fairfax
This presentation describes the transformation of workflows for hazard data aggregation and risk modelling from proprietary GIS software to FOSS. Our open-source workflows, built on Python libraries such as GeoPandas, Requests, and Folium, have greatly improved efficiency, transparency, and repeatability.
This presentation describes and illustrates the transformation of workflows for hazard data aggregation and risk modelling from proprietary GIS software to FOSS. The work, which begins with collecting geospatial data and ends with analyzing the performance of a predictive model on a series of points, requires contributions from several collaborating teams with different levels of access to, and familiarity with, geospatial software. The preliminary process for researching and developing the predictive model relied heavily on ArcGIS software on local machines, and it quickly became a problem that collaborators in the model development had neither access to nor expertise with the proprietary software.
The result of the preliminary GIS work was typically a set of static visualizations exported from ArcGIS Pro, accompanied by supporting summary statistics. Efficiency via repeatability was the primary goal in redesigning this process, a goal that could be met with either licensed solutions like ModelBuilder and ArcPy or free and open source software. We found that free and open source Python libraries provided similar functionality to the proprietary licensed Python solutions, with the additional benefit of better documentation and a wider user base online, which aided in troubleshooting errors.
Most of the highly manual data collection process evolved into a few compact functions that use the Requests library, working in conjunction with BeautifulSoup, to capture URLs from a host website or API endpoint and import data directly into a Python workflow. GeoPandas can hold that data in a temporary GeoDataFrame, and provides functionality for transformations, joins, buffers, and other operations similar to what is found in ArcGIS Pro. Pandas, GeoPandas’s sibling for tabular data analysis, also has a massive catalog of functions for generating summary statistics and aggregating data. These capabilities, along with the familiarity that our non-geospatial collaborators have with Pandas, make it easy to share our workflows and to make tweaks to the process as a team.
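As a rough illustration of the kind of compact functions described above, the following is a minimal sketch, not the code used in the project. The function names, the file suffixes, the `site_id` and `severity` columns, the sample coordinates, and the choice of UTM zone 18N (EPSG:32618) as a metric CRS are all assumptions made for the example.

```python
from urllib.parse import urljoin

import geopandas as gpd
import requests
from bs4 import BeautifulSoup
from shapely.geometry import Point


def extract_data_links(html, base_url, suffixes=(".geojson", ".zip", ".csv")):
    """Return absolute URLs of links in an HTML page that end in a data-file suffix."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(suffixes):
            links.append(urljoin(base_url, href))
    return links


def list_data_links(page_url, timeout=30):
    """Download a listing page (hypothetical URL) and return its data-file links."""
    response = requests.get(page_url, timeout=timeout)
    response.raise_for_status()
    return extract_data_links(response.text, page_url)


def summarize_hazards_near_sites(hazards, sites, radius_m, metric_crs="EPSG:32618"):
    """Count hazard points, and average their severity, within radius_m of each site."""
    # Reproject to a metric CRS so the buffer distance is in metres (assumed CRS).
    sites_m = sites.to_crs(metric_crs)
    hazards_m = hazards.to_crs(metric_crs)

    # Replace each site point with a circular buffer polygon.
    buffers = sites_m[["site_id", "geometry"]].copy()
    buffers["geometry"] = sites_m.geometry.buffer(radius_m)

    # Spatial join keeps only hazards that fall inside a buffer.
    joined = gpd.sjoin(hazards_m, buffers, how="inner", predicate="intersects")

    # Plain Pandas aggregation on the joined table.
    return (
        joined.groupby("site_id")["severity"]
        .agg(hazard_count="count", mean_severity="mean")
        .reset_index()
    )


# Small in-memory example data (made up for illustration; no network access needed).
sites = gpd.GeoDataFrame(
    {"site_id": ["A", "B"]},
    geometry=[Point(-77.0, 38.9), Point(-76.0, 39.5)],
    crs="EPSG:4326",
)
hazards = gpd.GeoDataFrame(
    {"severity": [2, 4, 9]},
    geometry=[Point(-77.001, 38.9005), Point(-77.0, 38.905), Point(-75.0, 40.0)],
    crs="EPSG:4326",
)
summary = summarize_hazards_near_sites(hazards, sites, radius_m=1000)
```

Because the join is an inner join, sites with no hazards inside the buffer do not appear in `summary`; a left join from the buffers would be one way to keep them with a count of zero.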
We also improved the visualization output of this workflow by passing GeoPandas GeoDataFrames to Folium to generate live Leaflet maps for active exploration without licensed software. Having all of this housed within the common format of a Jupyter notebook makes it possible to send an entire workflow, from data scraping to visualization, to a collaborator without an Esri license. That collaborators can see on their own machine a) where the data is coming from, b) how it’s transformed, and c) how it looks spatially in any area they’d like to see is a significant evolution toward a more efficient, transparent, and repeatable process.