11-05, 11:30–12:00 (America/New_York), Lake Fairfax
We reproduced three geographic studies as projects in a methods seminar, using open geographic information science research practices and R. We produced open and reproducible research compendia and reports on the reproducibility and validity of the three studies.
Open geographic information science research practices can be applied to reproduce prior studies as project-based learning opportunities for undergraduate students to learn new methods and improve upon prior research. We report on the structure and results of an advanced geographic methods seminar in which we reproduced studies of geographic patterns of crime in Connecticut using geographically-weighted regression (Meng 2021), redlining and present-day residential water infrastructure in United States urban areas using binary logistic regression (Sterling et al. 2023), and heat exposure at carceral facility locations in the contiguous United States using spatial joins and linear regressions (Tuholske et al. 2024).
The open science framework is designed to improve research transparency in order to improve research quality, access, and efficiency. For geographic research, we use a standardized research compendium template with version control to organize project metadata, data, procedures, code in computational notebooks using open-source software, preregistered analysis plans, and post-analysis living reports. We document analysis plans with the research design prior to analyzing data in order to control sources of researcher bias and improve understanding of the research problem. We publish and update living reports afterward to make results and changes to the research design accessible and transparent. For open geographic information science studies, it should be possible for researchers and students to conduct reproduction studies in which they repeat the same procedures with the same data and confirm the findings with the same results.
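The confirmation step of a reproduction study, checking that the reproduced results match the published ones, can be sketched as a tolerance comparison. This is a minimal illustration in Python rather than the R used in the seminar. The function name `compare_results`, the statistic names, and all numeric values are hypothetical and are not taken from any of the three studies.

```python
import math


def compare_results(original, reproduced, rel_tol=1e-6):
    """Compare reported statistics with reproduced ones.

    Returns {name: (original_value, reproduced_value, matches)}.
    A statistic missing from `reproduced` is reported as a mismatch.
    """
    report = {}
    for name, orig_value in original.items():
        repro_value = reproduced.get(name)
        matches = repro_value is not None and math.isclose(
            orig_value, repro_value, rel_tol=rel_tol
        )
        report[name] = (orig_value, repro_value, matches)
    return report


# Hypothetical values for illustration only.
original = {"intercept": 1.25, "slope": -0.40}
reproduced = {"intercept": 1.25, "slope": -0.38}

report = compare_results(original, reproduced, rel_tol=0.01)
for name, (orig_value, repro_value, matches) in report.items():
    status = "match" if matches else "differs"
    print(f"{name}: original={orig_value}, reproduced={repro_value} ({status})")
```

A relative tolerance is used because published statistics are usually rounded, so exact equality would flag differences that come only from rounding. The tolerance itself is a judgment call and would belong in the analysis plan.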
In the research seminar, we learned open science workflows with the R and GitHub platforms in four scaffolded stages. First, we learned the basics of Git version control and Markdown by creating simple Jekyll websites on GitHub. Second, we practiced using the template research compendium and open science workflows in a GIS lab on gerrymandering. Third, we repeated a demonstration reproduction study of COVID-19 and disability using the same template. Finally, we undertook reproduction studies of our own in the second half of the seminar.
Three teams of two students each completed reproduction studies through four project stages. First, we searched the literature for reproduction study candidates, and each team selected a single study aligned with its members' thematic and methodological interests. Second, we closely read the study and its supplementary materials and researched its data sources. We initialized a research compendium and completed its project metadata, data source metadata, and analysis plan in an R Markdown computational notebook, all prior to analyzing data. Third, we added R code blocks to the analysis plan notebook to implement the study and produce the resulting statistics, tables, and figures. While implementing the study, we documented “unplanned deviations” whenever we had to adjust research design decisions due to ambiguities or inconsistencies in the original study or data. Finally, we rendered and published analysis reports on our GitHub repository webpages and peer-reviewed the legibility, accuracy, completeness, and functionality of our research compendia.
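Initializing a research compendium in the second stage amounts to creating a predictable folder skeleton before any analysis begins. The sketch below uses Python rather than the seminar's R. The folder names, the `init_compendium` helper, and the `analysis-plan.Rmd` file name are assumptions chosen to mirror the components described above, not the actual layout of the seminar's template.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names that mirror the components named in the text
# (metadata, data, procedures and code, reports, results).
COMPENDIUM_DIRS = [
    "data/raw",
    "data/derived",
    "data/metadata",
    "procedure/code",
    "docs/report",
    "results/figures",
    "results/tables",
]


def init_compendium(root):
    """Create the folder skeleton and a placeholder analysis plan.

    Returns the sorted relative paths of all directories under `root`.
    """
    root = Path(root)
    for rel in COMPENDIUM_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    # The analysis plan is created first so it can be written before data analysis.
    plan = root / "procedure" / "code" / "analysis-plan.Rmd"
    if not plan.exists():
        plan.write_text(
            "# Analysis plan\n\nWritten before any data are analyzed.\n",
            encoding="utf-8",
        )
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Build the skeleton in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    created = init_compendium(tmp)
    plan_exists = (Path(tmp) / "procedure/code/analysis-plan.Rmd").exists()

print("\n".join(created))
```

Keeping raw data, derived data, and metadata in separate folders makes it easier for a peer reviewer to see which files came from the original study and which were produced by the reproduction.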
Each reproduction study presented substantial challenges and learning opportunities, and the studies varied in the availability of data, code, and methodological details. Meng and Sterling et al. described data sources and research design in their articles, but did not provide data or code. Tuholske et al. gave limited detail about data and methods in their short article narrative, but provided supplementary materials with additional details and a GitHub repository with data and code. In our attempts to reproduce the studies, we encountered challenges with large data volumes, use of old data or code libraries, and ambiguities in methodological details and organization of supplementary materials. We identified uncertainties and threats to validity rooted in boundary effects, scale effects, construction of indicators, selection bias, and treatment of data with zeros, geometry errors, or missing data. Our findings have direct implications for improving both reproducibility and the quality of research design and reporting.
Overall, we reproduced substantive portions of each study by writing and modifying spatial R code. We created public reproducible research compendia for each reproduction study, and critically reviewed the study research designs for important sources of uncertainty and geographic threats to validity. Pedagogically, the reproduction studies were opportunities to apply project-based learning to authentic challenges contributing to open science in the geographic information science community.
- Meng 2021, DOI:10.5719/hgeo.2021.152.5, https://github.com/opengisci/RPr-Meng-2021
- Sterling et al. 2023, DOI:10.1038/s41893-024-01293-y, https://github.com/opengisci/RPr-Sterling-2023
- Tuholske et al. 2024, DOI:10.1038/s41893-024-01293-y, https://github.com/opengisci/Rpr-Tuholske-2024
Matthew Mills is a Geography student graduating from Middlebury College in February 2026. They are interested in critical geography and GIS, geospatial data analysis, and open-source and participatory research.