Jakub Nowosad
I am a computational geographer working at the intersection of geocomputation and the environmental sciences. My research focuses on developing and applying spatial methods to broaden our understanding of processes and patterns in the environment. A vital part of my work is to create, collaborate on, and improve geocomputational software. I am an active member of the #rspatial community and a co-author of the Geocomputation with R book.
Sessions
A spatial pattern is an inherent property of many spatial variables. Spatial patterns are at the heart of many geographical studies, where we search for existing hot spots, correlations, and outliers. They may take various forms, depending on the type of data and the underlying processes that generated the data. Here, we will focus on spatial patterns in spatial rasters, but the concept can be extended to other types of spatial data, including vector data and point clouds.
Patterns in spatial raster data may take many forms. For continuous rasters (e.g., elevation), we may think of spatial patterns as an interplay between intensity and spatial autocorrelation; for categorical rasters (e.g., land cover), as an interplay between composition and configuration (Gustafson, 1998). Intensity relates to the range and distribution of values of a given variable, while spatial autocorrelation is the tendency for nearby values of a given variable to be more similar than those that are further apart. Composition, in turn, is the number of cells belonging to each map category, while configuration represents their spatial arrangement. Another distinction concerns data dimensionality. The most common situation is when we use only one layer of given data (e.g., an elevation map or a land cover product for one year). However, we may also be interested in sets of variables (layers, bands), such as hyperspectral data, time series, or proportions of classes. An additional special case is the RGB representation of the data.
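As an illustration of the spatial autocorrelation idea described above, the following minimal Python sketch computes Moran's I, one common autocorrelation statistic, for two toy rasters. The function name `morans_i`, the rook (4-neighbour) weights, and the toy data are assumptions made for this example only; they are not taken from any of the packages discussed in this abstract.

```python
def morans_i(raster):
    """Moran's I for a small raster using rook (4-neighbour) contiguity weights."""
    rows, cols = len(raster), len(raster[0])
    values = [v for row in raster for v in row]
    mean = sum(values) / len(values)
    dev = [[v - mean for v in row] for row in raster]

    cross = 0.0   # sum of w_ij * dev_i * dev_j over all ordered neighbour pairs
    weights = 0   # sum of all w_ij (each neighbour pair counted in both directions)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weights += 1
    variance_sum = sum(d * d for row in dev for d in row)
    return (len(values) / weights) * (cross / variance_sum)

# Hypothetical toy rasters: a smooth west-east gradient and a checkerboard.
gradient = [[1, 2, 3, 4]] * 4
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

i_gradient = morans_i(gradient)
i_checker = morans_i(checkerboard)
```

In this sketch the gradient, where neighbouring cells are similar, yields a positive value (2/3), while the checkerboard, where every neighbour differs, yields -1.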
Assessing the similarity of spatial patterns is a common task in many fields, including remote sensing, ecology, and geology. This procedure may encapsulate many types of comparisons: comparing the same variable(s) for different areas, comparing different datasets (e.g., different sensors), or comparing the same area but at different times.
Given the variety of possible scientific questions and the plethora of forms of spatial data, there is no universal method for assessing similarity between two spatial patterns. The basic method is visual inspection; however, it is highly subjective, both from the observer’s and visualization type’s perspectives. Other fairly straightforward approaches are to create a difference map, count changed pixels, or look at the distribution of the values. More advanced methods include the use of machine learning algorithms. However, these methods are often complex, require a lot of data, and are not always interpretable. An alternative and general approach, inspired by content-based image retrieval (Kato, 1992), is to use spatial signatures to represent spatial patterns and dissimilarity measures to compare them (Jasiewicz and Stepinski, 2013).
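The straightforward approaches mentioned above (a difference map and a count of changed pixels) can be sketched in a few lines of Python. The helper names and the two small rasters below are hypothetical and serve only to illustrate the idea for two co-registered rasters of the same shape.

```python
def difference_map(before, after):
    """Cell-by-cell difference (after minus before) of two equally shaped rasters."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(before, after)]

def changed_fraction(before, after):
    """Share of cells whose value differs between the two rasters."""
    diffs = [d for row in difference_map(before, after) for d in row]
    return sum(1 for d in diffs if d != 0) / len(diffs)

# Hypothetical values of one variable for the same area at two moments.
raster_t1 = [[10, 12],
             [11, 15]]
raster_t2 = [[10, 13],
             [11, 12]]

diff = difference_map(raster_t1, raster_t2)
share_changed = changed_fraction(raster_t1, raster_t2)
```

Such summaries are easy to compute, but they compare cells one by one and therefore say little about the spatial arrangement of the values.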
A spatial signature is any numerical representation (compression) of a spatial pattern. For a categorical raster, it can be a co-occurrence vector of classes in a local window, while for a time series, it may be a vector of values in a given cell. Then, having spatial signatures for both areas (sensors, moments), we can compare them using a dissimilarity measure (e.g., Euclidean distance, cosine similarity, etc.) (Cha, 2007). This approach can compare complex, multidimensional spatial patterns, but at the same time, it gives some degree of interpretability. It can also be further applied to many techniques of spatial data analysis, including spatial clustering (to find groups of areas with similar spatial patterns) and segmentation (to create regions with similar spatial patterns).
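To make the signature-and-distance idea above concrete, here is a minimal Python sketch for a categorical raster. It builds a normalized co-occurrence vector of adjacent class pairs and compares two rasters with the Euclidean distance. This is a simplified illustration under stated assumptions (4-neighbour adjacency, unordered pairs, one signature for the whole raster, hypothetical function names); it is not the implementation used in 'motif' or any other package.

```python
from collections import Counter
from math import sqrt

def cooccurrence_signature(raster, classes):
    """Normalized co-occurrence vector of class pairs among 4-connected neighbours.

    `classes` must be a sorted list of all class values in the raster.
    """
    counts = Counter()
    rows, cols = len(raster), len(raster[0])
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so each adjacent pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    pair = tuple(sorted((raster[r][c], raster[rr][cc])))
                    counts[pair] += 1
    total = sum(counts.values())
    # Fixed ordering of unordered class pairs, so signatures are comparable.
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i:]]
    return [counts[p] / total for p in pairs]

def euclidean(p, q):
    """Euclidean distance between two signatures of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two toy rasters with the same composition (8 cells per class)
# but a different configuration.
blocks = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [1, 1, 2, 2],
          [1, 1, 2, 2]]
checker = [[1, 2, 1, 2],
           [2, 1, 2, 1],
           [1, 2, 1, 2],
           [2, 1, 2, 1]]

sig_blocks = cooccurrence_signature(blocks, [1, 2])
sig_checker = cooccurrence_signature(checker, [1, 2])
distance = euclidean(sig_blocks, sig_checker)
```

Both toy rasters have identical class proportions, so a composition-only comparison would treat them as the same; the co-occurrence signature separates them because it also encodes configuration.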
While the concept of applying spatial signatures and dissimilarity measures is powerful, there are still many unresolved issues and questions to consider. These include the scale of comparison; the resolution, dimensionality, and type of the input data; the choice of spatial signature; and the choice of dissimilarity measure. There is still a lack of studies that systematically compare different methods of assessing similarity between spatial patterns, or suggest good practices in their use. At the same time, a growing number of FOSS tools allows us to test various methods and apply them to real-life scenarios.
The goal of this work is to provide an overview of existing R packages for comparing spatial patterns. These include ‘motif’ (for comparing spatial signatures for categorical rasters; Nowosad, 2021), ‘spquery’ (allowing for comparing spatial signatures for continuous rasters), and ‘supercells’ (for segmentation of various types of spatial rasters based on their patterns; Nowosad and Stepinski, 2022). It will show how they can be applied in real-life cases and what their limitations are. This work also aims to open a discussion about the methods for assessing similarity between spatial patterns and their FOSS implementations.
References
Cha, S-H. (2007). Comprehensive Survey on Distance/Similarity Measures Between Probability Density Functions. Int. J. Math. Model. Meth. Appl. Sci.
Gustafson, E. J. (1998). Quantifying landscape spatial pattern: what is the state of the art? Ecosystems.
Jasiewicz, J., & Stepinski, T. F. (2013). Example-Based Retrieval of Alike Land-Cover Scenes From NLCD2006 Database. IEEE Geoscience and Remote Sensing Letters, https://doi.org/10.1109/lgrs.2012.2196019
Kato, T. (1992). Database architecture for content-based image retrieval. Image Storage and Retrieval Systems, https://doi.org/10.1117/12.58497
Nowosad, J. (2021). Motif: an open-source R tool for pattern-based spatial analysis. Landscape Ecology, https://doi.org/10.1007/s10980-020-01135-0
Nowosad, J., & Stepinski, T. F. (2022). Extended SLIC superpixels algorithm for applications to non-imagery geospatial rasters. International Journal of Applied Earth Observation and Geoinformation, https://doi.org/10.1016/j.jag.2022.102935
In 2016, two early-career researchers met and discussed the lack of open-access materials related to spatial data analysis with vector and raster geodata in R. A few months later, they started writing a book together which, from the first commit onwards, was done in the open. The book source code was publicly available on GitHub, updated regularly, and reproduced on every commit by continuous integration. Thanks to this approach, it attracted several contributors early on, one of whom became an author. Writing the book using many FOSS tools allowed us to contribute suggestions, leading to dozens of improvements upstream. The first version of Geocomputation with R (abbreviated to ‘geocompr’) was completed and published in early 2019.
‘Geocompr’ started as a two-person book project. However, it not only attracted many readers but also fostered discussion on online platforms, such as GitHub and social media. In the last few years, the book has had a few hundred thousand readers online, gained a few official and community translations, and has been used in many academic courses and research papers. We also started working on its second edition and its sibling project: Geocomputation with Python.
It became clear that the 'geocompr' name was no longer appropriate for the more multilingual nature of the project, and we started using the 'geocompx' name. We hope it captures the essence of the project: eXchanging information about geocomputation, cross (X) pollination of ideas from one programming language to another, and the possibility of hosting additional content on geocomputation with (X) other languages.
Currently, the main entry point for this project is the https://geocompx.org website. It contains links to other books and materials and also hosts a blog with posts related to geocomputation, which is open to guest writers. The 'geocompx' project also has a Discord server with discussions about various FOSS4G topics, from tools and methods to applications that solve real-life problems.
In this talk, we will share our experiences of writing an open-access book, show the tools we use, and provide suggestions on how to start contributing to, or creating, FOSS4G materials of your own.