FOSS4G 2022 general tracks

Data integrity risks when using simple feature
2022-08-25, 10:10–10:15 (Europe/Rome), Room 9

When you care about data integrity of spatial data you need to know about the limitations/weaknesses of using simple feature datatype in your database. For instance https://land.copernicus.eu/pan-european/corine-land-cover/clc2018 contains 2,377,772 simple features among which we find 852 overlaps and 1420 invalid polygons. For this test I used “ESRI FGDB” file and gdal for import to postgis. We find such minor overlaps and gaps quite often, which might not be visible for the human eye. The problem here is that it covers up for real errors and makes difficult to enforce database integrity constraints for this. Close parallel lines also seems to cause Topology Exception in many spatial libraries.

A core problem with simple features is that they don't contain information about the relation they have with neighbor features, so integrity of such relations is hard to constraint. Another problem is mixing of old and new data in the payload from the client. This makes it hard and expensive to create clients, because you will need a full stack of spatial libraries and maybe a complete locked exact snapshot of your database on the client side. Another thing is that a common line may differ from client to client depending on spatial lib, snapTo usage, tolerance values and transport formats.

In 2022 many system are depending on live updates also for spatial data. So it’s big advantage to be able to provide a simple and “secure” API’s with fast server side integrity constraints checks that can be used from a standard web browser. When we have this checks on server side we will secure the equal rules across different clients.

Is there alternatives that can secure data integrity in a better way? Yes, for instance Postgis Topology. The big difference is that Postgis Topology has more open structure that is realized by using standard database relational features. This lower the complexity of the client and secures data integrity. In the talk “Use Postgis Topology to secure data integrity, simple API and clean up messy simple feature datasets.” we will dive more into the details off Postgis Topology
Building an API for clients may be possible using simple features, but it would require expensive computations to ensure topological integrity but to solve problem with mixing of new and old borders parts can not be solved without breaking the polygon up into logical parts. Another thing is attribute handling, like if you place surface partly overlapping with another surface should that have an influence on the attributes on the new surface.

We need to focus more on data integrity and the complexity and cost of creating clients when using simple feature, because the demands for spatial data updated in real time from many different clients in a secure and consistent way will increase. This will be main focus in this talk.

Developer in SQL, Java and related tools for many years. Is working with projects like https://gitlab.com/nibioopensource/pgtopo_update_sql and https://github.com/larsop/resolve-overlap-and-gap.

This speaker also appears in: