FOSS4G 2022 general tracks

Lars Opsahl

Developer working with SQL, Java and related tools for many years. Currently works on projects such as https://gitlab.com/nibioopensource/pgtopo_update_sql and https://github.com/larsop/resolve-overlap-and-gap.


Sessions

08-25
10:10
5min
Data integrity risks when using simple feature
Lars Opsahl

If you care about the integrity of spatial data, you need to know the limitations and weaknesses of the simple feature datatype in your database. For instance, https://land.copernicus.eu/pan-european/corine-land-cover/clc2018 contains 2,377,772 simple features, among which we find 852 overlaps and 1420 invalid polygons. For this test I used the “ESRI FGDB” file and GDAL to import it into Postgis. We find such minor overlaps and gaps quite often, and they may not be visible to the human eye. The problem is that they mask real errors and make it difficult to enforce database integrity constraints. Close parallel lines also seem to cause Topology Exceptions in many spatial libraries.

A core problem with simple features is that they contain no information about their relations to neighboring features, so the integrity of such relations is hard to constrain. Another problem is the mixing of old and new data in the payload from the client. This makes clients hard and expensive to build, because they need a full stack of spatial libraries and perhaps a complete, locked, exact snapshot of the database on the client side. In addition, a common line may differ from client to client depending on the spatial library, snapTo usage, tolerance values and transport formats.
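
As a minimal sketch of how a common line can drift between clients, the Python snippet below lets two hypothetical clients store the same border coordinate with different precision. The coordinate value, the precisions and the helper names are assumptions made for illustration only; they are not taken from the CORINE test above.

```python
def snap(value, decimals):
    """Round a coordinate the way a client with a given precision might."""
    return round(value, decimals)


def rect_overlap_area(a, b):
    """Overlap area of two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


# Hypothetical x coordinate of the border shared by two parcels.
TRUE_BORDER_X = 10.00049

# Each parcel is a simple feature that carries its own copy of the border.
left_parcel = (0.0, 0.0, snap(TRUE_BORDER_X, 4), 10.0)    # client keeping 4 decimals
right_parcel = (snap(TRUE_BORDER_X, 3), 0.0, 20.0, 10.0)  # client keeping 3 decimals

sliver = rect_overlap_area(left_parcel, right_parcel)
print(f"overlap area: {sliver:.6f}")
```

The two copies of the border no longer agree, so the parcels overlap in a sliver of about 0.005 square units on a 10 x 10 parcel: far too small to see on a map, but enough to break a "no overlaps" rule.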

In 2022 many systems depend on live updates, also for spatial data. It is therefore a big advantage to be able to provide simple and “secure” APIs with fast server-side integrity constraint checks that can be used from a standard web browser. With these checks on the server side, the same rules apply across all clients.

Are there alternatives that can secure data integrity in a better way? Yes, for instance Postgis Topology. The big difference is that Postgis Topology has a more open structure, realized with standard relational database features. This lowers the complexity of the client and secures data integrity. In the talk “Use Postgis Topology to secure data integrity, simple API and clean up messy simple feature datasets.” we will dive more into the details of Postgis Topology.

Building an API for clients may be possible with simple features, but it would require expensive computations to ensure topological integrity, and the problem of mixing new and old border parts cannot be solved without breaking the polygon up into logical parts. Another issue is attribute handling: if you place a surface so that it partly overlaps another surface, should that influence the attributes of the new surface?

We need to focus more on data integrity and on the complexity and cost of creating clients when using simple features, because the demand for spatial data updated in real time from many different clients in a secure and consistent way will increase. This will be the main focus of this talk.

Use cases & applications
Room 9
08-25
12:00
30min
Postgis Topology to secure data integrity, simple API and clean up messy simple feature datasets.
Lars Opsahl, Sandro Santilli, Mattia Natali

In Postgis Topology, merging two surfaces does not involve spatial operations, since the surface-to-border relation is stored as foreign key structures in the database. This means that the border of the new object is not spatially touched or changed when two surfaces are merged. With simple features the common border must be computed on the fly, which in turn may involve snapTo and cause tiny overlaps and gaps.
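
The idea can be sketched with a small in-memory model in Python. This is an illustration only, not the Postgis Topology implementation: the dictionary layout and the `merge_faces` helper are hypothetical, and only the `left_face`/`right_face` naming mirrors how Postgis Topology stores which face lies on each side of an edge.

```python
def merge_faces(edges, keep, drop):
    """Merge face `drop` into face `keep` by editing references only."""
    merged = {}
    for edge_id, edge in edges.items():
        if {edge["left_face"], edge["right_face"]} == {keep, drop}:
            continue  # the shared border between the two faces disappears
        merged[edge_id] = {
            "left_face": keep if edge["left_face"] == drop else edge["left_face"],
            "right_face": keep if edge["right_face"] == drop else edge["right_face"],
            "geom": edge["geom"],  # same object: coordinates are never recomputed
        }
    return merged


# Two unit squares side by side; face 0 is the surrounding "universe" face.
edges = {
    1: {"left_face": 1, "right_face": 0, "geom": [(1, 1), (0, 1), (0, 0), (1, 0)]},
    2: {"left_face": 1, "right_face": 2, "geom": [(1, 0), (1, 1)]},  # shared border
    3: {"left_face": 2, "right_face": 0, "geom": [(1, 0), (2, 0), (2, 1), (1, 1)]},
}

merged = merge_faces(edges, keep=1, drop=2)
print(sorted(merged))                             # remaining edge ids
print(merged[3]["geom"] is edges[3]["geom"])      # outer border left untouched
```

The merge only removes the shared edge and relabels face references. No union, snapping or tolerance is involved, so the outer border cannot move.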

With Postgis Topology you can easily build an API where the client only sends new borders, which is key to securing data integrity. This ensures that old borders are not moved by a client error or by a simple transport format, because existing points are never passed back to the server. Postgis Topology makes it easy for the server to work with those new borders (the delta), because it has standard methods for this and all relations between borders and surfaces are stored in the database. Postgis Topology also has validation routines, in addition to standard database constraints, to keep the system healthy.

The principles that Postgis Topology is based on were used in spatial systems many years ago, but one problem was keeping the border linework clean so that it did not end up as spaghetti. So one of the first things we did together with Sandro Santilli was to create methods on top of Postgis Topology to avoid this, by throwing away any border parts that do not contribute to a new “valid” surface.
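
One possible criterion for "does not contribute to a surface" is sketched below in Python: an edge with the same face on both sides separates nothing and can be dropped. This is an assumption for illustration; the actual rules used in the methods mentioned above may differ, and the data layout and helper name are hypothetical.

```python
def drop_dangling_edges(edges):
    """Keep only edges that separate two different faces."""
    return {
        edge_id: edge
        for edge_id, edge in edges.items()
        if edge["left_face"] != edge["right_face"]
    }


edges = {
    1: {"left_face": 1, "right_face": 0},  # outer border of face 1
    2: {"left_face": 1, "right_face": 2},  # border between faces 1 and 2
    3: {"left_face": 2, "right_face": 0},  # outer border of face 2
    4: {"left_face": 2, "right_face": 2},  # stray line ending inside face 2
}

clean = drop_dangling_edges(edges)
print(sorted(clean))
```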

Postgis Topology is built on a relational database model based on SQL-MM part 3. Your own domain data are easily linked to borders, surfaces and more. For instance, checking domain attributes of a surface on the other side of a border is not a spatial query but a standard relational query.
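
A minimal sketch of such a relational neighbor lookup follows, using Python's built-in sqlite3 module so that it runs without a Postgis installation. The table names, the `land_use` attribute and the sample rows are hypothetical; only the `left_face`/`right_face` columns mirror the way Postgis Topology records the faces on each side of an edge. In a real topology, domain attributes would normally be linked through TopoGeometry relations rather than directly to a face id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE face_attr (
    face_id  INTEGER PRIMARY KEY,
    land_use TEXT NOT NULL
);
CREATE TABLE edge (
    edge_id    INTEGER PRIMARY KEY,
    left_face  INTEGER NOT NULL REFERENCES face_attr(face_id),
    right_face INTEGER NOT NULL REFERENCES face_attr(face_id)
);
INSERT INTO face_attr VALUES (1, 'forest'), (2, 'farmland'), (3, 'lake');
INSERT INTO edge VALUES (10, 1, 2), (11, 2, 3);
""")

# For every border of the given face, fetch the attributes of the surface
# on the other side. No geometry is read and no spatial function is called.
NEIGHBOUR_SQL = """
SELECT e.edge_id, n.face_id, n.land_use
FROM edge AS e
JOIN face_attr AS n
  ON n.face_id = CASE WHEN e.left_face = :face
                      THEN e.right_face ELSE e.left_face END
WHERE :face IN (e.left_face, e.right_face)
ORDER BY e.edge_id
"""

neighbours = conn.execute(NEIGHBOUR_SQL, {"face": 2}).fetchall()
print(neighbours)
```

Because the lookup is an ordinary join on integer keys, it can be indexed and constrained like any other relational data.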

The following projects will also be touched in this talk:

https://gitlab.com/nibioopensource/pgtopo_update_sql (Functions using Postgis Topology to make it easy to create spatial update clients.)

https://github.com/strk/qgis_pgis_topoedit (Postgis Topology is very well integrated with QGIS.)

https://github.com/larsop/resolve-overlap-and-gap (Shows how we clean up, simplify and generalize simple feature tables with millions of rows using Postgis Topology.)

Is a relational database structure a good choice for Postgis Topology? Yes, I believe so. Since it is also tied to SQL-MM part 3 rather than a random private structure, and all the great Postgis functions are available, this is a very good combination. You may take a glance at https://www.ibm.com/ibm/history/ibm100/us/en/icons/reldb/ and other articles about relational databases.

The plan now is to build a full ecosystem around Postgis Topology, with a generic client that supports declarative rules where you can define attributes, rules for attribute handling, and how to deal with overlaps and gaps.

All the work NIBIO has done and is doing here would not have been possible without the great support from Sandro Santilli.

Use cases & applications
Room Limonaia