Proposal on Node Expressions for SHACL Core 1.2
There are plenty of use cases where the value of some properties (should) depend on other properties, or other assertions in the data graph. These property values may be used for constraint checking, for display purposes or for querying. For example, TopBraid's GraphQL endpoint can return such derived properties when requested, but their triples are never asserted. Since TopBraid's forms depend on GraphQL, they can also display derived properties, computed on demand.
Examples taken from TopBraid EDG built-in ontologies, in SHACL-AF 1.1 syntax
edg:Database-tableCount
    a sh:PropertyShape ;
    sh:path edg:tableCount ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." ;
    sh:values [
        sh:count [
            sh:path [
                sh:inversePath edg:tableOf ;
            ] ;
        ] ;
    ] .
metash:PropertyShape-isInferred
    a sh:PropertyShape ;
    sh:path metash:isInferred ;
    sh:datatype xsd:boolean ;
    sh:description "True if this property has a property values rule (using sh:values). Such properties are typically called 'inferred' and computed on the fly where needed." ;
    sh:values [
        sh:exists [
            sh:path sh:values ;
        ] ;
    ] .
edg:Asset-downstreamConsumers
    a sh:PropertyShape ;
    sh:path edg:dependent ;
    sh:class edg:Asset ;
    sh:description "Aggregates the computed dependents of an interoperable resource. This includes downstream resources that are expressed as provenance." ;
    sh:values [
        sh:prefixes <http://edg.topbraid.solutions/1.0/schema/technical-assets> ;
        sh:select """
            SELECT DISTINCT ?dependency
            WHERE {
                ?property (rdfs:subPropertyOf)* prov:wasDerivedFrom .
                ?dependency ?property $this .
                ?dependency a ?dependencyClass .
                ?dependencyClass (rdfs:subClassOf)+ prov:Entity .
            } """ ;
    ] .
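To make the first two examples concrete, here is a minimal Python sketch of how `sh:count` and `sh:exists` over an inverse path might be evaluated. It is not how TopBraid or any SHACL engine is implemented: the graph is a plain set of tuples, the `ex:` IRIs are made up, and the helper names (`inverse_path`, `count`, `exists`, `table_count`) are hypothetical. It also assumes that a node expression produces a set of nodes, so duplicates are counted once.

```python
# Toy data graph as a set of (subject, predicate, object) triples.
# All ex: IRIs are made-up stand-ins, not real EDG data.
GRAPH = {
    ("ex:t1", "edg:tableOf", "ex:db1"),
    ("ex:t2", "edg:tableOf", "ex:db1"),
    ("ex:t3", "edg:tableOf", "ex:db2"),
}


def inverse_path(graph, focus, predicate):
    """Yield every subject s such that (s, predicate, focus) is in the graph."""
    for s, p, o in graph:
        if p == predicate and o == focus:
            yield s


def count(nodes):
    """Stand-in for sh:count: the number of distinct nodes in the input."""
    return len(set(nodes))


def exists(nodes):
    """Stand-in for sh:exists: True if the input produces at least one node."""
    return any(True for _ in nodes)


def table_count(graph, database):
    # Mirrors: sh:count [ sh:path [ sh:inversePath edg:tableOf ] ]
    return count(inverse_path(graph, database, "edg:tableOf"))


print(table_count(GRAPH, "ex:db1"))  # 2
print(exists(inverse_path(GRAPH, "ex:db3", "edg:tableOf")))  # False
```

Nothing is written back to `GRAPH`: the value is recomputed on each call, which matches the "computed on demand, never asserted" behavior described above.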
Currently SHACL only supports a few hard-coded target types: sh:targetClass, sh:targetSubjectsOf, sh:targetObjectsOf, sh:targetNode. Quite often, shapes should apply to a finer-grained subset of target nodes.
Due to lack of time, the 1.0 WG did not finish filter shapes, which would have made it possible to narrow down the target nodes of a shape.
SHACL-AF defines SPARQL-based target types, and they'll likely get added to SHACL-SPARQL 1.2 anyway.
Some tickets already request richer targets, e.g.
- #213 Add targetShape
- #168 Ability to target direct instances of a class
Currently all constraint parameters are constants. For example sh:minCount must be a single xsd:integer such as 1.
Sometimes constraints should only apply to certain target nodes, or have a dynamic selection of parameters that are computed based on the focus node.
Example 1 (see also https://datashapes.org/dynamic.html#example-path): The values of ex:state must be one of the allowed codes, depending on the value of the property ex:country, e.g. "If the Address is inside of Australia then the valid country codes are ACT, NSW, NT, QLD, SA, TAS, VIC and WA".
ex:Address-state
    a sh:PropertyShape ;
    sh:path ex:state ;
    sh:in [
        sh:path ( ex:country ex:stateCode )
    ] .
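The intended behavior of Example 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the graph is a set of tuples, the addresses and the choice of "CA" as an invalid value are invented, and `objects`, `sequence_path` and `invalid_states` are hypothetical helper names. The allowed list is no longer a constant but is computed per focus node by walking the sequence path `( ex:country ex:stateCode )`.

```python
# Toy data: two addresses in a country that lists its own state codes.
GRAPH = {
    ("ex:addr1", "ex:country", "ex:Australia"),
    ("ex:addr1", "ex:state", "NSW"),
    ("ex:addr2", "ex:country", "ex:Australia"),
    ("ex:addr2", "ex:state", "CA"),
    ("ex:Australia", "ex:stateCode", "NSW"),
    ("ex:Australia", "ex:stateCode", "VIC"),
}


def objects(graph, subject, predicate):
    """Return all o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def sequence_path(graph, focus, predicates):
    """Follow the predicates one after another, starting at the focus node."""
    frontier = {focus}
    for predicate in predicates:
        frontier = {o for node in frontier for o in objects(graph, node, predicate)}
    return frontier


def invalid_states(graph, address):
    """Return the ex:state values that are not in the dynamically computed list."""
    allowed = sequence_path(graph, address, ["ex:country", "ex:stateCode"])
    return {v for v in objects(graph, address, "ex:state") if v not in allowed}


print(invalid_states(GRAPH, "ex:addr1"))  # set()
print(invalid_states(GRAPH, "ex:addr2"))  # {'CA'}
```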
Example 2: Employees in the US must have a SSN:
ex:Person-ssn
    a sh:PropertyShape ;
    sh:path ex:ssn ;
    sh:datatype xsd:string ;
    sh:minCount [
        sh:if [
            sh:property [
                sh:path ex:country ;
                sh:hasValue ex:USA ;
            ]
        ] ;
        sh:then 1 ;
        sh:else 0 ;
    ] .
(This particular example is not necessarily a best modeling practice, but is used to illustrate the potential syntax.)
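A rough Python sketch of what an engine would have to do for Example 2 is shown below. The data, the helper names (`objects`, `min_count_for`, `violates_min_count`) and the simplification of the `sh:if` shape to a single "has value" test are all assumptions made for illustration. The point is that the `sh:minCount` parameter is computed per focus node before the usual cardinality check runs.

```python
# Toy data: three people, only one of the two US residents has an SSN.
GRAPH = {
    ("ex:alice", "ex:country", "ex:USA"),
    ("ex:alice", "ex:ssn", "123-45-6789"),
    ("ex:bob", "ex:country", "ex:USA"),
    ("ex:carol", "ex:country", "ex:Germany"),
}


def objects(graph, subject, predicate):
    """Return all o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def min_count_for(graph, person):
    """Evaluate the if/then/else expression for one focus node."""
    # sh:if, simplified: does ex:country have the value ex:USA?
    in_usa = "ex:USA" in objects(graph, person, "ex:country")
    return 1 if in_usa else 0  # sh:then 1 ; sh:else 0


def violates_min_count(graph, person):
    """Ordinary minCount check, but against the computed parameter."""
    return len(objects(graph, person, "ex:ssn")) < min_count_for(graph, person)


for person in ("ex:alice", "ex:bob", "ex:carol"):
    print(person, violates_min_count(GRAPH, person))
# ex:alice False
# ex:bob True
# ex:carol False
```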
SHACL already has a light-weight inferencing mechanism for derived properties built-in: Property paths. For example, transitive properties can be inferred by walking up a + or * path:
skos:Concept-transitiveBroader
    a sh:PropertyShape ;
    sh:path [
        sh:oneOrMorePath skos:broader ;
    ] ;
    sh:class skos:Concept ;
    sh:name "all parent concepts" .
which can then be, for example, rendered on a form. Such complex paths are fundamentally different from simple IRI predicate paths and sh:inversePath with an IRI property, because the latter can be edited easily, while for complex paths it is difficult or impossible to guess which triples need to be asserted when a value is entered. For example, with sh:oneOrMorePath there could be any number of intermediate triples. Therefore, such properties are generally "read-only", like an inference that is derived from the explicitly asserted triples.
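The read-only nature of such a path is easy to see in a small Python sketch of `sh:oneOrMorePath` evaluation. The concepts and the function name `one_or_more` are invented for this illustration, and the traversal is a plain reachability search, not the algorithm of any particular engine.

```python
# Toy SKOS hierarchy: poodle -> dog -> mammal -> animal.
GRAPH = {
    ("ex:poodle", "skos:broader", "ex:dog"),
    ("ex:dog", "skos:broader", "ex:mammal"),
    ("ex:mammal", "skos:broader", "ex:animal"),
}


def one_or_more(graph, focus, predicate):
    """Return all nodes reachable from focus via one or more predicate steps."""
    seen = set()
    frontier = [focus]
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            # The 'seen' check also stops the search on cyclic data.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


print(sorted(one_or_more(GRAPH, "ex:poodle", "skos:broader")))
# ['ex:animal', 'ex:dog', 'ex:mammal']
```

Reading the values is straightforward, but the reverse direction is not: if a user entered `ex:animal` as a new value for some other concept, the function gives no hint whether to assert a direct `skos:broader` triple or a chain through intermediate concepts.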
Property paths use blank nodes with some key properties that identify the kind of path:
- sh:alternativePath identifies "Alternative paths"
- sh:inversePath identifies "Inverse paths"
- rdf:first identifies "Sequence paths"
This design can be generalised to allow for arbitrary other computations, vastly extending the expressiveness. A starting point could be the Node Expressions from the SHACL-AF 1.1 draft: https://w3c.github.io/shacl/shacl-af/#node-expressions
Introduction to Node Expressions: https://www.linkedin.com/pulse/inferencing-shacl-using-shvalues-holger-knublauch-0metf/?trackingId=%2FdYdNGciQ3m21yHrZJ0bXA%3D%3D
Some tickets already request richer path expressions:
- #195 Support for the {n,m} form for property paths
- #182 Negated property paths
The current fallback for anything that is not covered by Core is SPARQL. But not everybody knows SPARQL, and SPARQL queries are difficult to process and optimize as they are just strings, not symbols.
Introduce the concept of Node Expression as an algorithm that takes one or more input parameters plus an implicit focus node and produces a stream of output value nodes. Define Property Expression as a kind of Node Expression where the only difference is the interpretation of constant URIs, like in the property at sh:path. Change the definitions of sh:inversePath etc. slightly to be Property Expressions.
The library of other supported Node Expressions can then be left to a separate document, especially the Inferences spec, and therefore would not steal any WG resources. Core delivery would NOT be delayed.
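The distinction between the two kinds of expression can be sketched in Python. This is a hypothetical model, not a proposed API: expressions are represented as Python dicts that loosely mirror the blank-node structures, only three expression types are covered, and the names `eval_node_expr` and `eval_property_expr` are invented. The only difference between the two evaluators is what a constant IRI means: in a Node Expression it evaluates to itself, in a Property Expression it means "follow this predicate from the focus node".

```python
# Toy data graph; the ex: IRIs are made up.
GRAPH = {
    ("ex:t1", "edg:tableOf", "ex:db1"),
    ("ex:t2", "edg:tableOf", "ex:db1"),
}


def eval_node_expr(graph, focus, expr):
    """Yield the output nodes of a node expression for the given focus node."""
    if isinstance(expr, dict):
        if "inversePath" in expr:
            pred = expr["inversePath"]
            yield from (s for s, p, o in graph if p == pred and o == focus)
        elif "path" in expr:
            yield from eval_property_expr(graph, focus, expr["path"])
        elif "count" in expr:
            yield len(set(eval_node_expr(graph, focus, expr["count"])))
        else:
            raise ValueError(f"unsupported expression: {expr!r}")
    else:
        # Node Expression: a constant evaluates to itself.
        yield expr


def eval_property_expr(graph, focus, expr):
    """Same as eval_node_expr, except for the meaning of a constant IRI."""
    if isinstance(expr, str):
        # Property Expression: a constant IRI is a predicate to follow.
        yield from (o for s, p, o in graph if s == focus and p == expr)
    else:
        yield from eval_node_expr(graph, focus, expr)


print(list(eval_node_expr(GRAPH, "ex:t1", "edg:tableOf")))      # ['edg:tableOf']
print(list(eval_property_expr(GRAPH, "ex:t1", "edg:tableOf")))  # ['ex:db1']
print(list(eval_node_expr(GRAPH, "ex:db1", {"count": {"inversePath": "edg:tableOf"}})))  # [2]
```

Using generators keeps the "stream of output value nodes" reading literal: callers can stop consuming as soon as they have what they need.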
Eventually, the library of Node Expressions could cover all of SPARQL, providing a declarative syntax similar to what SPIN used to be.
Vendors can also introduce their own Node Expression types for their specific needs, like our dash:js to support inline JavaScript.
This means that all existing sh:path constraints will continue to work, but we open up many more use cases including SPARQL queries (sh:select) and things like sh:if/then/else and sh:filterShape.
Example:
g:City-fillColor
    a sh:PropertyShape ;
    sh:path g:fillColor ;
    sh:datatype xsd:string ;
    sh:name "fill color" ;
    sh:path [
        sh:if [
            sh:exists [
                sh:path [
                    sh:inversePath g:capital ;
                ] ;
            ] ;
        ] ;
        sh:then "red" ;
        sh:else "blue" ;
    ] .
When properties are essentially inferred (using a complex path or node expression) it is sometimes still beneficial to have a named property for those values. Instead of sh:values, this could be achieved via:
edg:Database-tableCount
    a sh:PropertyShape ;
    sh:inferredProperty edg:tableCount ; # NEW
    sh:path [
        sh:count [
            sh:inversePath edg:tableOf ;
        ] ;
    ] ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." .
This means that when inference is executed, the focus node will receive values for the predicate edg:tableCount. This value can then be accessed by other node expressions using [ sh:path edg:tableCount ] (or a new property instead of sh:path) implementing a simple form of backward chaining.
Note that these inferred values do not need to be materialized as triples and can instead be computed on demand. This means they are always up to date (truth maintenance), but are by default invisible to other graph operations such as SPARQL queries. We need to set the correct expectations on that in the user base.
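The on-demand behavior and the simple backward chaining described above can be sketched in Python. Everything here is an assumption made for illustration: the `InferredGraph` class, its `rules` registry, and the second inferred property `ex:isEmpty` are hypothetical and not part of the proposal. A lookup for an inferred predicate runs its rule instead of reading triples, and a rule may itself look up another inferred predicate.

```python
class InferredGraph:
    """Asserted triples plus rules whose values are computed on demand."""

    def __init__(self, triples):
        self.triples = set(triples)
        self.rules = {}  # predicate -> function(view, focus) -> iterable of values

    def values(self, focus, predicate):
        rule = self.rules.get(predicate)
        if rule is not None:
            return set(rule(self, focus))  # computed now, never stored
        return {o for s, p, o in self.triples if s == focus and p == predicate}


g = InferredGraph({
    ("ex:t1", "edg:tableOf", "ex:db1"),
    ("ex:t2", "edg:tableOf", "ex:db1"),
})

# edg:tableCount: the number of incoming edg:tableOf triples.
g.rules["edg:tableCount"] = lambda view, db: {
    sum(1 for s, p, o in view.triples if p == "edg:tableOf" and o == db)
}

# Hypothetical second rule that reads the first one (backward chaining).
g.rules["ex:isEmpty"] = lambda view, db: {
    next(iter(view.values(db, "edg:tableCount"))) == 0
}

print(g.values("ex:db1", "edg:tableCount"))  # {2}
print(g.values("ex:db2", "ex:isEmpty"))      # {True}
```

Because nothing is stored, a newly asserted `edg:tableOf` triple is reflected in the next lookup without any invalidation step. The flip side is the one noted above: code that scans `g.triples` directly, as a SPARQL query would, never sees an `edg:tableCount` triple.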
This will allow the computation of flexible targets.
Note: The current Node Expressions design assumes that a focusNode exists. For target computation, there is no such thing, so it would need to be explicitly passed in (using sh:nodes) or we add another property that defines the target class to begin with.
For most constraint parameters, allow any Node Expression that evaluates to a given datatype or IRI. For example, sh:minCount currently takes an xsd:integer, and it would need to be generalized to allow any Node Expression that produces an xsd:integer.
Optionally, Node Expressions could be allowed in other places such as sh:deactivated (we do this already in TopBraid). Once SHACL engines have implemented support for Node Expressions, these become low-hanging fruit to implement across the board.
- sh:path
- (add a new property) sh:inferredProperty/sh:derivedProperty/sh:aliasProperty
- sh:targetNode
- all constraint parameters such as sh:minCount and sh:in could accept node expressions
If we elect to implement one or more of these changes, SHACL 1.2 will have a very significant gain in expressiveness. Also SHACL would be future-proofed because the library of Node Expressions can be extended through separate specs.
A major downside is that the additional expressiveness makes SHACL less predictable to static analysis. For example, form generators can no longer easily determine whether a field has sh:maxCount 1 or not, if that value is a node expression. These algorithms would need to be changed to assume that no constraint exists, or pre-compute the value for the given focus node.
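The two fallback strategies for a form generator can be sketched in a few lines of Python. The shape representation (a dict), the function name `effective_max_count`, the `evaluate` callback and the sample expression are all hypothetical. The sketch only shows the decision logic: use a constant if there is one, pre-compute the expression if a focus node is available, and otherwise assume that no constraint exists.

```python
def effective_max_count(shape, focus=None, evaluate=None):
    """Return the maxCount a form should assume, or None for 'unconstrained'."""
    value = shape.get("maxCount")
    if value is None or isinstance(value, int):
        return value  # absent or constant: static analysis still works
    if focus is not None and evaluate is not None:
        return evaluate(value, focus)  # pre-compute for this focus node
    return None  # node expression without a focus node: assume no constraint


# Hypothetical shapes: one constant parameter, one node expression.
static_shape = {"path": "ex:name", "maxCount": 1}
dynamic_shape = {"path": "ex:phone", "maxCount": {"then": 1, "else": 3}}

# Stand-in evaluator: picks a branch based on a made-up set of focus nodes.
SINGLE_PHONE_USERS = {"ex:alice"}


def toy_evaluate(expr, focus):
    return expr["then"] if focus in SINGLE_PHONE_USERS else expr["else"]


print(effective_max_count(static_shape))                              # 1
print(effective_max_count(dynamic_shape))                             # None
print(effective_max_count(dynamic_shape, "ex:alice", toy_evaluate))   # 1
print(effective_max_count(dynamic_shape, "ex:bob", toy_evaluate))     # 3
```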
Another downside is the learning curve, as the added expressiveness may look alien to beginners.
Like any gain in expressiveness, there will be corner cases where engines may "crash" due to infinite recursion etc. We did leave such cases undefined in the past and it hasn't really impacted the acceptance of SHACL, except in terms of theoretical purity. Furthermore, many other languages, including all programming languages, also support recursion and allow infinite loops etc. In my (Holger's) opinion, SHACL should be a language that gets work done, without being artificially limited by theoretical corner cases.
Defining a SHACL Profile without any node expressions can help address these problems.
(Page started by @HolgerKnublauch, following discussions on https://github.com/w3c/data-shapes/issues/215)