Proposal on Node Expressions for SHACL Core 1.2

Holger Knublauch edited this page Feb 15, 2025 · 10 revisions

Motivation and Use Cases

Derived properties

There are plenty of use cases where the value of some properties (should) depend on other properties, or other assertions in the data graph. These property values may be used for constraint checking, for display purposes or for querying. For example, TopBraid's GraphQL endpoint can return such derived properties when requested, but their triples are never asserted. Since TopBraid's forms depend on GraphQL, they can also display derived properties, computed on demand.

Examples taken from TopBraid EDG built-in ontologies, in SHACL-AF 1.1 syntax

edg:Database-tableCount
    a sh:PropertyShape ;
    sh:path edg:tableCount ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." ;
    sh:values [
        sh:count [
            sh:path [
                sh:inversePath edg:tableOf ;
            ] ;
        ] ;
    ] .

metash:PropertyShape-isInferred
    a sh:PropertyShape ;
    sh:path metash:isInferred ;
    sh:datatype xsd:boolean ;
    sh:description "True if this property has a property values rule (using sh:values). Such properties are typically called 'inferred' and computed on the fly where needed." ;
    sh:values [
        sh:exists [
            sh:path sh:values ;
        ] ;
    ] .

edg:Asset-downstreamConsumers
    a sh:PropertyShape ;
    sh:path edg:dependent ;
    sh:class edg:Asset ;
    sh:description "Aggregates the computed dependents of an interoperable resource. This includes downstream resources that are expressed as provenance." ;
    sh:values [
        sh:prefixes <http://edg.topbraid.solutions/1.0/schema/technical-assets> ;
        sh:select """
            SELECT DISTINCT ?dependency
            WHERE {
                ?property (rdfs:subPropertyOf)* prov:wasDerivedFrom .
                ?dependency ?property $this .
                ?dependency a ?dependencyClass .
                ?dependencyClass (rdfs:subClassOf)+ prov:Entity.
            }  """ ;
    ] .

Complex Targets

Currently SHACL supports only a few hard-coded target types: sh:targetClass, sh:targetSubjectsOf, sh:targetObjectsOf and sh:targetNode. Quite often, shapes should apply to a finer-grained subset of target nodes.

Due to lack of time, the 1.0 WG did not finish filter shapes, which would have made it possible to narrow down the target nodes of a shape.

SHACL-AF defines SPARQL-based target types, and they'll likely get added to SHACL-SPARQL 1.2 anyway.

Some tickets already request richer targets, e.g.

  • #213 Add targetShape
  • #168 Ability to target direct instances of a class

Dynamic Constraints

Currently all constraint parameters are constants. For example sh:minCount must be a single xsd:integer such as 1.

Sometimes constraints should only apply to certain target nodes, or have a dynamic selection of parameters that are computed based on the focus node.

Example 1 (see also https://datashapes.org/dynamic.html#example-path): The values of ex:state must be one of the allowed codes, depending on the value of the property ex:country. For example: "If the Address is inside of Australia then the valid state codes are ACT, NSW, NT, QLD, SA, TAS, VIC and WA".

ex:Address-state
    a sh:PropertyShape ;
    sh:path ex:state ;
    sh:in [
        sh:path ( ex:country ex:stateCode )
    ] .

Example 2: Employees in the US must have an SSN:

ex:Person-ssn
    a sh:PropertyShape ;
    sh:path ex:ssn ;
    sh:datatype xsd:string ;
    sh:minCount [
        sh:if [
            sh:property [
                sh:path ex:country ;
                sh:hasValue ex:USA ;
            ]
        ] ;
        sh:then 1 ;
        sh:else 0 ;
    ] .

(This particular example is not necessarily best modeling practice; it is only used to illustrate the potential syntax.)

Observations

SHACL already has a light-weight inferencing mechanism for derived properties built-in: Property paths. For example, transitive properties can be inferred by walking up a + or * path:

skos:Concept-transitiveBroader
    a sh:PropertyShape ;
    sh:path [
        sh:oneOrMorePath skos:broader ;
    ] ;
    sh:class skos:Concept ;
    sh:name "all parent concepts" .

which can then be, for example, rendered on a form. Such complex paths are fundamentally different from simple IRI predicate paths and sh:inversePath with an IRI property, because the latter can be edited easily while for complex paths it is difficult/impossible to guess which triples need to be asserted when a value is entered. For example, with sh:oneOrMorePath there could be any number of intermediate triples. Therefore, such properties are generally "read-only" like an inference that is derived from the explicitly asserted triples.
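To make the "path as inference" point concrete, here is a minimal Python sketch of how an engine might compute the values of a sh:oneOrMorePath. It is an illustration only: the graph is modeled as a plain set of (subject, predicate, object) tuples, and the ex: concepts are invented for the example.

```python
from collections import deque

# Toy data graph: a set of (subject, predicate, object) triples.
# The ex: concepts are hypothetical and exist only for this sketch.
GRAPH = {
    ("ex:Poodle", "skos:broader", "ex:Dog"),
    ("ex:Dog", "skos:broader", "ex:Mammal"),
    ("ex:Mammal", "skos:broader", "ex:Animal"),
}

def one_or_more(graph, focus, predicate):
    """Return all nodes reachable from focus via one or more predicate steps."""
    seen = set()
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            # Follow each matching edge once; the seen set also guards against cycles.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

print(sorted(one_or_more(GRAPH, "ex:Poodle", "skos:broader")))
# prints ['ex:Animal', 'ex:Dog', 'ex:Mammal']
```

Only one of the three values is directly asserted on ex:Poodle; the other two are derived from triples on other nodes, which is why a form cannot tell which triple to write when a user adds a value.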

Property paths use blank nodes with some key properties that identify the kind of path:

  • sh:alternativePath identifies "Alternative paths"
  • sh:inversePath identifies "Inverse paths"
  • rdf:first identifies "Sequence paths"

This design can be generalised to allow for arbitrary other computations, vastly extending the expressiveness. A starting point could be the Node Expressions from the SHACL-AF 1.1 draft: https://w3c.github.io/shacl/shacl-af/#node-expressions

Introduction to Node Expressions: https://www.linkedin.com/pulse/inferencing-shacl-using-shvalues-holger-knublauch-0metf/?trackingId=%2FdYdNGciQ3m21yHrZJ0bXA%3D%3D

Some tickets already request richer path expressions:

  • #195 Support for the {n,m} form for property paths
  • #182 Negated property paths

The current fallback for anything that is not covered by Core is SPARQL. But not everybody knows SPARQL, and SPARQL queries are difficult to process and optimize as they are just strings, not symbols.

Suggested Changes to SHACL Core

Define the concept of Node Expressions

Introduce the concept of a Node Expression as an algorithm that takes one or more input parameters plus an implicit focus node and produces a stream of output value nodes. Define a Property Expression as a kind of Node Expression whose only difference is the interpretation of constant IRIs, as with the property at sh:path. Slightly change the definitions of sh:inversePath etc. so that they are Property Expressions.
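The following Python sketch illustrates this evaluation model under simplifying assumptions: expressions are nested dicts rather than RDF blank nodes, the graph is a set of triples, and only a handful of expression types are covered. The dict keys mirror the sh: property names used on this page, while the evaluate function and the ex: data are hypothetical.

```python
# Toy data graph as (subject, predicate, object) triples; the ex: nodes are hypothetical.
GRAPH = {
    ("ex:Orders", "edg:tableOf", "ex:SalesDB"),
    ("ex:Customers", "edg:tableOf", "ex:SalesDB"),
    ("ex:Australia", "g:capital", "ex:Canberra"),
}

def evaluate(expr, focus, graph):
    """Yield the output nodes of a node expression for the given focus node."""
    if not isinstance(expr, dict):
        # Node Expression reading of a constant: it evaluates to itself.
        # (A Property Expression would instead follow a constant IRI as a predicate.)
        yield expr
    elif "path" in expr:
        yield from (o for s, p, o in graph if s == focus and p == expr["path"])
    elif "inversePath" in expr:
        yield from (s for s, p, o in graph if o == focus and p == expr["inversePath"])
    elif "count" in expr:
        yield sum(1 for _ in evaluate(expr["count"], focus, graph))
    elif "exists" in expr:
        yield next(evaluate(expr["exists"], focus, graph), None) is not None
    elif "if" in expr:
        condition = next(evaluate(expr["if"], focus, graph), False)
        branch = expr["then"] if condition is True else expr["else"]
        yield from evaluate(branch, focus, graph)
    else:
        raise ValueError(f"Unknown node expression: {expr!r}")

TABLE_COUNT = {"count": {"inversePath": "edg:tableOf"}}
FILL_COLOR = {"if": {"exists": {"inversePath": "g:capital"}}, "then": "red", "else": "blue"}

print(list(evaluate(TABLE_COUNT, "ex:SalesDB", GRAPH)))   # prints [2]
print(list(evaluate(FILL_COLOR, "ex:Canberra", GRAPH)))   # prints ['red']
print(list(evaluate(FILL_COLOR, "ex:Sydney", GRAPH)))     # prints ['blue']
```

Because every expression type is just another branch keyed by a property name, new types can be added without touching the existing ones, which is the extensibility this proposal relies on.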

The library of other supported Node Expressions can then be left to a separate document, especially the Inferences spec, and therefore would not take up any WG resources. Core delivery would NOT be delayed.

Eventually, the library of Node Expressions could cover all of SPARQL, providing a declarative syntax similar to what SPIN used to be.

Vendors can also introduce their own Node Expression types for their specific needs, like our dash:js to support inline JavaScript.

Generalize sh:path to allow any Node Expression

This means that all existing sh:path constraints will continue to work, but we open up many more use cases including SPARQL queries (sh:select) and things like sh:if/then/else and sh:filterShape.

Example:

g:City-fillColor
    a sh:PropertyShape ;
    sh:path g:fillColor ;
    sh:datatype xsd:string ;
    sh:name "fill color" ;
    sh:path [
        sh:if [
            sh:exists [
                sh:path [
                    sh:inversePath g:capital ;
                ] ;
            ] ;
        ] ;
        sh:then "red" ;
        sh:else "blue" ;
    ] .

(Image: Node Expression as sh:path example)

Add a property alias mechanism sh:inferredProperty or sh:alias

When properties are essentially inferred (using a complex path or node expression) it is sometimes still beneficial to have a named property for those values. Instead of sh:values, this could be achieved via:

edg:Database-tableCount
    a sh:PropertyShape ;
    sh:inferredProperty edg:tableCount ;   # NEW
    sh:path [
        sh:count [
            sh:inversePath edg:tableOf ;
        ] ;
    ] ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." .

This means that when inference is executed, the focus node will receive values for the predicate edg:tableCount. This value can then be accessed by other node expressions using [ sh:path edg:tableCount ] (or a new property instead of sh:path) implementing a simple form of backward chaining.

Note that these inferred values do not need to be materialized as triples and can instead be computed on demand. This means they are always up to date (truth maintenance), but are by default invisible to other graph operations such as SPARQL queries. We need to set the correct expectations on that in the user base.
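A rough Python sketch of this on-demand behaviour, under stated assumptions: the INFERRED registry, the values function and the ex:isEmpty property are all hypothetical names invented for illustration, and the graph is a plain set of triples rather than an RDF store.

```python
# Asserted triples only; no edg:tableCount triple is ever stored here.
GRAPH = {
    ("ex:Orders", "edg:tableOf", "ex:SalesDB"),
    ("ex:Customers", "edg:tableOf", "ex:SalesDB"),
}

def table_count(focus, graph):
    """Inferred edg:tableCount: number of incoming edg:tableOf triples."""
    return [sum(1 for s, p, o in graph if p == "edg:tableOf" and o == focus)]

def is_empty(focus, graph):
    """Hypothetical second inferred property that builds on the first one."""
    # Backward chaining: this rule reads edg:tableCount through values(),
    # which in turn triggers the table_count rule on demand.
    return [values(focus, "edg:tableCount", graph) == [0]]

# Hypothetical registry mapping inferred predicates to their rules.
INFERRED = {"edg:tableCount": table_count, "ex:isEmpty": is_empty}

def values(focus, predicate, graph):
    """Return asserted values plus values computed on demand."""
    asserted = [o for s, p, o in graph if s == focus and p == predicate]
    rule = INFERRED.get(predicate)
    computed = rule(focus, graph) if rule else []
    return asserted + computed

print(values("ex:SalesDB", "edg:tableCount", GRAPH))  # prints [2]
print(values("ex:SalesDB", "ex:isEmpty", GRAPH))      # prints [False]
```

A query that scans GRAPH directly finds no edg:tableCount triples, which is the visibility caveat described above.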

Generalize sh:targetNode to allow any Node Expression

This will allow the computation of flexible targets.

Note: The current Node Expressions design assumes that a focusNode exists. For target computation, there is no such thing, so it would need to be explicitly passed in (using sh:nodes) or we add another property that defines the target class to begin with.

Generalize constraint parameters

For most constraint parameters, allow any Node Expression that evaluates to a given datatype or IRI. For example, sh:minCount currently takes an xsd:integer, and it would need to be generalized to allow any Node Expression that produces an xsd:integer.
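As an illustration, the SSN example from the Dynamic Constraints section could be checked roughly as follows. This is a sketch under assumptions, not a proposed algorithm: the function names and the sample people are invented, and the node expression is represented as a plain Python callable.

```python
# Toy data graph; the people and the SSN value are made up for this sketch.
GRAPH = {
    ("ex:Alice", "ex:country", "ex:USA"),
    ("ex:Alice", "ex:ssn", "123-45-6789"),
    ("ex:Bob", "ex:country", "ex:USA"),
    ("ex:Carol", "ex:country", "ex:Australia"),
}

def min_count_for(focus, graph):
    """Dynamic sh:minCount: 1 if the focus node has ex:country ex:USA, else 0."""
    return 1 if (focus, "ex:country", "ex:USA") in graph else 0

def check_min_count(focus, path, min_count, graph):
    """Return True if focus has at least min_count values for path.

    min_count may be a constant integer (today's SHACL) or a callable
    that is evaluated per focus node (the generalized form).
    """
    required = min_count(focus, graph) if callable(min_count) else min_count
    actual = sum(1 for s, p, o in graph if s == focus and p == path)
    return actual >= required

for person in ("ex:Alice", "ex:Bob", "ex:Carol"):
    print(person, check_min_count(person, "ex:ssn", min_count_for, GRAPH))
# prints:
# ex:Alice True
# ex:Bob False
# ex:Carol True
```

Constant parameters keep working unchanged, but the required count is only known once a focus node is supplied.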

Optionally, Node Expressions could be allowed in other places such as sh:deactivated (we do this already in TopBraid). Once SHACL engines have implemented support for Node Expressions, these become low-hanging fruit to implement across the board.

Summary of Changes (Affected Properties)

  • sh:path
  • (add a new property) sh:inferredProperty/sh:derivedProperty/sh:aliasProperty
  • sh:targetNode
  • all constraint parameters such as sh:minCount and sh:in could accept node expressions

Discussion

If we elect to implement one or more of these changes, SHACL 1.2 will have a very significant gain in expressiveness. Also SHACL would be future-proofed because the library of Node Expressions can be extended through separate specs.

A major downside is that the additional expressiveness makes SHACL less amenable to static analysis. For example, form generators can no longer easily determine whether a field has sh:maxCount 1 or not, if that value is a node expression. These algorithms would need to be changed to assume that no constraint exists, or to pre-compute the value for the given focus node.

Another downside is the learning curve, as the added expressiveness may look alien to beginners.

Like any gain in expressiveness, there will be corner cases where engines may "crash" due to infinite recursion etc. We did leave such cases undefined in the past and it hasn't really impacted the acceptance of SHACL except in terms of theoretical purity. Furthermore, many other languages, including all programming languages, also support recursion and allow infinite loops etc. In my (Holger's) opinion, SHACL should be a language that gets work done, without being artificially limited by theoretical corner cases.

Defining a SHACL Profile without any node expressions can help address these problems.

(Page started by @HolgerKnublauch, following discussions on https://github.com/w3c/data-shapes/issues/215)