Proposed Dataset API changes
#2591
Replies: 7 comments 23 replies
-
It also looks like a simple |
-
I support all of the proposals in this discussion. This has been a long time coming: we've noticed these things for years but have never done anything about them, and they still hurt us. @edmondchuc is battling with datasets in a current project. I suggest we also remove the |
-
I don't think |
-
I've written down my thoughts on what the expected interfaces are, in a pseudo Python/rdflib format:
Hopefully it's a coherent perspective; it may take some effort to reconcile / integrate with others'. Will have a go at this next.

Minimal class definitions

Only enough to illustrate the thinking/scenarios

class GraphType(Enum):
    DEFAULT = "default"
    NAMED = "named"

class Graph:
    def __init__(
        self,
        identifier: URIRef | None = None,
        graph_type: GraphType | None = None,
    ):
        pass

Dataset:

class Dataset:
    def __init__(self):
        pass

    def quads(
        self,
        context: GraphType | URIRef | list[GraphType | URIRef] | None = None,
    ):
        pass

    def triples(
        self,
        context: GraphType | URIRef | list[GraphType | URIRef] | None = None,
    ):
        pass

    def add_graph(
        self,
        graph: Graph,
        target: URIRef | GraphType.DEFAULT | None = None,
    ):
        pass

Graph Scenarios

Scenario 1: Default Graph (Start with Triple)

Graph instantiated without context becomes a "default" or contextless graph when the first thing added is a triple.

g = Graph()
g.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]
print(g.graph_type)
> default # graph type is now "default"; any triples or quads added after this have no context
g.parse(data="<ex:s2> <ex:p2> <ex:o2> <ex:graph> .", format="nquads")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>'), ('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None), ('<ex:s2>', '<ex:p2>', '<ex:o2>', None)]

Scenario 2: Named Graph (Start with Quad)

Graph instantiated without context gets context from parsed quad.

g = Graph()
g.parse(data="<ex:s2> <ex:p2> <ex:o2> <ex:g2> .", format="nquads")
print(list(g.triples()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(g.quads()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g2>')]
print(g.graph_type)
> named
g.parse(data="<ex:s3> <ex:p3> <ex:o3> .", format="turtle")
print(list(g.triples()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>'), ('<ex:s3>', '<ex:p3>', '<ex:o3>')]
print(list(g.quads()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g2>'), ('<ex:s3>', '<ex:p3>', '<ex:o3>', '<ex:g2>')]

Scenario 3: Named Graph with Identifier

Triples added to graph inherit the context.

g = Graph(identifier="ex:g1")
print(g.graph_type)
> named
g.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', '<ex:g1>')]

Scenario 4: Add quad to default graph

Context is ignored.

g = Graph(graph_type="default")
g.parse(data="<ex:s1> <ex:p1> <ex:o1> <ex:graph> .", format="nquads")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]

Dataset Scenarios

Scenario 5: Add a Default Graph to a Dataset

g = Graph(graph_type="default")
g.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
ds = Dataset()
ds.add_graph(g)
print(list(ds.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(ds.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]

Scenario 6: Add a Named Graph to a Dataset

g = Graph(identifier="ex:g1")
g.parse(data="<ex:s2> <ex:p2> <ex:o2> .", format="turtle")
ds = Dataset()
ds.add_graph(g)
print(list(ds.triples()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.quads()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g1>')]

Scenario 7: Add a Graph to the Default Context

Graph ID of graph being added (if present) is overridden by "target".

g = Graph(identifier="ex:g1")
g.parse(data="<ex:s3> <ex:p3> <ex:o3> .", format="turtle")
ds = Dataset()
ds.add_graph(g, target="default")
print(list(ds.triples()))
> [('<ex:s3>', '<ex:p3>', '<ex:o3>')]
print(list(ds.quads()))
> [('<ex:s3>', '<ex:p3>', '<ex:o3>', None)]

Scenario 8: Add Graphs to Dataset changing the graph

Graph ID of graph being added (if present) is overridden by "target".

g = Graph(identifier="ex:g2", graph_type="named")
g.parse(data="<ex:s4> <ex:p4> <ex:o4> .", format="turtle")
ds = Dataset()
ds.add_graph(g, target="ex:newg")
print(list(ds.triples()))
> [('<ex:s4>', '<ex:p4>', '<ex:o4>')]
print(list(ds.quads()))
> [('<ex:s4>', '<ex:p4>', '<ex:o4>', '<ex:newg>')]

Scenario 9: Iterate Over Triples with Contexts

g1 = Graph(graph_type="default")
g1.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
g2 = Graph(identifier="ex:g1")
g2.parse(data="<ex:s2> <ex:p2> <ex:o2> .", format="turtle")
ds = Dataset()
ds.add_graph(g1)
ds.add_graph(g2)
print(list(ds.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>'), ('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.triples(context=["NAMED", "DEFAULT"]))) # equivalent to default behaviour when not specifying context
> [('<ex:s1>', '<ex:p1>', '<ex:o1>'), ('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.triples(context="NAMED")))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.triples(context="DEFAULT")))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(ds.triples(context=["DEFAULT", "ex:g2"]))) # ex:g2 is not in the dataset so no data returned from this graph.
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]

Scenario 10: Iterate Over Quads with Contexts

g1 = Graph(graph_type="default")
g1.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
g2 = Graph(identifier="ex:g1")
g2.parse(data="<ex:s2> <ex:p2> <ex:o2> .", format="turtle")
ds = Dataset()
ds.add_graph(g1)
ds.add_graph(g2)
print(list(ds.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None), ('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g1>')]
print(list(ds.quads(context="NAMED")))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g1>')]
print(list(ds.quads(context="DEFAULT")))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]
print(list(ds.quads(context=["DEFAULT", "ex:g2"]))) # ex:g2 is not in the dataset so no data returned from this graph.
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]
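
The scenarios above can be approximated with a small, self-contained Python sketch. It is only a toy model under stated assumptions, not rdflib code: plain strings stand in for URIRef terms, a hypothetical add() method stands in for parse(), and the GraphType enum is passed directly instead of the "default" / "NAMED" strings used in the pseudo code.

```python
from __future__ import annotations

from enum import Enum
from typing import Iterator


class GraphType(Enum):
    DEFAULT = "default"
    NAMED = "named"


class Graph:
    """Toy graph holding string triples (stand-ins for rdflib terms)."""

    def __init__(self, identifier: str | None = None,
                 graph_type: GraphType | None = None) -> None:
        self.identifier = identifier
        # An identifier implies a named graph (Scenario 3); otherwise the
        # type may stay undecided until the first statement arrives.
        self.graph_type = GraphType.NAMED if identifier else graph_type
        self._triples: list[tuple[str, str, str]] = []

    def add(self, s: str, p: str, o: str, g: str | None = None) -> None:
        """Hypothetical stand-in for parse(); g is the quad's graph name."""
        if self.graph_type is None:
            # Scenarios 1 and 2: the first statement fixes the graph type.
            self.graph_type = GraphType.NAMED if g else GraphType.DEFAULT
        if self.graph_type is GraphType.NAMED and self.identifier is None:
            self.identifier = g
        # In every other case the incoming graph name is ignored (Scenario 4).
        if (s, p, o) not in self._triples:
            self._triples.append((s, p, o))

    def triples(self) -> Iterator[tuple[str, str, str]]:
        yield from self._triples

    def quads(self) -> Iterator[tuple]:
        for s, p, o in self._triples:
            yield (s, p, o, self.identifier)


class Dataset:
    def __init__(self) -> None:
        # Graph name -> triples; the key None stands for the default graph.
        self._graphs: dict[str | None, list[tuple[str, str, str]]] = {}

    def add_graph(self, graph: Graph,
                  target: str | GraphType | None = None) -> None:
        if target is GraphType.DEFAULT:
            name = None                  # Scenario 7: force the default graph
        elif isinstance(target, str):
            name = target                # Scenario 8: rename on the way in
        else:
            name = graph.identifier      # Scenarios 5 and 6
        self._graphs.setdefault(name, []).extend(graph.triples())

    def quads(self, context=None) -> Iterator[tuple]:
        if context is None:
            wanted = [GraphType.DEFAULT, GraphType.NAMED]
        else:
            wanted = context if isinstance(context, list) else [context]
        for name, triples in self._graphs.items():
            kind = GraphType.DEFAULT if name is None else GraphType.NAMED
            if kind in wanted or name in wanted:
                for s, p, o in triples:
                    yield (s, p, o, name)

    def triples(self, context=None) -> Iterator[tuple[str, str, str]]:
        for s, p, o, _ in self.quads(context):
            yield (s, p, o)


# Scenarios 9 and 10 replayed against the toy model.
g1 = Graph(graph_type=GraphType.DEFAULT)
g1.add("ex:s1", "ex:p1", "ex:o1")
g2 = Graph(identifier="ex:g1")
g2.add("ex:s2", "ex:p2", "ex:o2")

ds = Dataset()
ds.add_graph(g1)
ds.add_graph(g2)

print(list(ds.quads()))
print(list(ds.triples(context=GraphType.NAMED)))
print(list(ds.quads(context=[GraphType.DEFAULT, "ex:g2"])))
```

Using None as the key for the default graph keeps the quad output in line with the scenarios, where the fourth element is None for default-graph statements.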
-
Observations from afar ...
Would the dataset (the storage unit) have a default context setting? Otherwise, if an app changes, every API call might have to be tracked down and changed. FWIW, Fuseki has both modes: union default graph is SPARQL only, and it is a view of the dataset at query time. The usual way is to have a setting on the dataset, but it can also be set per query execution. For update, where do new triples go in an inclusive dataset?
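
To make the question concrete, here is a rough Python sketch of a dataset-wide setting with a per-call override. Everything in it is hypothetical (the union_default_graph flag, the add() and default_graph() methods); it is neither Fuseki's nor rdflib's API. It also bakes in two assumptions that are really open design questions: the union view covers the stored default graph plus all named graphs, and writes without a graph name go to the stored default graph.

```python
from __future__ import annotations


class Dataset:
    """Toy dataset with a stored default graph and named graphs."""

    def __init__(self, union_default_graph: bool = False) -> None:
        # Dataset-wide setting, so callers need not pass it on every call.
        self.union_default_graph = union_default_graph
        self._default: set[tuple] = set()
        self._named: dict[str, set[tuple]] = {}

    def add(self, triple: tuple, graph: str | None = None) -> None:
        # Assumption: writes always land in a concrete stored graph,
        # never in the computed union view.
        if graph is None:
            self._default.add(triple)
        else:
            self._named.setdefault(graph, set()).add(triple)

    def default_graph(self, union: bool | None = None) -> set[tuple]:
        # A per-call override wins over the dataset-wide setting.
        use_union = self.union_default_graph if union is None else union
        view = set(self._default)
        if use_union:
            # The union is computed at read time; nothing is copied in storage.
            for triples in self._named.values():
                view |= triples
        return view


ds = Dataset(union_default_graph=True)
ds.add(("ex:s1", "ex:p1", "ex:o1"))
ds.add(("ex:s2", "ex:p2", "ex:o2"), graph="ex:g1")

print(sorted(ds.default_graph()))             # union view (dataset setting)
print(sorted(ds.default_graph(union=False)))  # stored default graph only
```

With the flag on the dataset, an application that switches modes changes one constructor argument instead of every read call.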
-
Trying to summarize what we have to do:
What else? I had a look, and the refactoring looks complex: to avoid breaking things, we need to understand how most of the project works.
-
The Dataset is quite weird and assumes that standalone Graphs have identifiers, which will be phased out (#2537). For example, adding a named graph to a Dataset looks like this:

Moreover, Dataset uses the term context when referring to named graphs. I think it should be phased out as well.

If in doubt, I suggest just copying Jena's Dataset API.

My suggestions for Dataset:
- add_named_graph(uri: IdentifiedNode, graph: Graph) method
- has_named_graph(uri: IdentifiedNode) method
- remove_named_graph(uri: IdentifiedNode) method
- replace_named_graph(uri: IdentifiedNode, graph: Graph) method
- graphs() method as an alias for contexts()
- default_graph property as an alias for default_context
- get_named_graph as an alias for get_graph
- graph(graph) method
- remove_graph(graph) method
- contexts() method

Using IdentifiedNode as a super-interface for URIRef and BNode (since both are allowed as graph names in RDF 1.1).

The above example would become something like this after these changes:
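
Separately from the post's own example, the suggested named-graph methods can be illustrated on a toy class. This is a sketch under assumptions, not rdflib's implementation: plain strings stand in for IdentifiedNode, Graph is just a set of tuples, and the error behaviour (ValueError on a duplicate name, KeyError on a missing one) is a guess the proposal does not specify.

```python
from __future__ import annotations

from typing import Iterator


class Graph(set):
    """Toy stand-in for a graph: a set of (s, p, o) tuples, no identifier."""


class Dataset:
    def __init__(self) -> None:
        self.default_graph = Graph()
        # Names live in the dataset, not on the graphs themselves.
        self._named: dict[str, Graph] = {}

    def add_named_graph(self, uri: str, graph: Graph) -> None:
        # Assumption: reusing a name is an error; replace_named_graph exists
        # for the overwrite case.
        if uri in self._named:
            raise ValueError(f"named graph already exists: {uri}")
        self._named[uri] = graph

    def has_named_graph(self, uri: str) -> bool:
        return uri in self._named

    def get_named_graph(self, uri: str) -> Graph:
        return self._named[uri]  # KeyError if the name is unknown

    def remove_named_graph(self, uri: str) -> None:
        del self._named[uri]  # KeyError if the name is unknown

    def replace_named_graph(self, uri: str, graph: Graph) -> None:
        self._named[uri] = graph

    def graphs(self) -> Iterator[Graph]:
        yield from self._named.values()


ds = Dataset()
ds.add_named_graph("ex:g1", Graph({("ex:s1", "ex:p1", "ex:o1")}))
print(ds.has_named_graph("ex:g1"))

ds.replace_named_graph("ex:g1", Graph({("ex:s2", "ex:p2", "ex:o2")}))
print(list(ds.get_named_graph("ex:g1")))

ds.remove_named_graph("ex:g1")
print(ds.has_named_graph("ex:g1"))
```

Keeping the name as a dictionary key reflects the point about standalone graphs: the graph object carries no identifier, and only the dataset knows which name it is stored under.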