Replies: 2 comments
-
I started looking at this ages ago:
#411
I revisited it in 2016 but apparently didn't push; I'll see if I can find my changes.
It's still far from working, though!
- Gunnar
On 19 June 2017 at 10:08, Robert Jäschke wrote:
It would be great if rdflib could support processing large files which do
not fit into main memory. For instance, providing an iterator over the
statements is often sufficient, if the statements are "atomic", that is,
provide all information about an entity. This can be the case for datasets
generated from relational databases.
As an example: I would like to extract data from a large Turtle dataset
<http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login>
and convert it into JSON. For that purpose (and given the structure of the
dataset) it would be sufficient to have an iterator over the statements in
the file.
I had a look at the source code for parsing N3
<https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/notation3.py>
but could not find an apparent method for that use case. I suppose that's
related to what W3C's RDF stream processing community group
<https://www.w3.org/community/rsp/> is aiming for but on a much simpler
scale.
Tips on how to accomplish this (with or without rdflib) would also be
great.
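For what it's worth, here is a minimal sketch of the "without rdflib" route. N-Triples is one statement per line, so a large Turtle file can be converted once to N-Triples (e.g. with an external tool such as Raptor's `rapper`) and then iterated with constant memory. The `iter_ntriples` helper and its regex below are illustrative only; they handle the common cases and are not a full N-Triples parser.

```python
import re

# Matches one N-Triples statement. Simplified: covers IRIs, blank
# nodes, and plain/typed/language-tagged literals without embedded
# spaces in the tail; it is not a complete N-Triples grammar.
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+'                              # subject: IRI or blank node
    r'(<[^>]*>)\s+'                                     # predicate: IRI
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"\S*)\s*\.\s*$'    # object
)

def iter_ntriples(lines):
    """Yield (subject, predicate, object) strings one line at a time,
    skipping blank lines and comments, without building a graph."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)
```

Usage would be something like `for s, p, o in iter_ntriples(open("data.nt")): ...`, which never holds more than one line in memory, so the JSON conversion can be done statement by statement.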
-
Thank you Gunnar, that would be great. At the moment I am trying to use …