Add the Jelly output format #258
Merged
This PR adds support for outputting files in the Jelly format, a high-performance binary RDF format based on Protocol Buffers.
Implementation
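A minimal, standard-library-only sketch of why a binary format cannot safely pass through a `java.io.Writer`, which is the motivation for the `OutputStream`-based `write` method described in this section. The class name, method names, and payload bytes are hypothetical and are not taken from this PR or from Jelly; the payload is just four arbitrary bytes, two of which are not valid UTF-8 on their own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterVsOutputStream {
    // Hypothetical stand-in for binary output (NOT real Jelly data).
    // 0x96 and 0xFF are not valid UTF-8 sequences by themselves.
    public static final byte[] PAYLOAD = {0x0A, (byte) 0x96, 0x01, (byte) 0xFF};

    // Bytes written to an OutputStream arrive unchanged.
    public static byte[] viaOutputStream(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // A Writer only accepts characters, so the bytes must first be decoded
    // as text and are then re-encoded on the way out. Invalid sequences are
    // replaced with U+FFFD during decoding, which corrupts the payload.
    public static byte[] viaWriter(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(new String(payload, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(PAYLOAD, viaOutputStream(PAYLOAD)));
        System.out.println(Arrays.equals(PAYLOAD, viaWriter(PAYLOAD)));
    }
}
```

Running `main` prints `true` for the stream path and `false` for the `Writer` path: each of the two invalid bytes is replaced by U+FFFD during decoding, so the output grows from 4 to 8 bytes.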
I used the `jelly-rdf4j` library, which integrates nicely with the RDF4J Rio subsystem. GitHub: https://github.com/Jelly-RDF/jelly-jvm
Jelly is a binary format, so it can't be written to a Java `Writer`. For this reason I added a new `write` method to `QuadStore` that takes an `OutputStream` as input. In modern RDF4J there is really no reason to use a `Writer` for output, as the only legal encoding is UTF-8 anyway, and RDF4J is perfectly happy writing to a raw binary stream (it should even be faster). However, I assume that the `Writer`-based method must be kept for API compatibility, so for now only Jelly uses the native `OutputStream` output. The remaining RDF4J formats can be migrated later by simply changing the type of the `out` parameter to `OutputStream` – I already tested that and it works fine.
These changes may be useful later to reduce the hackiness of the current HDT serializer implementation (which currently bypasses `write` with an additional conditional branch), or to add support for other binary formats.
Tests
I added a test to verify that the output is saved correctly in the Jelly format.
The `target_output.jelly` file was generated using `jelly-cli` with this command:
$ jelly-cli rdf to-jelly --opt.rdf-star=false --opt.generalized-statements=false ../output-turtle/target_output.ttl > target_output.jelly
Using the output
You can use `jelly-cli` to play around with the generated Jelly files and convert them to other formats. You can also load them into Apache Jena Fuseki by installing the Jelly plugin for Jena.
You can also use pyjelly to load the file into rdflib in Python.
Dependencies
This adds 3 new dependencies, which together weigh around 2 megabytes in JAR form:
- `jelly-core` – generic serialization code for Jelly
- `jelly-rdf4j` – integration of `jelly-core` with RDF4J
- `protobuf-java` – Google's Protobuf library

`protobuf-java` is a very popular library, used in many projects, with a robust security policy. The Jelly libraries are extensively tested (8000+ test cases in the main suite) and have mitigations for known security risks tested in CI. They are production-grade and are currently being used, for example, in the nanopublication services for inter-service communication.