RDF to ArangoDB (RPT)
Virtualizing ArangoDB as a Triple Store
What is RPT?
The RDF-topology Preserving Transformation (RPT) algorithm preserves the RDF graph structure by transforming each RDF statement into an edge in the Property Graph (PG).
Consider the following RDF Graph:
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages "100"^^xsd:integer .
ex:book ex:cover 20 .
ex:book ex:index 55 .
RPT converts the triple (ex:book, ex:index, 55) into two nodes (ex:book) and (55), connected by an edge (ex:index). All other triples involving RDF resources, blank nodes, or literal values can be transformed in a similar way so that we obtain the Property Graph below:
The Algorithm below formalizes the RPT approach. For each triple, create a node for the subject (line 3) and the object (line 5), and connect them with an edge (line 12), avoiding duplicate nodes for the same IRI.
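The per-triple loop described above can be sketched in plain Python. This is a minimal illustration, not ArangoRDF's implementation: the `rpt` function and the dictionary layout of nodes and edges are hypothetical, and terms are represented as plain strings rather than rdflib objects. As a simplification, nodes are keyed by their string form, so identical literals would also share a node.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def rpt(triples: List[Triple]) -> Tuple[Dict[str, dict], List[dict]]:
    """Hypothetical helper: turn each triple into two nodes and one edge."""
    nodes: Dict[str, dict] = {}
    edges: List[dict] = []
    for subject, predicate, obj in triples:
        # Create a node for the subject and the object unless one exists already.
        for term in (subject, obj):
            nodes.setdefault(term, {"label": term})
        # Every statement becomes exactly one edge between the two nodes.
        edges.append({"from": subject, "label": predicate, "to": obj})
    return nodes, edges


# The four triples of the ex:book example, written as plain strings.
triples = [
    ("ex:book", "ex:publish_date", '"1963-03-22"^^xsd:date'),
    ("ex:book", "ex:pages", '"100"^^xsd:integer'),
    ("ex:book", "ex:cover", "20"),
    ("ex:book", "ex:index", "55"),
]

nodes, edges = rpt(triples)
print(len(nodes), len(edges))  # 5 nodes (ex:book plus four literals), 4 edges
```

Because ex:book is the subject of all four triples, it is created once and reused, which is the duplicate avoidance the algorithm calls for.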
Now, consider the following RDF-star Graph:
@prefix ex: <http://example.com/> .
<< ex:Mary ex:likes ex:Matt >> ex:certainty 0.5 .
<< ex:Mary ex:age 28 >> ex:certainty 1 .
ArangoRDF’s RPT transformation for RDF-star Graphs is slightly different from the
transformation proposed in the paper. To preserve the concept of virtualizing ArangoDB
as a Triple Store, RDF-star statements are converted in the same way as plain RDF triples.
The paper instead proposes adding ex:certainty 1 as an edge attribute on the edge that
connects the nodes ex:Mary and 28. ArangoRDF’s RPT
transformation expresses (..., ex:certainty, 1) as its own edge:
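The difference between the two approaches can be sketched with plain dictionaries. This is only an illustration under assumed names: the `add_edge` helper, the `e1`/`e2` identifiers, and the dictionary layout are hypothetical and do not reflect how ArangoRDF stores documents.

```python
from typing import Dict

# Approach described in the paper: the annotation becomes an edge attribute.
paper_style_edge = {
    "from": "ex:Mary",
    "label": "ex:age",
    "to": "28",
    "ex:certainty": 1,
}

# Approach described for ArangoRDF: the annotation is a separate edge whose
# source is the edge that represents the quoted statement.
edges: Dict[str, dict] = {}


def add_edge(source: str, predicate: str, target: str) -> str:
    """Hypothetical helper: store an edge and return its identifier."""
    edge_id = f"e{len(edges) + 1}"
    edges[edge_id] = {"from": source, "label": predicate, "to": target}
    return edge_id


# << ex:Mary ex:age 28 >> ex:certainty 1 .
quoted = add_edge("ex:Mary", "ex:age", "28")
annotation = add_edge(quoted, "ex:certainty", "1")

print(edges[annotation])
```

In the second form every statement, quoted or not, is one edge, so the result can still be read back as a set of triples.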
Please Note: The rdflib Python package hasn’t yet introduced support for Quoted Triples, so ArangoRDF’s support for RDF-star is based on Triple Reification.
As a result, the RDF-star Graph above can be processed with ArangoRDF as follows:
from rdflib import Graph
from arango import ArangoClient
from arango_rdf import ArangoRDF
data = """
@prefix ex: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# << ex:Mary ex:likes ex:Matt >> ex:certainty 0.5 .
# << ex:Mary ex:age 28 >> ex:certainty 1 .
[] a rdf:Statement;
rdf:subject ex:Mary;
rdf:predicate ex:likes;
rdf:object ex:Matt ;
ex:certainty 0.5 .
[] a rdf:Statement;
rdf:subject ex:Mary;
rdf:predicate ex:age;
rdf:object 28 ;
ex:certainty 1 .
"""
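# Parse the reified Turtle into an in-memory rdflib graph
rdf_graph = Graph()
rdf_graph.parse(data=data, format="turtle")
# Connect using python-arango's defaults (local server, _system database)
db = ArangoClient().db()
adbrdf = ArangoRDF(db)
# Load the graph into ArangoDB with RPT, overwriting an existing "DataRPT" graph
adbrdf.rdf_to_arangodb_by_rpt(name="DataRPT", rdf_graph=rdf_graph, overwrite_graph=True)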
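Writing the reified form by hand gets repetitive for larger inputs. The sketch below generates the same Turtle pattern from a triple and its annotations. The `reify` function is a hypothetical helper, not part of ArangoRDF or rdflib, and it assumes the terms are already valid Turtle tokens and that the `ex:` and `rdf:` prefixes are declared elsewhere in the document.

```python
from typing import Dict


def reify(subject: str, predicate: str, obj: str, annotations: Dict[str, str]) -> str:
    """Hypothetical helper: render one RDF-star statement as a reified Turtle block."""
    pairs = [
        ("a", "rdf:Statement"),
        ("rdf:subject", subject),
        ("rdf:predicate", predicate),
        ("rdf:object", obj),
        *annotations.items(),  # e.g. ("ex:certainty", "1")
    ]
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"[] {body} ."


# << ex:Mary ex:age 28 >> ex:certainty 1 .
turtle_block = reify("ex:Mary", "ex:age", "28", {"ex:certainty": "1"})
print(turtle_block)
```

The returned string can be appended after the prefix declarations and passed to `Graph.parse` with `format="turtle"`, as in the example above.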
ArangoDB Collection Mapping Process
The ArangoDB Collection Mapping Process is defined as the algorithm used to map RDF Resources to ArangoDB Collections. In RPT, the ArangoDB Collections generated are consistent:
- {Name}_URIRef: The Vertex collection for rdflib.term.URIRef resources.
- {Name}_BNode: The Vertex collection for rdflib.term.BNode resources.
- {Name}_Literal: The Vertex collection for rdflib.term.Literal resources.
- {Name}_Statement: The Edge collection for all triples/quads.
Using the python example from above, the RDF Resources of your RDF Graph would be stored under the following ArangoDB Collections:
- DataRPT_URIRef
  - ex:Mary
  - ex:Matt
- DataRPT_BNode
  - [] (1)
  - [] (2)
- DataRPT_Literal
  - 0.5
  - 1
- DataRPT_Statement
  - ex:Mary -> ex:likes -> ex:Matt
  - ex:Mary -> ex:age -> 28
  - (ex:Mary -> ex:likes -> ex:Matt) -> ex:certainty -> 0.5
  - (ex:Mary -> ex:age -> 28) -> ex:certainty -> 1
This is the consistent naming scheme for all ArangoRDF RPT transformations. The name of the RDF Graph is used as a prefix for the 3 Vertex Collections and the 1 Edge Collection.
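The naming scheme can be written down as a small function, which is handy when looking up the collections after an import. This is a sketch of the convention described above only; `rpt_collection_names` is a hypothetical helper and not part of the ArangoRDF API.

```python
from typing import Dict, List, Union


def rpt_collection_names(name: str) -> Dict[str, Union[List[str], str]]:
    """Hypothetical helper: collection names RPT derives from a graph name."""
    return {
        # One vertex collection per rdflib term type.
        "vertex": [f"{name}_{kind}" for kind in ("URIRef", "BNode", "Literal")],
        # A single edge collection holds every triple/quad.
        "edge": f"{name}_Statement",
    }


names = rpt_collection_names("DataRPT")
print(names["vertex"])
print(names["edge"])
```

With python-arango, each of these names could then be passed to `db.collection(...)` to inspect the stored documents.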
Supported Cases
Note: RDF-to-ArangoDB functionality has been implemented using concepts described in the paper Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches.
The paper presents a systematic list of test cases that transformation approaches need to fulfill. These test cases range from simple RDF Graphs to complex RDF-star Graphs.
ArangoRDF’s RPT interface can be observed here.
View how ArangoRDF’s RPT transformation approach performs on these test cases in Colab.