API Specification
This page contains the specification for all classes and methods available in ArangoRDF.
ArangoRDF
- class arango_rdf.main.ArangoRDF(db: arango.database.StandardDatabase, controller: arango_rdf.controller.ArangoRDFController = <arango_rdf.controller.ArangoRDFController object>, logging_lvl: str | int = 20)[source]
ArangoRDF: Transform RDF Graphs into ArangoDB Graphs & vice-versa.
Implemented using concepts described in https://arxiv.org/abs/2210.05781.
- Parameters:
db (arango.database.StandardDatabase) – A python-arango database instance
logging_lvl (str | int) – Defaults to logging.INFO. Other useful options are logging.DEBUG (more verbose), and logging.WARNING (less verbose).
- Raises:
TypeError – On invalid parameter types
- arangodb_to_rdf(name: str, rdf_graph: Graph, metagraph: Dict[str, Dict[str, Set[str]]], explicit_metagraph: bool = True, list_conversion_mode: str = 'static', dict_conversion_mode: str = 'static', infer_type_from_adb_v_col: bool = False, include_adb_v_col_statements: bool = False, include_adb_v_key_statements: bool = False, include_adb_e_key_statements: bool = False, **adb_export_kwargs: Any) Graph [source]
Create an RDF Graph from an ArangoDB Graph via its Metagraph.
- Parameters:
name (str) – The name of the ArangoDB Graph
rdf_graph (rdflib.graph.Graph) – The target RDF Graph to insert into.
metagraph (arango_rdf.typings.ADBMetagraph) – A dictionary of dictionaries defining the ArangoDB Vertex & Edge Collections whose entries will be inserted into the RDF Graph.
explicit_metagraph (bool) – If True, only the document attributes specified in metagraph are kept when exporting to RDF. If False, all document attributes are included. Defaults to True.
list_conversion_mode (str) – Specify how ArangoDB JSON lists within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON lists will be serialized into RDF Literals. If “collection”, ArangoDB lists will be processed using the RDF Collection structure. If “container”, ArangoDB lists will be processed using the RDF Container structure. If “static”, elements within lists will be processed as individual statements. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.
dict_conversion_mode (str) – Specify how ArangoDB JSON Objects within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON Objects will be serialized into RDF Literals. If “static”, elements within dictionaries will be processed as individual statements with the help of BNodes. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.
infer_type_from_adb_v_col (bool) – Specify whether rdf:type statements of the form resource rdf:type adb_v_col . should be inferred upon transferring ArangoDB Vertices into RDF.
include_adb_v_col_statements (bool) – Specify whether adb:collection statements of the form adb_vertex adb:collection adb_v_col . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document collections when a user is interested in round-tripping.
include_adb_v_key_statements (bool) – Specify whether adb:key statements of the form adb_vertex adb:key adb_vertex[“key”] . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document keys when a user is interested in round-tripping.
include_adb_e_key_statements (bool) – Specify whether adb:key statements of the form adb_edge adb:key adb_edge[“key”] . should be generated upon transferring ArangoDB Edges into RDF. This can be used to maintain edge keys when a user is interested in round-tripping. NOTE: Enabling this option will impose Triple Reification on all ArangoDB Edges.
adb_export_kwargs (Any) – Keyword arguments to specify AQL query options when fetching documents from the ArangoDB instance. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
- Returns:
The RDF representation of the ArangoDB Graph.
- Return type:
rdflib.graph.Graph
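To make the metagraph shape concrete, here is a minimal sketch of a value matching the Dict[str, Dict[str, Set[str]]] type. The top-level keys "vertexCollections" and "edgeCollections", the collection and attribute names, and the helper attributes_for are all assumptions for illustration; this page only documents the nested type.

```python
from typing import Dict, Set

# Hypothetical collection and attribute names. The two top-level keys are an
# assumption; this page only documents the nested dictionary type.
metagraph: Dict[str, Dict[str, Set[str]]] = {
    "vertexCollections": {
        "Person": {"name", "age"},
        "Company": {"name"},
    },
    "edgeCollections": {
        "worksAt": {"since"},
    },
}


def attributes_for(metagraph: Dict[str, Dict[str, Set[str]]], collection: str) -> Set[str]:
    """Return the attribute names selected for a collection, or an empty set."""
    for collections in metagraph.values():
        if collection in collections:
            return collections[collection]
    return set()


print(sorted(attributes_for(metagraph, "Person")))
```

With explicit_metagraph left at True, only the attributes listed per collection would be expected to reach the RDF Graph.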
- arangodb_collections_to_rdf(name: str, rdf_graph: Graph, v_cols: Set[str], e_cols: Set[str], list_conversion_mode: str = 'static', dict_conversion_mode: str = 'static', infer_type_from_adb_v_col: bool = False, include_adb_v_col_statements: bool = False, include_adb_v_key_statements: bool = False, include_adb_e_key_statements: bool = False, **adb_export_kwargs: Any) Graph [source]
Create an RDF Graph from an ArangoDB Graph via its Collection Names.
- Parameters:
name (str) – The name of the ArangoDB Graph
rdf_graph (rdflib.graph.Graph) – The target RDF Graph to insert into.
v_cols (Set[str]) – The set of ArangoDB Vertex Collections to import to RDF.
e_cols (Set[str]) – The set of ArangoDB Edge Collections to import to RDF.
list_conversion_mode (str) – Specify how ArangoDB JSON lists within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON lists will be serialized into RDF Literals. If “collection”, ArangoDB lists will be processed using the RDF Collection structure. If “container”, ArangoDB lists will be processed using the RDF Container structure. If “static”, elements within lists will be processed as individual statements. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.
dict_conversion_mode (str) – Specify how ArangoDB JSON Objects within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON Objects will be serialized into RDF Literals. If “static”, elements within dictionaries will be processed as individual statements with the help of BNodes. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.
infer_type_from_adb_v_col (bool) – Specify whether rdf:type statements of the form resource rdf:type adb_v_col . should be inferred upon transferring ArangoDB Vertices into RDF.
include_adb_v_col_statements (bool) – Specify whether adb:collection statements of the form adb_vertex adb:collection adb_v_col . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document collections when a user is interested in round-tripping.
include_adb_v_key_statements (bool) – Specify whether adb:key statements of the form adb_vertex adb:key adb_vertex[“key”] . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document keys when a user is interested in round-tripping.
include_adb_e_key_statements (bool) – Specify whether adb:key statements of the form adb_edge adb:key adb_edge[“key”] . should be generated upon transferring ArangoDB Edges into RDF. This can be used to maintain edge keys when a user is interested in round-tripping. NOTE: Enabling this option will impose Triple Reification on all ArangoDB Edges.
adb_export_kwargs (Any) – Keyword arguments to specify AQL query options when fetching documents from the ArangoDB instance. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
- Returns:
The RDF representation of the ArangoDB Graph.
- Return type:
rdflib.graph.Graph
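The difference between the “static” and “serialize” list modes can be pictured with a small conceptual sketch. Plain tuples stand in for RDF statements, JSON is assumed as the serialization format, and the function name list_to_statements is hypothetical. This is not the library's implementation, and the “collection” and “container” modes are left out.

```python
import json
from typing import Any, List, Tuple

Statement = Tuple[str, str, Any]


def list_to_statements(subject: str, predicate: str, values: List[Any], mode: str = "static") -> List[Statement]:
    """Model how one list-valued attribute could map to statements."""
    if mode == "static":
        # One statement per list element.
        return [(subject, predicate, value) for value in values]
    if mode == "serialize":
        # One statement whose object is the whole list as a single string.
        return [(subject, predicate, json.dumps(values))]
    raise ValueError(f"mode not covered by this sketch: {mode}")


print(list_to_statements("ex:alice", "ex:nicknames", ["Al", "Ali"], mode="static"))
print(list_to_statements("ex:alice", "ex:nicknames", ["Al", "Ali"], mode="serialize"))
```

In this sketch the serialized form keeps the list recoverable as a single value, while the static form turns each element into its own statement.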
- arangodb_graph_to_rdf(name: str, rdf_graph: Graph, list_conversion_mode: str = 'static', dict_conversion_mode: str = 'static', infer_type_from_adb_v_col: bool = False, include_adb_v_col_statements: bool = False, include_adb_v_key_statements: bool = False, include_adb_e_key_statements: bool = False, **adb_export_kwargs: Any) Graph [source]
Create an RDF Graph from an ArangoDB Graph via its Graph Name.
- Parameters:
name (str) – The name of the ArangoDB Graph
rdf_graph (rdflib.graph.Graph) – The target RDF Graph to insert into.
list_conversion_mode (str) – Specify how ArangoDB JSON lists within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON lists will be serialized into RDF Literals. If “collection”, ArangoDB lists will be processed using the RDF Collection structure. If “container”, ArangoDB lists will be processed using the RDF Container structure. If “static”, elements within lists will be processed as individual statements. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.
dict_conversion_mode (str) – Specify how ArangoDB JSON Objects within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON Objects will be serialized into RDF Literals. If “static”, elements within dictionaries will be processed as individual statements with the help of BNodes. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.
infer_type_from_adb_v_col (bool) – Specify whether rdf:type statements of the form resource rdf:type adb_v_col . should be inferred upon transferring ArangoDB Vertices into RDF.
include_adb_v_col_statements (bool) – Specify whether adb:collection statements of the form adb_vertex adb:collection adb_v_col . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document collections when a user is interested in round-tripping.
include_adb_v_key_statements (bool) – Specify whether adb:key statements of the form adb_vertex adb:key adb_vertex[“key”] . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document keys when a user is interested in round-tripping.
include_adb_e_key_statements (bool) – Specify whether adb:key statements of the form adb_edge adb:key adb_edge[“key”] . should be generated upon transferring ArangoDB Edges into RDF. This can be used to maintain edge keys when a user is interested in round-tripping. NOTE: Enabling this option will impose Triple Reification on all ArangoDB Edges.
adb_export_kwargs (Any) – Keyword arguments to specify AQL query options when fetching documents from the ArangoDB instance. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
- Returns:
The RDF representation of the ArangoDB Graph.
- Return type:
rdflib.graph.Graph
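As a usage sketch, the helper below calls arangodb_graph_to_rdf with the options that the parameter descriptions associate with round-tripping. The helper name export_for_round_trip is hypothetical, adbrdf is assumed to be an ArangoRDF instance, and rdf_graph an rdflib Graph; the helper only forwards documented keyword arguments.

```python
from typing import Any


def export_for_round_trip(adbrdf: Any, name: str, rdf_graph: Any) -> Any:
    """Export an ArangoDB Graph to RDF with round-trip friendly options.

    adbrdf is assumed to expose arangodb_graph_to_rdf() as documented above.
    """
    return adbrdf.arangodb_graph_to_rdf(
        name,
        rdf_graph,
        # "serialize" is recommended above only when round-tripping via PGT.
        list_conversion_mode="serialize",
        dict_conversion_mode="serialize",
        # Keep collection and key information as adb:collection / adb:key statements.
        include_adb_v_col_statements=True,
        include_adb_v_key_statements=True,
    )
```

Edge keys are left out here on purpose, since include_adb_e_key_statements imposes Triple Reification on all ArangoDB Edges.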
- rdf_to_arangodb_by_rpt(name: str, rdf_graph: Graph, contextualize_graph: bool = False, flatten_reified_triples: bool = True, use_hashed_literals_as_keys: bool = True, overwrite_graph: bool = False, batch_size: int | None = None, **adb_import_kwargs: Any) Graph [source]
Create an ArangoDB Graph from an RDF Graph using the RDF-topology Preserving Transformation (RPT) Algorithm.
RPT preserves the RDF Graph structure by transforming each RDF statement into a Property Graph Edge. More info on RPT can be found in the package’s README file, or in the following paper: https://arxiv.org/pdf/2210.05781.pdf.
This method will store the RDF Resources of rdf_graph under the following ArangoDB Collections:
{Name}_URIRef: Vertex collection for rdflib.term.URIRef resources.
{Name}_BNode: Vertex collection for rdflib.term.BNode resources.
{Name}_Literal: Vertex collection for rdflib.term.Literal resources.
{Name}_Statement: Edge collection for all triples/quads.
- Parameters:
name (str) – The name of the RDF Graph
rdf_graph – The RDF Graph object. NOTE: This object is modified in-place during the transformation. Do not expect the original state of rdf_graph to be preserved.
contextualize_graph (bool) –
A work-in-progress flag that seeks to enhance the Terminology Box of rdf_graph by providing the following features:
Loading Meta Ontologies (e.g. OWL, RDF, RDFS) into the RDF Graph
Providing Domain & Range Inference
Providing Domain & Range Introspection
flatten_reified_triples (bool) – If set to False, will preserve the RDF structure of reified triples. If set to True, will convert any reified triple into a “regular” Property Graph Edge. Defaults to True.
use_hashed_literals_as_keys (bool) – If set to False, will not use the hashed value of an RDF Literal as its ArangoDB Document Key (i.e., a randomly generated key will be used instead). If set to True, all RDF Literals with the same value will be represented as one single ArangoDB Document. Defaults to True.
overwrite_graph (bool) – Overwrites the ArangoDB graph identified by name if it already exists, and drops its associated collections. Defaults to False.
batch_size (int | None) – If specified, runs the ArangoDB Data Ingestion process for every batch_size RDF triples/quads within rdf_graph. Defaults to len(rdf_graph).
adb_import_kwargs (Any) – Keyword arguments to specify additional parameters for ArangoDB document insertion. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.collection.Collection.import_bulk
- Type:
rdf_graph: rdflib.graph.Graph
- Returns:
The ArangoDB Graph API wrapper.
- Return type:
arango.graph.Graph
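The collection layout described above can be sketched conceptually as follows. The function names, the (kind, value) representation of RDF terms, and the dictionary layout of the edge are all assumptions for illustration. Real document keys are derived differently (see rdf_id_to_adb_key), so this only shows which collection each part of a statement would land in.

```python
from typing import Dict, Tuple

Term = Tuple[str, str]  # (kind, value), where kind is "URIRef", "BNode" or "Literal"


def rpt_collection_names(name: str) -> Dict[str, str]:
    """Return the four collection names used for a graph called `name`."""
    return {
        "URIRef": f"{name}_URIRef",
        "BNode": f"{name}_BNode",
        "Literal": f"{name}_Literal",
        "Statement": f"{name}_Statement",
    }


def rpt_edge(name: str, subject: Term, predicate: str, obj: Term) -> Dict[str, str]:
    """Model one RDF statement as one edge between two term documents."""
    cols = rpt_collection_names(name)
    (s_kind, s_value), (o_kind, o_value) = subject, obj
    return {
        "collection": cols["Statement"],
        "from": f"{cols[s_kind]}/{s_value}",
        "to": f"{cols[o_kind]}/{o_value}",
        "predicate": predicate,
    }


print(rpt_edge("Test", ("URIRef", "alice"), "ex:age", ("Literal", "30")))
```

Note that the Literal object becomes its own vertex here, which is the main contrast with PGT.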
- rdf_to_arangodb_by_pgt(name: str, rdf_graph: Graph, adb_col_statements: Graph | None = None, write_adb_col_statements: bool = True, contextualize_graph: bool = False, flatten_reified_triples: bool = True, overwrite_graph: bool = False, batch_size: int | None = None, **adb_import_kwargs: Any) Graph [source]
Create an ArangoDB Graph from an RDF Graph using the Property Graph Transformation (PGT) Algorithm.
PGT ensures that datatype property statements (i.e., statements whose objects are Literals) are mapped to document properties in the Property Graph. Learn more about PGT here.
Contrary to RPT, this method will rely on the nature of the RDF Resource/Statement to determine which ArangoDB Collection it belongs to. This process is referred to as the ArangoDB Collection Mapping Process. Learn more about the PGT ArangoDB Collection Mapping Process here.
Contrary to RPT, regardless of whether contextualize_graph is set to True or not, all RDF Predicates within every RDF Statement in rdf_graph will be processed as their own ArangoDB Document, and will be stored under the “Property” Vertex Collection.
- Parameters:
name (str) – The name of the RDF Graph
rdf_graph – The RDF Graph object. NOTE: This object is modified in-place in order for PGT to work. Do not expect the original state of rdf_graph to be preserved.
adb_col_statements (rdflib.graph.Graph | None) – An optional RDF Graph containing ArangoDB Collection statements of the form adb_vertex http://arangodb/collection “adb_v_col” . Useful for creating a custom ArangoDB Collection mapping of RDF Resources within rdf_graph. Defaults to None. NOTE: Cannot be used in conjunction with collection statements in rdf_graph.
write_adb_col_statements (bool) – Run the ArangoDB Collection Mapping Process for rdf_graph to write the ArangoDB Collection statements of the form adb_vertex http://arangodb/collection “adb_v_col” . into adb_col_statements. This parameter is ignored if contextualize_graph is set to True, as the ArangoDB Collection Mapping Process is required for Graph Contextualization. See write_adb_col_statements() for more information. Defaults to True.
contextualize_graph (bool) –
A work-in-progress flag that seeks to enhance the Terminology Box of rdf_graph by providing the following features:
Loading Meta Ontologies (e.g. OWL, RDF, RDFS) into the RDF Graph
Providing Domain & Range Inference
Providing Domain & Range Introspection
flatten_reified_triples (bool) –
If set to False, will preserve the RDF structure of any Reified Triple. If set to True, will “flatten” any reified triples into a regular Property Graph Edge. Defaults to True.
overwrite_graph (bool) – Overwrites the ArangoDB graph identified by name if it already exists, and drops its associated collections. Defaults to False.
batch_size (int | None) – If specified, runs the ArangoDB Data Ingestion process for every batch_size RDF triples/quads within rdf_graph. Defaults to None.
adb_import_kwargs – Keyword arguments to specify additional parameters for the ArangoDB Data Ingestion process. The full parameter list is here.
- Type:
rdf_graph: rdflib.graph.Graph
- Returns:
The ArangoDB Graph API wrapper.
- Return type:
arango.graph.Graph
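The core idea of PGT, folding Literal-valued statements into document properties while other statements become edges, can be sketched conceptually as below. The function name pgt_split and the 4-tuple input format are hypothetical. The sketch ignores the ArangoDB Collection Mapping Process, the “Property” Vertex Collection, and multi-valued properties (a repeated predicate simply overwrites the earlier value), so it is not the library's algorithm.

```python
from typing import Any, Dict, List, Tuple

# (subject, predicate, object, object_is_literal)
Quad = Tuple[str, str, Any, bool]


def pgt_split(triples: List[Quad]) -> Tuple[Dict[str, Dict[str, Any]], List[Tuple[str, str, str]]]:
    """Split statements into vertex documents and edges, PGT-style."""
    documents: Dict[str, Dict[str, Any]] = {}
    edges: List[Tuple[str, str, str]] = []
    for subject, predicate, obj, is_literal in triples:
        doc = documents.setdefault(subject, {})
        if is_literal:
            # Datatype property statement: becomes a document property.
            doc[predicate] = obj
        else:
            # Object property statement: becomes an edge between two documents.
            documents.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return documents, edges


docs, edges = pgt_split([
    ("alice", "name", "Alice", True),
    ("alice", "knows", "bob", False),
])
print(docs)
print(edges)
```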
- write_adb_col_statements(rdf_graph: Graph, adb_col_statements: Graph | None = None) Graph [source]
RDF -> ArangoDB (PGT): Run the ArangoDB Collection Mapping Process for rdf_graph to map RDF Resources to their respective ArangoDB Collection.
The PGT Algorithm relies on the ArangoDB Collection Mapping Process to identify the ArangoDB Collection of every RDF Resource. Using this method prior to running rdf_to_arangodb_by_pgt() allows you to visualize and modify the mapping. Learn more about the PGT ArangoDB Collection Mapping Process here.
NOTE: Running this method prior to rdf_to_arangodb_by_pgt() is unnecessary if the user is not interested in viewing/modifying the ArangoDB Mapping.
NOTE: There can only be one adb:collection statement associated with each RDF Resource.
- Parameters:
rdf_graph (rdflib.graph.Graph) – The RDF Graph object.
adb_col_statements (Optional[rdflib.graph.Graph]) – An existing RDF Graph containing adb:collection statements. If not provided, a new RDF Graph will be created. Defaults to None. NOTE: The ArangoDB Collection Mapping Process relies heavily on mapping certain RDF Resources to the “Class” and “Property” ArangoDB Collections. Therefore, it is currently not possible to overwrite any RDF Resources that belong to these collections.
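A possible two-step workflow, building the mapping, optionally reviewing it, then importing with it, could look like the sketch below. The helper name import_with_reviewed_mapping and the review callback are hypothetical, and adbrdf is assumed to be an ArangoRDF instance. Passing write_adb_col_statements=False so that the prepared mapping is used as-is is my reading of the parameter descriptions on this page, not something the page states outright.

```python
from typing import Any, Callable, Optional


def import_with_reviewed_mapping(
    adbrdf: Any,
    name: str,
    rdf_graph: Any,
    review: Optional[Callable[[Any], None]] = None,
) -> Any:
    """Run the Collection Mapping Process first, then import via PGT."""
    # Step 1: compute the adb:collection statements for rdf_graph.
    adb_col_statements = adbrdf.write_adb_col_statements(rdf_graph)

    # Step 2: let the caller inspect or adjust the mapping (hypothetical hook).
    if review is not None:
        review(adb_col_statements)

    # Step 3: import using the prepared mapping instead of recomputing it.
    return adbrdf.rdf_to_arangodb_by_pgt(
        name,
        rdf_graph,
        adb_col_statements=adb_col_statements,
        write_adb_col_statements=False,
    )
```

Keep in mind the note above: resources mapped to the “Class” and “Property” collections cannot currently be overwritten during review.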
- rdf_id_to_adb_key(rdf_id: str, rdf_term: URIRef | BNode | Literal | None = None) str [source]
RDF -> ArangoDB: Convert an RDF Resource ID string into an ArangoDB Key via some hashing function.
If rdf_term is provided, then the value of the statement rdf_term adb:key “<ArangoDB Document Key>” . will be used as the ArangoDB Key (assuming that said statement exists).
Current hashing function used: FarmHash
- Parameters:
rdf_id (str) – The string representation of an RDF Resource
rdf_term (Optional[URIRef | BNode | Literal]) – The optional RDF Term to check if it has an adb:key statement associated to it.
- Returns:
The ArangoDB _key equivalent of rdf_id
- Return type:
str
- hash(rdf_id: str) str [source]
RDF -> ArangoDB: Hash an RDF Resource ID string into an ArangoDB Key via some hashing function.
Current hashing function used: FarmHash
List of hashing functions tested & benchmarked: the built-in hash() function, Hashlib MD5, xxHash, MurmurHash, CityHash, and FarmHash.
- Parameters:
rdf_id (str) – The string representation of an RDF Resource
- Returns:
The ArangoDB _key equivalent of rdf_id
- Return type:
str
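The general idea of turning an RDF Resource ID string into a deterministic key can be sketched with the standard library. The function name sketch_key is hypothetical, and hashlib's MD5 (one of the benchmarked options listed above) stands in for FarmHash, so the keys produced here will not match the ones the library generates.

```python
import hashlib


def sketch_key(rdf_id: str) -> str:
    """Hash an RDF Resource ID string into a deterministic key string.

    MD5 is a standard-library stand-in for FarmHash, used for illustration only.
    """
    return hashlib.md5(rdf_id.encode("utf-8")).hexdigest()


key = sketch_key("http://example.com/Person")
print(key)
```

Because the hash is deterministic, the same RDF Resource always maps to the same key, which is what allows repeated imports to address the same document.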
- rdf_id_to_adb_label(rdf_id: str) str [source]
RDF -> ArangoDB: Return the suffix of an RDF URI.
The suffix can (1) be used as an ArangoDB Collection name, or (2) be used as the _label property value for an ArangoDB Document.
For example:
http://example.com/Person -> “Person”
http://example.com/Person#Bob -> “Bob”
http://example.com/Person:Bob -> “Bob”
- Parameters:
rdf_id (str) – The string representation of a URIRef
- Returns:
The suffix of the RDF URI string
- Return type:
str
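The suffix behaviour shown in the examples above can be reproduced with a short sketch that splits on “/”, “#” and “:”. The function name uri_suffix is hypothetical, and falling back to the full string when the suffix is empty (for instance, a URI ending in “/”) is an assumption rather than documented behaviour.

```python
import re


def uri_suffix(rdf_id: str) -> str:
    """Return the text after the last '/', '#' or ':' in a URI string."""
    suffix = re.split(r"[/#:]", rdf_id)[-1]
    # Assumption: fall back to the whole string if nothing follows the separator.
    return suffix or rdf_id


for uri in (
    "http://example.com/Person",
    "http://example.com/Person#Bob",
    "http://example.com/Person:Bob",
):
    print(uri, "->", uri_suffix(uri))
```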
- extract_adb_col_statements(rdf_graph: Graph, keep_adb_col_statements_in_rdf_graph: bool = False) Graph [source]
ArangoDB <-> RDF: Extracts adb:collection statements from an RDF Graph.
- Parameters:
rdf_graph (rdflib.graph.Graph) – The RDF Graph to extract the statements from.
keep_adb_col_statements_in_rdf_graph (bool) – Keeps the ArangoDB Collection statements in the original graph once extracted. Defaults to False.
- Returns:
The ArangoDB Collection Mapping graph.
- Return type:
rdflib.graph.Graph
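Conceptually, the extraction partitions statements by predicate. The sketch below models this with plain tuples in a list rather than an rdflib Graph. The function name extract_statements is hypothetical, and the predicate string is a placeholder taken from the statement form shown earlier on this page.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Placeholder predicate, copied from the statement form used on this page.
ADB_COLLECTION = "http://arangodb/collection"


def extract_statements(triples: List[Triple], predicate: str = ADB_COLLECTION, keep: bool = False) -> List[Triple]:
    """Return the statements using `predicate`; remove them from `triples` unless keep is True."""
    extracted = [t for t in triples if t[1] == predicate]
    if not keep:
        # Mutate the input in place, mirroring the default keep=False behaviour.
        triples[:] = [t for t in triples if t[1] != predicate]
    return extracted


graph = [
    ("ex:alice", ADB_COLLECTION, "Person"),
    ("ex:alice", "ex:name", "Alice"),
]
mapping = extract_statements(graph)
print(mapping)
print(graph)
```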
- extract_adb_key_statements(rdf_graph: Graph, keep_adb_key_statements_in_rdf_graph: bool = False) Graph [source]
ArangoDB <-> RDF: Extracts the adb:key statements from an RDF Graph.
- Parameters:
rdf_graph (rdflib.graph.Graph) – The RDF Graph to extract the statements from.
keep_adb_key_statements_in_rdf_graph (bool) – Keeps the adb:key statements in the original graph once extracted. Defaults to False.
- Returns:
The RDF Graph containing the extracted adb:key statements.
- Return type:
rdflib.graph.Graph
ArangoRDFController
- class arango_rdf.controller.ArangoRDFController[source]
Controller used in RDF-to-ArangoDB (PGT).
Responsible for handling how the ArangoDB Collection Mapping Process identifies the “ideal RDFS Class” among a selection of RDFS Classes for a given RDF Resource.
The “ideal RDFS Class” is defined as an RDFS Class whose local name best represents the RDF Resource in question. This local name will be used as the ArangoDB Collection name that will store rdf_resource.
Read more about how the PGT ArangoDB Collection Mapping Process works here.
- identify_best_class(rdf_resource: URIRef | BNode | Literal, class_set: Set[str], subclass_tree: Tree) str [source]
Find the ideal RDFS Class among a selection of RDFS Classes. Essential for the ArangoDB Collection Mapping Process used in RDF-to-ArangoDB (PGT).
Read more about how the PGT ArangoDB Collection Mapping Process works here.
The “ideal RDFS Class” is defined as an RDFS Class whose local name best represents the RDF Resource in question. This local name will be used as the ArangoDB Collection name that will store rdf_resource.
This system is a work-in-progress. Users are welcome to override this method via their own implementation of the ArangoRDFController Class. Users are able to access the RDF Graph of the current RDF-to-ArangoDB transformation via self.rdf_graph, and the database instance via self.db.
- Parameters:
rdf_resource (URIRef | BNode) – The RDF Resource in question.
class_set (Set[str]) – A set of RDFS Class URIs that are associated with rdf_resource via the rdf:type relationship, either via explicit definition or via domain/range inference.
subclass_tree (arango_rdf.utils.Tree) – The Tree data structure representing the RDFS subClassOf Taxonomy. See arango_rdf.main.ArangoRDF.__build_subclass_tree() for more info.
- Returns:
The string representation of the URI of the most suitable RDFS Class among class_set, whose local name is used as the ArangoDB Document Collection name for rdf_resource.
- Return type:
str
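A custom selection rule could be sketched as follows. To stay runnable without the package installed, the class below is a stand-in that only mirrors the documented method signature. In real use it would presumably subclass arango_rdf.controller.ArangoRDFController and be passed through the controller parameter of ArangoRDF. The class name and the "alphabetically first URI" rule are hypothetical choices for illustration.

```python
from typing import Any, Set


class AlphabeticalController:
    """Stand-in for a subclass of ArangoRDFController (hypothetical name)."""

    def identify_best_class(self, rdf_resource: Any, class_set: Set[str], subclass_tree: Any) -> str:
        """Pick the alphabetically first class URI, ignoring the taxonomy.

        This deliberately simple rule makes the choice deterministic; a real
        override might instead consult subclass_tree or self.rdf_graph.
        """
        return sorted(class_set)[0]


controller = AlphabeticalController()
best = controller.identify_best_class(
    "http://example.com/bob",
    {"http://example.com/Person", "http://example.com/Animal"},
    None,
)
print(best)
```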