API Specification

This page contains the specification for all classes and methods available in ArangoRDF.

ArangoRDF

class arango_rdf.main.ArangoRDF(db: arango.database.StandardDatabase, controller: arango_rdf.controller.ArangoRDFController = <arango_rdf.controller.ArangoRDFController object>, logging_lvl: str | int = 20)

ArangoRDF: Transform RDF Graphs into ArangoDB Graphs & vice-versa.

Implemented using concepts described in https://arxiv.org/abs/2210.05781.

Parameters:
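  • db (arango.database.StandardDatabase) – A python-arango database instance.

  • controller (arango_rdf.controller.ArangoRDFController) – The controller used in RDF-to-ArangoDB (PGT). Defaults to an ArangoRDFController instance.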

  • logging_lvl (str | int) – Defaults to logging.INFO. Other useful options are logging.DEBUG (more verbose), and logging.WARNING (less verbose).

Raises:

TypeError – On invalid parameter types
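As a rough orientation, instantiation might look like the sketch below. The host URL, database name and credentials are placeholders, and `connect` and `resolve_logging_lvl` are hypothetical helpers, not part of ArangoRDF. The third-party imports are deferred into the function body so the snippet loads even where python-arango and arango_rdf are not installed.

```python
import logging
from typing import Any, Union


def resolve_logging_lvl(lvl: Union[str, int]) -> int:
    """Hypothetical helper: map a level name such as "DEBUG" to its number."""
    if isinstance(lvl, int):
        return lvl
    return int(getattr(logging, lvl.upper()))


def connect(hosts: str, db_name: str, username: str, password: str,
            logging_lvl: Union[str, int] = logging.INFO) -> Any:
    """Build an ArangoRDF instance (requires python-arango and arango_rdf)."""
    from arango import ArangoClient  # deferred: third-party dependency
    from arango_rdf import ArangoRDF  # deferred: third-party dependency

    db = ArangoClient(hosts=hosts).db(db_name, username=username, password=password)
    return ArangoRDF(db, logging_lvl=logging_lvl)
```

The signature default of 20 is the numeric value of logging.INFO, which is what `resolve_logging_lvl("INFO")` returns.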

arangodb_to_rdf(name: str, rdf_graph: Graph, metagraph: Dict[str, Dict[str, Set[str]]], explicit_metagraph: bool = True, list_conversion_mode: str = 'static', dict_conversion_mode: str = 'static', infer_type_from_adb_v_col: bool = False, include_adb_v_col_statements: bool = False, include_adb_v_key_statements: bool = False, include_adb_e_key_statements: bool = False, **adb_export_kwargs: Any) -> Graph

Create an RDF Graph from an ArangoDB Graph via its Metagraph.

Parameters:
  • name (str) – The name of the ArangoDB Graph

  • rdf_graph (rdflib.graph.Graph) – The target RDF Graph to insert into.

  • metagraph (arango_rdf.typings.ADBMetagraph) – A dictionary of dictionaries defining the ArangoDB Vertex & Edge Collections whose entries will be inserted into the RDF Graph.

  • explicit_metagraph (bool) – If True, only the document attributes specified in metagraph are kept when importing to RDF. Otherwise, all document attributes are included. Defaults to True.

  • list_conversion_mode (str) – Specify how ArangoDB JSON lists within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON lists will be serialized into RDF Literals. If “collection”, ArangoDB lists will be processed using the RDF Collection structure. If “container”, ArangoDB lists will be processed using the RDF Container structure. If “static”, elements within lists will be processed as individual statements. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.

  • dict_conversion_mode (str) – Specify how ArangoDB JSON Objects within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON Objects will be serialized into RDF Literals. If “static”, elements within dictionaries will be processed as individual statements with the help of BNodes. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.

  • infer_type_from_adb_v_col (bool) – Specify whether rdf:type statements of the form resource rdf:type adb_v_col . should be inferred upon transferring ArangoDB Vertices into RDF.

  • include_adb_v_col_statements (bool) – Specify whether adb:collection statements of the form adb_vertex adb:collection adb_v_col . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document collections when a user is interested in round-tripping.

  • include_adb_v_key_statements (bool) – Specify whether adb:key statements of the form adb_vertex adb:key adb_vertex[“key”] . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document keys when a user is interested in round-tripping.

  • include_adb_e_key_statements (bool) – Specify whether adb:key statements of the form adb_edge adb:key adb_edge[“key”] . should be generated upon transferring ArangoDB Edges into RDF. This can be used to maintain edge keys when a user is interested in round-tripping. NOTE: Enabling this option will impose Triple Reification on all ArangoDB Edges.

  • adb_export_kwargs (Any) – Keyword arguments to specify AQL query options when fetching documents from the ArangoDB instance. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

Returns:

The RDF representation of the ArangoDB Graph.

Return type:

rdflib.graph.Graph
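To make the metagraph and explicit_metagraph parameters more concrete, here is a minimal sketch. The top-level keys "vertexCollections" and "edgeCollections" are an assumption about the ADBMetagraph layout (the signature only guarantees Dict[str, Dict[str, Set[str]]]), and `select_attributes` is a hypothetical helper that models the filtering described above; it is not ArangoRDF's own code.

```python
from typing import Any, Dict, Set

# Assumed layout: top-level keys "vertexCollections" and "edgeCollections",
# each mapping a collection name to the set of attributes to export.
metagraph: Dict[str, Dict[str, Set[str]]] = {
    "vertexCollections": {"Person": {"name", "age"}},
    "edgeCollections": {"knows": {"since"}},
}


def select_attributes(doc: Dict[str, Any], col: str, section: str,
                      explicit_metagraph: bool = True) -> Dict[str, Any]:
    """Hypothetical illustration of explicit_metagraph filtering."""
    if not explicit_metagraph:
        # All document attributes are included.
        return dict(doc)
    allowed = metagraph[section].get(col, set())
    return {k: v for k, v in doc.items() if k in allowed}


person = {"name": "Bob", "age": 42, "email": "bob@example.com"}
kept = select_attributes(person, "Person", "vertexCollections")
```

With explicit_metagraph left at True, `kept` holds only the name and age attributes; with False, the email attribute would be carried over as well.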

arangodb_collections_to_rdf(name: str, rdf_graph: Graph, v_cols: Set[str], e_cols: Set[str], list_conversion_mode: str = 'static', dict_conversion_mode: str = 'static', infer_type_from_adb_v_col: bool = False, include_adb_v_col_statements: bool = False, include_adb_v_key_statements: bool = False, include_adb_e_key_statements: bool = False, **adb_export_kwargs: Any) -> Graph

Create an RDF Graph from an ArangoDB Graph via its Collection Names.

Parameters:
  • name (str) – The name of the ArangoDB Graph

  • rdf_graph (rdflib.graph.Graph) – The target RDF Graph to insert into.

  • v_cols (Set[str]) – The set of ArangoDB Vertex Collections to import to RDF.

  • e_cols (Set[str]) – The set of ArangoDB Edge Collections to import to RDF.

  • list_conversion_mode (str) – Specify how ArangoDB JSON lists within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON lists will be serialized into RDF Literals. If “collection”, ArangoDB lists will be processed using the RDF Collection structure. If “container”, ArangoDB lists will be processed using the RDF Container structure. If “static”, elements within lists will be processed as individual statements. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.

  • dict_conversion_mode (str) – Specify how ArangoDB JSON Objects within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON Objects will be serialized into RDF Literals. If “static”, elements within dictionaries will be processed as individual statements with the help of BNodes. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.

  • infer_type_from_adb_v_col (bool) – Specify whether rdf:type statements of the form resource rdf:type adb_v_col . should be inferred upon transferring ArangoDB Vertices into RDF.

  • include_adb_v_col_statements (bool) – Specify whether adb:collection statements of the form adb_vertex adb:collection adb_v_col . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document collections when a user is interested in round-tripping.

  • include_adb_v_key_statements (bool) – Specify whether adb:key statements of the form adb_vertex adb:key adb_vertex[“key”] . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document keys when a user is interested in round-tripping.

  • include_adb_e_key_statements (bool) – Specify whether adb:key statements of the form adb_edge adb:key adb_edge[“key”] . should be generated upon transferring ArangoDB Edges into RDF. This can be used to maintain edge keys when a user is interested in round-tripping. NOTE: Enabling this option will impose Triple Reification on all ArangoDB Edges.

  • adb_export_kwargs (Any) – Keyword arguments to specify AQL query options when fetching documents from the ArangoDB instance. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

Returns:

The RDF representation of the ArangoDB Graph.

Return type:

rdflib.graph.Graph
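The four list_conversion_mode values can be pictured with plain tuples standing in for RDF statements. This is a conceptual model only: `convert_list`, the blank-node labels and the JSON-encoded object strings are assumptions made for illustration, and the exact triples ArangoRDF emits may differ.

```python
import json
from typing import Any, List, Tuple

Triple = Tuple[str, str, str]


def convert_list(subject: str, predicate: str, values: List[Any],
                 mode: str = "static") -> List[Triple]:
    """Conceptual model of list_conversion_mode; not ArangoRDF's own code."""
    if mode == "static":
        # One statement per list element.
        return [(subject, predicate, json.dumps(v)) for v in values]
    if mode == "serialize":
        # The whole list becomes a single literal.
        return [(subject, predicate, json.dumps(values))]
    if mode == "container":
        # RDF Container: membership properties rdf:_1, rdf:_2, ...
        node = "_:container"
        triples = [(subject, predicate, node)]
        triples += [(node, f"rdf:_{i}", json.dumps(v))
                    for i, v in enumerate(values, start=1)]
        return triples
    if mode == "collection":
        # RDF Collection: a linked list of rdf:first / rdf:rest cells.
        if not values:
            return [(subject, predicate, "rdf:nil")]
        cells = [f"_:cell{i}" for i in range(len(values))]
        triples = [(subject, predicate, cells[0])]
        for i, v in enumerate(values):
            nxt = cells[i + 1] if i + 1 < len(values) else "rdf:nil"
            triples.append((cells[i], "rdf:first", json.dumps(v)))
            triples.append((cells[i], "rdf:rest", nxt))
        return triples
    raise ValueError(f"unknown mode: {mode}")
```

The sketch shows why “serialize” is the easiest mode to reverse: the list survives as one value, whereas “static” loses both the ordering and the fact that the values came from a list.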

arangodb_graph_to_rdf(name: str, rdf_graph: Graph, list_conversion_mode: str = 'static', dict_conversion_mode: str = 'static', infer_type_from_adb_v_col: bool = False, include_adb_v_col_statements: bool = False, include_adb_v_key_statements: bool = False, include_adb_e_key_statements: bool = False, **adb_export_kwargs: Any) -> Graph

Create an RDF Graph from an ArangoDB Graph via its Graph Name.

Parameters:
  • name (str) – The name of the ArangoDB Graph

  • rdf_graph (rdflib.graph.Graph) – The target RDF Graph to insert into.

  • list_conversion_mode (str) – Specify how ArangoDB JSON lists within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON lists will be serialized into RDF Literals. If “collection”, ArangoDB lists will be processed using the RDF Collection structure. If “container”, ArangoDB lists will be processed using the RDF Container structure. If “static”, elements within lists will be processed as individual statements. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.

  • dict_conversion_mode (str) – Specify how ArangoDB JSON Objects within an ArangoDB Document are processed into the RDF Graph. If “serialize”, JSON Objects will be serialized into RDF Literals. If “static”, elements within dictionaries will be processed as individual statements with the help of BNodes. Defaults to “static”. NOTE: “serialize” is recommended if round-tripping is desired, but only if round-tripping via PGT.

  • infer_type_from_adb_v_col (bool) – Specify whether rdf:type statements of the form resource rdf:type adb_v_col . should be inferred upon transferring ArangoDB Vertices into RDF.

  • include_adb_v_col_statements (bool) – Specify whether adb:collection statements of the form adb_vertex adb:collection adb_v_col . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document collections when a user is interested in round-tripping.

  • include_adb_v_key_statements (bool) – Specify whether adb:key statements of the form adb_vertex adb:key adb_vertex[“key”] . should be generated upon transferring ArangoDB Documents into RDF. This can be used to maintain document keys when a user is interested in round-tripping.

  • include_adb_e_key_statements (bool) – Specify whether adb:key statements of the form adb_edge adb:key adb_edge[“key”] . should be generated upon transferring ArangoDB Edges into RDF. This can be used to maintain edge keys when a user is interested in round-tripping. NOTE: Enabling this option will impose Triple Reification on all ArangoDB Edges.

  • adb_export_kwargs (Any) – Keyword arguments to specify AQL query options when fetching documents from the ArangoDB instance. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

Returns:

The RDF representation of the ArangoDB Graph.

Return type:

rdflib.graph.Graph
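When the exported RDF is meant to be re-imported later, the round-tripping options described above can be bundled as in the sketch below. `ROUND_TRIP_FLAGS` and `export_for_round_trip` are hypothetical names; `adbrdf` is assumed to be an ArangoRDF instance and `rdf_graph` an rdflib Graph. The “serialize” modes are chosen on the assumption that the later re-import goes through PGT, as the notes above recommend.

```python
from typing import Any, Dict

# Options documented above that keep ArangoDB-specific information in the RDF output.
ROUND_TRIP_FLAGS: Dict[str, Any] = {
    "list_conversion_mode": "serialize",
    "dict_conversion_mode": "serialize",
    "include_adb_v_col_statements": True,
    "include_adb_v_key_statements": True,
    "include_adb_e_key_statements": True,  # imposes Triple Reification on edges
}


def export_for_round_trip(adbrdf: Any, name: str, rdf_graph: Any) -> Any:
    """Hypothetical wrapper: export a graph by name with round-trip flags set."""
    return adbrdf.arangodb_graph_to_rdf(name, rdf_graph, **ROUND_TRIP_FLAGS)
```

Because include_adb_e_key_statements reifies every edge, it may be worth leaving that flag off when edge keys do not need to survive the round trip.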

rdf_to_arangodb_by_rpt(name: str, rdf_graph: Graph, contextualize_graph: bool = False, flatten_reified_triples: bool = True, use_hashed_literals_as_keys: bool = True, overwrite_graph: bool = False, batch_size: int | None = None, **adb_import_kwargs: Any) -> Graph

Create an ArangoDB Graph from an RDF Graph using the RDF-topology Preserving Transformation (RPT) Algorithm.

RPT preserves the RDF Graph structure by transforming each RDF statement into a Property Graph Edge. More info on RPT can be found in the package’s README file, or in the following paper: https://arxiv.org/pdf/2210.05781.pdf.

This method will store the RDF Resources of rdf_graph under the following ArangoDB Collections:

  1. {Name}_URIRef: Vertex collection for rdflib.term.URIRef resources.

  2. {Name}_BNode: Vertex collection for rdflib.term.BNode resources.

  3. {Name}_Literal: Vertex collection for rdflib.term.Literal resources.

  4. {Name}_Statement: Edge collection for all triples/quads.

Parameters:
  • name (str) – The name of the RDF Graph

  • rdf_graph – The RDF Graph object. NOTE: This object is modified in-place in order for RPT to work. Do not expect the original state of rdf_graph to be preserved.

  • contextualize_graph (bool) –

    A work-in-progress flag that seeks to enhance the Terminology Box of rdf_graph by providing the following features:

    1. Loading Meta Ontologies (e.g. OWL, RDF, RDFS) into the RDF Graph

    2. Providing Domain & Range Inference

    3. Providing Domain & Range Introspection

  • flatten_reified_triples (bool) – If set to False, will preserve the RDF structure of reified triples. If set to True, will convert any reified triple into a “regular” Property Graph Edge. Defaults to True.

  • use_hashed_literals_as_keys (bool) – If set to False, will not use the hashed value of an RDF Literal as its ArangoDB Document Key (i.e. a randomly-generated key will be used instead). If set to True, all RDF Literals with the same value will be represented as one single ArangoDB Document. Defaults to True.

  • overwrite_graph (bool) – Overwrites the ArangoDB graph identified by name if it already exists, and drops its associated collections. Defaults to False.

  • batch_size (int | None) – If specified, runs the ArangoDB Data Ingestion process for every batch_size RDF triples/quads within rdf_graph. Defaults to None, in which case len(rdf_graph) is used.

  • adb_import_kwargs (Any) – Keyword arguments to specify additional parameters for ArangoDB document insertion. Full parameter list: https://docs.python-arango.com/en/main/specs.html#arango.collection.Collection.import_bulk

Type:

rdf_graph: rdflib.graph.Graph

Returns:

The ArangoDB Graph API wrapper.

Return type:

arango.graph.Graph
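The collection naming and the effect of batch_size can be sketched with two small helpers. Both functions are hypothetical and only restate the description above; in particular, the assumption that a final partial batch triggers one extra ingestion round is not confirmed by this page.

```python
import math
from typing import List, Optional


def rpt_collection_names(name: str) -> List[str]:
    """Collections listed above for an RPT import of the graph `name`."""
    return [f"{name}_{suffix}" for suffix in ("URIRef", "BNode", "Literal", "Statement")]


def ingestion_rounds(num_statements: int, batch_size: Optional[int] = None) -> int:
    """Hypothetical helper: number of ingestion rounds for a given batch_size."""
    if num_statements == 0:
        return 0
    # With no batch_size, everything is ingested in a single round.
    size = num_statements if batch_size is None else batch_size
    return math.ceil(num_statements / size)
```

For a graph named "DataGraph", the first helper yields DataGraph_URIRef, DataGraph_BNode, DataGraph_Literal and DataGraph_Statement.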

rdf_to_arangodb_by_pgt(name: str, rdf_graph: Graph, adb_col_statements: Graph | None = None, write_adb_col_statements: bool = True, contextualize_graph: bool = False, flatten_reified_triples: bool = True, overwrite_graph: bool = False, batch_size: int | None = None, **adb_import_kwargs: Any) -> Graph

Create an ArangoDB Graph from an RDF Graph using the Property Graph Transformation (PGT) Algorithm.

PGT ensures that datatype property statements (i.e. statements whose objects are Literals) are mapped to document properties in the Property Graph. Learn more about PGT here.

Contrary to RPT, this method will rely on the nature of the RDF Resource/Statement to determine which ArangoDB Collection it belongs to. This process is referred to as the ArangoDB Collection Mapping Process. Learn more about the PGT ArangoDB Collection Mapping Process here.

Contrary to RPT, regardless of whether contextualize_graph is set to True or not, all RDF Predicates within every RDF Statement in rdf_graph will be processed as their own ArangoDB Document, and will be stored under the “Property” Vertex Collection.

Parameters:
  • name (str) – The name of the RDF Graph

  • rdf_graph – The RDF Graph object. NOTE: This object is modified in-place in order for PGT to work. Do not expect the original state of rdf_graph to be preserved.

  • adb_col_statements (rdflib.graph.Graph | None) – An optional RDF Graph containing ArangoDB Collection statements of the form adb_vertex http://arangodb/collection “adb_v_col” . Useful for creating a custom ArangoDB Collection mapping of RDF Resources within rdf_graph. Defaults to None. NOTE: Cannot be used in conjunction with collection statements in rdf_graph.

  • write_adb_col_statements (bool) – Run the ArangoDB Collection Mapping Process for rdf_graph to write ArangoDB Collection statements of the form adb_vertex http://arangodb/collection “adb_v_col” . into adb_col_statements. This parameter is ignored if contextualize_graph is set to True, as the ArangoDB Collection Mapping Process is required for Graph Contextualization. See write_adb_col_statements() for more information. Defaults to True.

  • contextualize_graph (bool) –

    A work-in-progress flag that seeks to enhance the Terminology Box of rdf_graph by providing the following features:

    1. Loading Meta Ontologies (e.g. OWL, RDF, RDFS) into the RDF Graph

    2. Providing Domain & Range Inference

    3. Providing Domain & Range Introspection

  • flatten_reified_triples (bool) –

    If set to False, will preserve the RDF structure of any Reified Triple. If set to True, will “flatten” any reified triples into a regular Property Graph Edge. Defaults to True.

    Learn more about Triple Reification here.

  • overwrite_graph (bool) – Overwrites the ArangoDB graph identified by name if it already exists, and drops its associated collections. Defaults to False.

  • batch_size (int | None) – If specified, runs the ArangoDB Data Ingestion process for every batch_size RDF triples/quads within rdf_graph. Defaults to None.

  • adb_import_kwargs – Keyword arguments to specify additional parameters for the ArangoDB Data Ingestion process. The full parameter list is here.

Type:

rdf_graph: rdflib.graph.Graph

Returns:

The ArangoDB Graph API wrapper.

Return type:

arango.graph.Graph
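The core idea of PGT, literal-valued statements becoming document properties while all other statements become edges, can be modeled in a few lines. This is a toy sketch, not ArangoRDF's algorithm: statements are plain tuples with an explicit is-literal flag, `_uri` is an assumed attribute name, `_from` and `_to` hold URIs instead of real ArangoDB document IDs, and collection mapping is left out entirely.

```python
from typing import Any, Dict, List, Tuple

# Toy statements: (subject, predicate, object, object_is_literal).
Statement = Tuple[str, str, Any, bool]


def local_name(uri: str) -> str:
    """Last path or fragment segment of a URI (simplified for this sketch)."""
    return uri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


def pgt_sketch(
    statements: List[Statement],
) -> Tuple[Dict[str, Dict[str, Any]], List[Dict[str, str]]]:
    """Split statements into document properties and edges, PGT-style."""
    documents: Dict[str, Dict[str, Any]] = {}
    edges: List[Dict[str, str]] = []
    for s, p, o, is_literal in statements:
        doc = documents.setdefault(s, {"_uri": s})
        if is_literal:
            # Datatype property statement -> property on the subject's document.
            doc[local_name(p)] = o
        else:
            # Object property statement -> edge between two documents.
            documents.setdefault(o, {"_uri": o})
            edges.append({"_from": s, "_to": o, "_label": local_name(p)})
    return documents, edges


docs, edges = pgt_sketch([
    ("http://example.com/Bob", "http://example.com/name", "Bob", True),
    ("http://example.com/Bob", "http://example.com/knows", "http://example.com/Alice", False),
])
```

Under RPT, by contrast, the name statement would also have become an edge, pointing at a Literal vertex.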

write_adb_col_statements(rdf_graph: Graph, adb_col_statements: Graph | None = None) -> Graph

RDF -> ArangoDB (PGT): Run the ArangoDB Collection Mapping Process for rdf_graph to map RDF Resources to their respective ArangoDB Collection.

The PGT Algorithm relies on the ArangoDB Collection Mapping Process to identify the ArangoDB Collection of every RDF Resource. Using this method prior to running rdf_to_arangodb_by_pgt() allows you to visualize and modify the mapping. Learn more about the PGT ArangoDB Collection Mapping Process here.

NOTE: Running this method prior to rdf_to_arangodb_by_pgt() is unnecessary if the user is not interested in viewing/modifying the ArangoDB Mapping.

NOTE: There can only be one adb:collection statement associated with each RDF Resource.

Parameters:
  • rdf_graph (rdflib.graph.Graph) – The RDF Graph object.

  • adb_col_statements (Optional[rdflib.graph.Graph]) – An existing RDF Graph containing adb:collection statements. If not provided, a new RDF Graph will be created. Defaults to None. NOTE: The ArangoDB Collection Mapping Process relies heavily on mapping certain RDF Resources to the “Class” and “Property” ArangoDB Collections. Therefore, it is currently not possible to overwrite any RDF Resources that belong to these collections.
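A possible workflow for inspecting the mapping before importing is sketched below. `import_with_custom_mapping` is a hypothetical wrapper, `adbrdf` is assumed to be an ArangoRDF instance, and passing write_adb_col_statements=False to avoid recomputing the mapping is an assumption based on the parameter descriptions above.

```python
from typing import Any


def import_with_custom_mapping(adbrdf: Any, name: str, rdf_graph: Any) -> Any:
    """Hypothetical workflow: compute the mapping, inspect it, then run PGT."""
    # Step 1: run the ArangoDB Collection Mapping Process on its own.
    adb_col_statements = adbrdf.write_adb_col_statements(rdf_graph)

    # Step 2: inspect or edit adb_col_statements here (e.g. print each statement).

    # Step 3: reuse the mapping, skipping a second mapping pass.
    return adbrdf.rdf_to_arangodb_by_pgt(
        name,
        rdf_graph,
        adb_col_statements=adb_col_statements,
        write_adb_col_statements=False,
    )
```

Any edits made in step 2 should leave resources mapped to the “Class” and “Property” collections untouched, per the note above.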

rdf_id_to_adb_key(rdf_id: str, rdf_term: URIRef | BNode | Literal | None = None) -> str

RDF -> ArangoDB: Convert an RDF Resource ID string into an ArangoDB Key via some hashing function.

If rdf_term is provided, then the value of the statement rdf_term adb:key “<ArangoDB Document Key>” . will be used as the ArangoDB Key (assuming that said statement exists).

Current hashing function used: FarmHash

Parameters:
  • rdf_id (str) – The string representation of an RDF Resource

  • rdf_term (Optional[URIRef | BNode | Literal]) – The optional RDF Term to check if it has an adb:key statement associated to it.

Returns:

The ArangoDB _key equivalent of rdf_id

Return type:

str
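The precedence described above (an existing adb:key value wins, otherwise the ID is hashed) can be sketched as follows. SHA-256 from the standard library is used here purely as a stand-in for FarmHash, and the adb:key statements are modeled as a plain dictionary; `stand_in_hash` and `key_for` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


def stand_in_hash(rdf_id: str) -> str:
    """Stand-in for the FarmHash-based hash(): a SHA-256 hex digest."""
    return hashlib.sha256(rdf_id.encode("utf-8")).hexdigest()


def key_for(rdf_id: str, adb_key_statements: Optional[Dict[str, str]] = None) -> str:
    """Prefer an explicit adb:key value; otherwise fall back to hashing."""
    if adb_key_statements and rdf_id in adb_key_statements:
        return adb_key_statements[rdf_id]
    return stand_in_hash(rdf_id)
```

Since the fallback is deterministic, the same RDF Resource ID always maps to the same key, which is what allows repeated imports to land on the same document.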

hash(rdf_id: str) -> str

RDF -> ArangoDB: Hash an RDF Resource ID string into an ArangoDB Key via some hashing function.

Current hashing function used: FarmHash

List of hashing functions tested & benchmarked:

  • Built-in hash() function

  • Hashlib MD5

  • xxHash

  • MurmurHash

  • CityHash

  • FarmHash

Parameters:

rdf_id (str) – The string representation of an RDF Resource

Returns:

The ArangoDB _key equivalent of rdf_id

Return type:

str

rdf_id_to_adb_label(rdf_id: str) -> str

RDF -> ArangoDB: Return the suffix of an RDF URI.

The suffix can (1) be used as an ArangoDB Collection name, or (2) be used as the _label property value for an ArangoDB Document.

For example:

  • http://example.com/Person -> “Person”

  • http://example.com/Person#Bob -> “Bob”

  • http://example.com/Person:Bob -> “Bob”

Parameters:

rdf_id (str) – The string representation of a URIRef

Returns:

The suffix of the RDF URI string

Return type:

str
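One way to reproduce the three examples above is to split on the separators "/", "#" and ":" and keep the last non-empty part. `uri_suffix` is a hypothetical re-implementation written for illustration; the library's own logic may handle more cases.

```python
import re


def uri_suffix(rdf_id: str) -> str:
    """Return the text after the last '/', '#' or ':' in rdf_id."""
    parts = [p for p in re.split(r"[/#:]", rdf_id) if p]
    # Fall back to the input itself if nothing is left after splitting.
    return parts[-1] if parts else rdf_id
```

Dropping empty parts means a trailing slash is tolerated in this sketch, a case the examples above do not cover.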

extract_adb_col_statements(rdf_graph: Graph, keep_adb_col_statements_in_rdf_graph: bool = False) -> Graph

ArangoDB <-> RDF: Extracts adb:collection statements from an RDF Graph.

Parameters:
  • rdf_graph (rdflib.graph.Graph) – The RDF Graph to extract the statements from.

  • keep_adb_col_statements_in_rdf_graph (bool) – Keeps the ArangoDB Collection statements in the original graph once extracted. Defaults to False.

Returns:

The ArangoDB Collection Mapping graph.

Return type:

rdflib.graph.Graph
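The extraction behaviour, including the effect of the keep flag, can be modeled on a list of tuples. This is a sketch under assumptions: `extract_col_statements` is a hypothetical stand-in that works on plain tuples in place of an rdflib Graph, and the predicate URI is the one shown in the adb_col_statements description for rdf_to_arangodb_by_pgt.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
ADB_COLLECTION = "http://arangodb/collection"  # predicate URI from the PGT parameter docs


def extract_col_statements(triples: List[Triple], keep: bool = False) -> List[Triple]:
    """Return the adb:collection statements; drop them from `triples` unless keep."""
    extracted = [t for t in triples if t[1] == ADB_COLLECTION]
    if not keep:
        # Mutate the input in place, mirroring removal from the original graph.
        triples[:] = [t for t in triples if t[1] != ADB_COLLECTION]
    return extracted
```

With the default keep=False, the input ends up containing only the domain statements and the returned list holds the mapping.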

extract_adb_key_statements(rdf_graph: Graph, keep_adb_key_statements_in_rdf_graph: bool = False) -> Graph

ArangoDB <-> RDF: Extracts the adb:key statements from an RDF Graph.

Parameters:
  • rdf_graph (rdflib.graph.Graph) – The RDF Graph to extract the statements from.

  • keep_adb_key_statements_in_rdf_graph (bool) – Keeps the adb:key statements in the original graph once extracted. Defaults to False.

Returns:

An RDF Graph containing the extracted adb:key statements.

Return type:

rdflib.graph.Graph

ArangoRDFController

class arango_rdf.controller.ArangoRDFController

Controller used in RDF-to-ArangoDB (PGT).

Responsible for handling how the ArangoDB Collection Mapping Process identifies the “ideal RDFS Class” among a selection of RDFS Classes for a given RDF Resource.

The “ideal RDFS Class” is defined as an RDFS Class whose local name best represents the RDF Resource in question. This local name will be used as the ArangoDB Collection name that will store rdf_resource.

Read more about how the PGT ArangoDB Collection Mapping Process works here.

identify_best_class(rdf_resource: URIRef | BNode | Literal, class_set: Set[str], subclass_tree: Tree) -> str

Find the ideal RDFS Class among a selection of RDFS Classes. Essential for the ArangoDB Collection Mapping Process used in RDF-to-ArangoDB (PGT).

Read more about how the PGT ArangoDB Collection Mapping Process works here.

The “ideal RDFS Class” is defined as an RDFS Class whose local name best represents the RDF Resource in question. This local name will be used as the ArangoDB Collection name that will store rdf_resource.

This system is a work-in-progress. Users are welcome to override this method via their own subclass of the ArangoRDFController Class. The RDF Graph of the current RDF-to-ArangoDB transformation is accessible via self.rdf_graph, and the database instance via self.db.

Parameters:
  • rdf_resource (URIRef | BNode) – The RDF Resource in question.

  • class_set (Set[str]) – A set of RDFS Class URIs that are associated to rdf_resource via the RDF.Type relationship, either via explicit definition or via domain/range inference.

  • subclass_tree (arango_rdf.utils.Tree) – The Tree data structure representing the RDFS subClassOf Taxonomy. See arango_rdf.main.ArangoRDF.__build_subclass_tree() for more info.

Returns:

The string representation of the URI of the most suitable RDFS Class in class_set, whose local name is used as the ArangoDB Document Collection name for rdf_resource.

Return type:

str
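A custom controller might look like the sketch below, which picks the class by a user-supplied preference order and ignores subclass_tree altogether. `PreferredClassController` and its example URIs are hypothetical. The try/except only exists so the snippet loads without arango_rdf installed; in real use the class would subclass ArangoRDFController directly.

```python
from typing import Any, List, Set

try:
    # Real base class, when arango_rdf is installed.
    from arango_rdf.controller import ArangoRDFController as _Base
except ImportError:  # keeps the sketch importable without the package
    _Base = object


class PreferredClassController(_Base):  # type: ignore[misc, valid-type]
    """Hypothetical controller: pick classes by a user-supplied preference order."""

    preferred: List[str] = ["http://example.com/Employee", "http://example.com/Person"]

    def identify_best_class(self, rdf_resource: Any, class_set: Set[str],
                            subclass_tree: Any) -> str:
        for uri in self.preferred:
            if uri in class_set:
                return uri
        # Fallback: deterministic choice, independent of set ordering.
        return sorted(class_set)[0]
```

An instance of such a class would be passed as the controller argument of the ArangoRDF constructor shown at the top of this page.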