Refactoring rudof #212

angelip2303 · 2024-11-05T10:44:58Z

angelip2303
Nov 5, 2024
Maintainer

We have been discussing the possibility of rethinking the internals of rudof to make it a future-proof Rust-based RDF library. The idea would be not only to reimplement the model but also to formalize the steps followed, so that we could design a methodology for implementing RDF in Rust.

Until now we have been focusing on delivering; that is, on obtaining a usable tool. However, the scope has changed from fast prototyping to a more stable implementation, so a review of the model is required. This idea started with the technical debt that was detected when implementing the rudof_lib module (#200 and #201). As an example, in the case of SHACL validation, some benchmarks (#206) showed that, of the total time spent validating a data graph against a shapes graph, the system spent 16.13% cloning and 30.45% compiling the shapes. To put it into perspective, only 44.18% of the time was spent on the validation itself. Refer to Figure 1 for more details.

Figure 1. Flamegraph corresponding to the SHACL validation based on rudof_lib.

The components to be changed

The SRDF model

We have detected that the clones come from this part of the codebase. Moreover, even though the trait-based design has proved to be a good idea, both the naming conventions and the functionality provided by those traits are a bit confusing. We should stick to the single-responsibility principle. What's more, some of the methods defined depend directly on Oxigraph, which loses the inherent genericity of the traits. The API of some of the methods also needs to be simplified (see the helper functions defined in the SHACL validation). Refer to Figures 2 and 3 for examples of the proposed design.

Figure 2. Proposed architecture, which is a simplification of the current one. The idea is that SRDF should be a module that is as generic as possible, possibly reusable across several other libraries.

Figure 3. Proposed design of the SRDF model. The idea would be to have an inner representation of RDF and a set of traits for implementing the top-level features; e.g., SHACL validation, ShEx validation, etc.

The ShEx and SHACL implementation

The idea of introducing functional parser combinators is well-suited to Rust and aligns closely with the modular nature of both ShEx and SHACL, where many components exhibit similar behavior. However, validation requires more than just syntactic analysis (parsing); it also involves creating a native representation of shapes, performing type validation, resolving imports, and more. This additional layer corresponds to semantic analysis in compiler design.

Figure 4. Proposed design of the shapes compiler.
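
To make the distinction between parsing and semantic analysis more concrete, here is a minimal sketch of a resolution pass. It assumes the parser has already produced an AST in which shapes refer to each other by name, and it turns those names into indices, failing on dangling references. All the names (ShapeAst, CompiledShape, CompileError, compile) are hypothetical and are not part of the current codebase; a real compiler would also handle imports, type checks and the full set of constraint components.

```rust
use std::collections::HashMap;

/// Hypothetical output of the parsing phase: shapes may refer to others by name.
#[derive(Debug, Clone, PartialEq)]
pub enum ShapeAst {
    NodeKindIri,
    Ref(String),
    And(Vec<ShapeAst>),
}

/// Hypothetical output of semantic analysis: every reference is an index
/// into the compiled schema, so validation never has to look names up.
#[derive(Debug, Clone, PartialEq)]
pub enum CompiledShape {
    NodeKindIri,
    Ref(usize),
    And(Vec<CompiledShape>),
}

#[derive(Debug, PartialEq)]
pub enum CompileError {
    UnresolvedReference(String),
}

/// Compiles a list of named shapes, preserving their order.
pub fn compile(schema: &[(String, ShapeAst)]) -> Result<Vec<CompiledShape>, CompileError> {
    // First pass: record the position of every declared shape.
    let index: HashMap<&str, usize> = schema
        .iter()
        .enumerate()
        .map(|(i, (name, _))| (name.as_str(), i))
        .collect();
    // Second pass: rewrite each AST, resolving the references.
    schema.iter().map(|(_, ast)| resolve(ast, &index)).collect()
}

fn resolve(ast: &ShapeAst, index: &HashMap<&str, usize>) -> Result<CompiledShape, CompileError> {
    match ast {
        ShapeAst::NodeKindIri => Ok(CompiledShape::NodeKindIri),
        ShapeAst::Ref(name) => index
            .get(name.as_str())
            .copied()
            .map(CompiledShape::Ref)
            .ok_or_else(|| CompileError::UnresolvedReference(name.clone())),
        ShapeAst::And(parts) => parts
            .iter()
            .map(|part| resolve(part, index))
            .collect::<Result<Vec<_>, _>>()
            .map(CompiledShape::And),
    }
}
```

With this split, a reference to an undeclared shape is reported once, at compile time, instead of surfacing in the middle of a validation run.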

Strong external dependencies

Right now, sparql_service depends on Oxigraph. Not only that, but sparql_service seems to duplicate functionality (SRDFGraph and SRDFQuery). In shacl_validation, I implemented the store package to simplify the low-level interface of stores, Graph and Endpoint. Additionally, I think Oxigraph introduces a very strong external dependency, and I’m wondering if it would be better to implement this at a lower level (perhaps using DuckDB).

Conclusions

As we have said, we believe that rudof has clearly surpassed its initial scope and has proved to be useful. We have also verified that Rust is a fantastic language for building fast tools on top of RDF. It may therefore be worth considering a refactoring of the tool's architecture.

angelip2303 · 2024-11-05T10:46:57Z

angelip2303
Nov 5, 2024
Maintainer Author

As the initial steps before committing to the new architecture, I think we should focus on:

- The SHACL and ShEx test suites should be moved to the CI (tests module in the root).
  - SHACL
  - ShEx
- The currently implemented features should be covered by unit tests, so we can tell that the changes introduced won't break the tool.

angelip2303 · 2024-11-07T13:49:21Z

angelip2303
Nov 7, 2024
Maintainer Author

I am trying to implement the SHACL-based subsetting algorithm (#211), and I found that the SRDF trait could include more functionality that is interesting for working with graphs. The idea is that by implementing a triples_matching method which can resolve Basic Graph Patterns, all the other implementations can be understood as specializations of the matcher.

pub struct Triples<R: Rdf>(Vec<R::Triple>); // owned collection of matched triples

// `Sized` is required because the methods return `Triples<Self>`.
pub trait RdfGraph: Rdf + Sized {
    // Resolves a Basic Graph Pattern; `None` acts as a wildcard.
    fn triples_matching(
        &self,
        subject: Option<&Self::Subject>,
        predicate: Option<&Self::IRI>,
        object: Option<&Self::Term>,
    ) -> Result<Triples<Self>, Self::Error>;

    fn triples(&self) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(None, None, None)
    }

    // Pseudocode: the iterators below borrow from a temporary `Triples` value,
    // so a real implementation has to keep it alive or yield owned items.
    fn subjects<'a>(&'a self) -> Result<Box<dyn Iterator<Item = &'a Self::Subject> + 'a>, Self::Error> {
        Ok(SubjectIteration.iterate(&self.triples()?))
    }

    fn predicates<'a>(&'a self) -> Result<Box<dyn Iterator<Item = &'a Self::IRI> + 'a>, Self::Error> {
        Ok(PredicateIteration.iterate(&self.triples()?))
    }

    fn objects<'a>(&'a self) -> Result<Box<dyn Iterator<Item = &'a Self::Term> + 'a>, Self::Error> {
        Ok(ObjectIteration.iterate(&self.triples()?))
    }

    fn triples_with_subject(&self, subject: &Self::Subject) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(Some(subject), None, None)
    }

    fn triples_with_predicate(&self, predicate: &Self::IRI) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(None, Some(predicate), None)
    }

    fn triples_with_object(&self, object: &Self::Term) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(None, None, Some(object))
    }

    // `term_as_subject` is assumed to be provided by the base trait.
    fn neighs(&self, node: &Self::Term) -> Result<Triples<Self>, Self::Error> {
        let subject = Self::term_as_subject(node)?;
        self.triples_with_subject(subject)
    }
}

// For the methods to obtain only subjects, or predicates, or objects, one can
// define several iteration strategies over triples for retrieving only subjects,
// or predicates, or objects
pub trait IterationStrategy<R: Rdf> {
    type Item;

    fn iterate<'a>(
        &'a self,
        triples: &'a Triples<R>,
    ) -> Box<dyn Iterator<Item = (&'a Self::Item)> + 'a>;
}

pub struct SubjectIteration;

impl<R: Rdf> IterationStrategy<R> for SubjectIteration {
    type Item = R::Subject;

    fn iterate<'a>(
        &'a self,
        triples: &'a Triples<R>,
    ) -> Box<dyn Iterator<Item = (&'a Self::Item)> + 'a> {
        todo!()
    }
}

pub struct PredicateIteration;

impl<R: Rdf> IterationStrategy<R> for PredicateIteration {
    type Item = R::IRI;

    fn iterate<'a>(
        &'a self,
        triples: &'a Triples<R>,
    ) -> Box<dyn Iterator<Item = (&'a Self::Item)> + 'a> {
        todo!()
    }
}

pub struct ObjectIteration;

impl<R: Rdf> IterationStrategy<R> for ObjectIteration {
    type Item = R::Term;

    fn iterate<'a>(
        &'a self,
        triples: &'a Triples<R>,
    ) -> Box<dyn Iterator<Item = (&'a Self::Item)> + 'a> {
        todo!()
    }
}

Note that the snippet above should be understood as pseudocode, and it may not work as-is.
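
As a complement to the pseudocode, the following is a small self-contained sketch of the same idea that should compile on its own. It is not the proposed trait: it assumes a hypothetical in-memory graph (VecGraph) whose triples are plain strings (SimpleTriple), and it only illustrates how the more specific accessors reduce to a single triples_matching call.

```rust
/// Hypothetical triple whose three positions are plain strings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SimpleTriple {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

impl SimpleTriple {
    pub fn new(subject: &str, predicate: &str, object: &str) -> Self {
        Self {
            subject: subject.into(),
            predicate: predicate.into(),
            object: object.into(),
        }
    }
}

/// Hypothetical in-memory graph: just a vector of triples.
#[derive(Debug, Default)]
pub struct VecGraph {
    triples: Vec<SimpleTriple>,
}

impl VecGraph {
    pub fn insert(&mut self, triple: SimpleTriple) {
        self.triples.push(triple);
    }

    /// General matcher: `None` is a wildcard, `Some(x)` must be equal to `x`.
    /// It yields borrowed triples, so nothing is cloned while matching.
    pub fn triples_matching<'a>(
        &'a self,
        subject: Option<&'a str>,
        predicate: Option<&'a str>,
        object: Option<&'a str>,
    ) -> impl Iterator<Item = &'a SimpleTriple> + 'a {
        self.triples.iter().filter(move |t| {
            subject.map_or(true, |s| t.subject == s)
                && predicate.map_or(true, |p| t.predicate == p)
                && object.map_or(true, |o| t.object == o)
        })
    }

    /// Every other accessor is a specialization of the matcher.
    pub fn triples(&self) -> impl Iterator<Item = &SimpleTriple> + '_ {
        self.triples_matching(None, None, None)
    }

    pub fn triples_with_subject<'a>(
        &'a self,
        subject: &'a str,
    ) -> impl Iterator<Item = &'a SimpleTriple> + 'a {
        self.triples_matching(Some(subject), None, None)
    }

    /// Projection over the matched triples (duplicates are not removed).
    pub fn subjects(&self) -> impl Iterator<Item = &str> + '_ {
        self.triples().map(|t| t.subject.as_str())
    }
}
```

Returning iterators that borrow from the graph itself, rather than from an intermediate collection, is one possible way around the lifetime issue in the pseudocode; whether that fits remote backends such as SPARQL endpoints is an open question.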

angelip2303 · 2024-11-08T09:53:05Z

angelip2303
Nov 8, 2024
Maintainer Author

I think that a better name for the trait above would be Graph, as it allows implementors to be Graph-like structures.

Apart from that, I don't really know if the Triple struct and the SRDFBasic trait are as optimized as they should be. First, we are extending the SRDFBasic trait here because we want to take advantage of the associated types that are declared there. However, I don't really know if that makes sense. In Rust, if you extend a trait, you require implementors to also implement the other trait. In the abstract, it is reasonable for a Graph to also implement basic RDF-related operations. However, if you dive into the SRDFBasic trait, you will see that it also encapsulates comparisons and conversions of RDF terms. In my view, it possibly makes more sense to separate those from the SRDFBasic trait, and reserve it for creating RDF nodes. An example can be found in the snippet below:

// The idea behind this trait is to define the basic capabilities of any RDF-based
// data structure. From a SPARQL endpoint to a Graph, all the implementors of
// the Rdf trait will have some basic functionality: add or remove a triple,
// add a base, add a prefix...
pub trait Rdf {
    type Subject; // subject
    type IRI; // predicate
    type Term; // object
    type BNode;
    type Literal;
    type Triple; // rdf-star

    type Error;

    // Trait methods take no `pub`; they share the visibility of the trait.
    // Mutating operations take `&mut self`, like `add_base` and `add_prefix`.
    fn add_triple(
        &mut self,
        subject: &Self::Subject,
        predicate: &Self::IRI,
        object: &Self::Term,
    ) -> Result<(), Self::Error>;

    fn remove_triple(
        &mut self,
        subject: &Self::Subject,
        predicate: &Self::IRI,
        object: &Self::Term,
    ) -> Result<(), Self::Error>;

    fn add_base(&mut self, base: &Self::IRI) -> Result<(), Self::Error>;

    fn add_prefix(&mut self, alias: &str, iri: &Self::IRI) -> Result<(), Self::Error>;
}

pub trait TermConversion: Rdf {
    fn subject_as_iri(subject: &Self::Subject) -> Option<&Self::IRI>;
    fn subject_as_bnode(subject: &Self::Subject) -> Option<&Self::BNode>;
    fn term_as_iri(object: &Self::Term) -> Option<&Self::IRI>;
    fn term_as_bnode(object: &Self::Term) -> Option<&Self::BNode>;
    fn term_as_literal(object: &Self::Term) -> Option<&Self::Literal>;
    fn term_as_triple(object: &Self::Term) -> Option<&Self::Triple>;
}

pub trait TermComparison: Rdf {
    fn subject_is_iri(subject: &Self::Subject) -> bool;
    fn subject_is_bnode(subject: &Self::Subject) -> bool;
    fn term_is_iri(object: &Self::Term) -> bool;
    fn term_is_bnode(object: &Self::Term) -> bool;
    fn term_is_literal(object: &Self::Term) -> bool;
    fn term_is_triple(object: &Self::Term) -> bool;
}
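
One consequence of splitting conversions from comparisons is that the comparisons can be derived from the conversions, so implementors only have to write the latter. The sketch below illustrates this under simplifying assumptions: it uses a hypothetical concrete Term enum carrying plain strings and methods on the term itself, instead of the associated types and associated functions of the traits above.

```rust
/// Hypothetical concrete term type; the variants carry plain strings for brevity.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Iri(String),
    BNode(String),
    Literal(String),
}

/// Conversions return a borrowed view when the term has the requested kind.
pub trait TermConversion {
    fn as_iri(&self) -> Option<&str>;
    fn as_bnode(&self) -> Option<&str>;
    fn as_literal(&self) -> Option<&str>;
}

/// Kind checks get default bodies derived from the conversions,
/// so an implementor gets them without writing any extra code.
pub trait TermComparison: TermConversion {
    fn is_iri(&self) -> bool {
        self.as_iri().is_some()
    }
    fn is_bnode(&self) -> bool {
        self.as_bnode().is_some()
    }
    fn is_literal(&self) -> bool {
        self.as_literal().is_some()
    }
}

impl TermConversion for Term {
    fn as_iri(&self) -> Option<&str> {
        match self {
            Term::Iri(iri) => Some(iri),
            _ => None,
        }
    }
    fn as_bnode(&self) -> Option<&str> {
        match self {
            Term::BNode(id) => Some(id),
            _ => None,
        }
    }
    fn as_literal(&self) -> Option<&str> {
        match self {
            Term::Literal(lexical) => Some(lexical),
            _ => None,
        }
    }
}

// The empty impl opts in to all the default comparison methods.
impl TermComparison for Term {}
```

The conversions hand out references, so checking or converting a term does not require a clone.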

angelip2303 · 2024-11-08T11:06:53Z

angelip2303
Nov 8, 2024
Maintainer Author

One of the remaining pieces is the SPARQL support, which can be used, for example, in the SHACL validation for implementing the SPARQL-based engine. Thus, a handful of methods is required for this purpose. The snippet below is an example of this:

pub trait Sparql: Rdf {
    type QuerySolution;

    // Runs the query and returns its solutions.
    fn execute(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<Vec<Self::QuerySolution>, Self::Error>;

    fn select(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<Vec<Self::QuerySolution>, Self::Error> {
        // TODO: check that it is a SELECT query
        self.execute(prefixmap, query)
    }

    fn ask(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<bool, Self::Error> {
        // TODO: check that it is an ASK query
        todo!()
    }

    fn construct(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<(), Self::Error> {
        // TODO: check that it is a CONSTRUCT query
        todo!()
    }

    fn update(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<(), Self::Error> {
        // TODO: check that it is an update request
        todo!()
    }
}
