Refactoring rudof #212
Replies: 4 comments
-
As the initial steps before committing to the new architecture, I think we should focus on:
-
I am trying to implement the SHACL-based subsetting algorithm (#211), and I found that the SRDF trait could include more functionality that is useful for working with graphs. The idea would be to implement something like the following:

```rust
// A collection of triples returned by the graph queries below
pub struct Triples<R: Rdf>(Vec<R::Triple>);

pub trait RdfGraph: Rdf + Sized {
    // Triples matching a pattern; `None` acts as a wildcard
    fn triples_matching(
        &self,
        subject: Option<&Self::Subject>,
        predicate: Option<&Self::IRI>,
        object: Option<&Self::Term>,
    ) -> Result<Triples<Self>, Self::Error>;

    fn triples(&self) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(None, None, None)
    }

    // The iteration strategies consume the triples, so the returned
    // iterators own their items instead of borrowing a temporary
    fn subjects(&self) -> Result<Box<dyn Iterator<Item = Self::Subject>>, Self::Error> {
        Ok(SubjectIteration::iterate(self.triples()?))
    }

    fn predicates(&self) -> Result<Box<dyn Iterator<Item = Self::IRI>>, Self::Error> {
        Ok(PredicateIteration::iterate(self.triples()?))
    }

    fn objects(&self) -> Result<Box<dyn Iterator<Item = Self::Term>>, Self::Error> {
        Ok(ObjectIteration::iterate(self.triples()?))
    }

    fn triples_with_subject(&self, subject: &Self::Subject) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(Some(subject), None, None)
    }

    fn triples_with_predicate(&self, predicate: &Self::IRI) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(None, Some(predicate), None)
    }

    fn triples_with_object(&self, object: &Self::Term) -> Result<Triples<Self>, Self::Error> {
        self.triples_matching(None, None, Some(object))
    }

    fn neighs(&self, node: &Self::Term) -> Result<Triples<Self>, Self::Error> {
        // `term_as_subject` is assumed to exist; it is not defined in this snippet
        let subject = Self::term_as_subject(node)?;
        self.triples_with_subject(&subject)
    }
}

// For the methods to obtain only subjects, or predicates, or objects, one can
// define several iteration strategies over triples for retrieving only subjects,
// or predicates, or objects
pub trait IterationStrategy<R: Rdf> {
    type Item;
    fn iterate(triples: Triples<R>) -> Box<dyn Iterator<Item = Self::Item>>;
}

pub struct SubjectIteration;

impl<R: Rdf> IterationStrategy<R> for SubjectIteration {
    type Item = R::Subject;
    fn iterate(_triples: Triples<R>) -> Box<dyn Iterator<Item = Self::Item>> {
        todo!()
    }
}

pub struct PredicateIteration;

impl<R: Rdf> IterationStrategy<R> for PredicateIteration {
    type Item = R::IRI;
    fn iterate(_triples: Triples<R>) -> Box<dyn Iterator<Item = Self::Item>> {
        todo!()
    }
}

pub struct ObjectIteration;

impl<R: Rdf> IterationStrategy<R> for ObjectIteration {
    type Item = R::Term;
    fn iterate(_triples: Triples<R>) -> Box<dyn Iterator<Item = Self::Item>> {
        todo!()
    }
}
```

Note that the snippet above should be understood as pseudocode, and it may not work as-is.
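To make the intended pattern-matching semantics concrete, here is a minimal, self-contained sketch that assumes plain strings for every triple position. `SimpleGraph`, `Triple`, and `triple` are hypothetical stand-ins, not rudof types: `triples_matching` treats `None` as a wildcard, and `subjects` is a simple projection over the matched triples, similar in spirit to `SubjectIteration`. Returning iterators that borrow from the graph, rather than owned collections, is one possible way to avoid cloning triples.

```rust
/// Hypothetical, simplified triple: every position is a plain string.
#[derive(Debug, Clone, PartialEq)]
pub struct Triple {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Convenience constructor for building example graphs.
pub fn triple(s: &str, p: &str, o: &str) -> Triple {
    Triple {
        subject: s.to_string(),
        predicate: p.to_string(),
        object: o.to_string(),
    }
}

/// Hypothetical in-memory graph backed by a vector of triples.
pub struct SimpleGraph {
    triples: Vec<Triple>,
}

impl SimpleGraph {
    pub fn new(triples: Vec<Triple>) -> Self {
        SimpleGraph { triples }
    }

    /// Triples matching the pattern, where `None` matches any value.
    /// The iterator borrows from the graph, so no triple is cloned.
    pub fn triples_matching<'a>(
        &'a self,
        subject: Option<&'a str>,
        predicate: Option<&'a str>,
        object: Option<&'a str>,
    ) -> impl Iterator<Item = &'a Triple> + 'a {
        self.triples.iter().filter(move |t| {
            subject.map_or(true, |s| t.subject == s)
                && predicate.map_or(true, |p| t.predicate == p)
                && object.map_or(true, |o| t.object == o)
        })
    }

    /// Distinct subjects in order of first appearance, projected from the triples.
    pub fn subjects(&self) -> Vec<&str> {
        let mut seen: Vec<&str> = Vec::new();
        for t in self.triples_matching(None, None, None) {
            if !seen.contains(&t.subject.as_str()) {
                seen.push(t.subject.as_str());
            }
        }
        seen
    }
}
```

For example, `g.triples_matching(None, Some(":knows"), None)` yields every `:knows` triple. A real store would use typed terms and indexes rather than a linear scan.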
-
I think that a better name for the trait above would be Graph, as it allows implementors to be graph-like structures. Apart from that, I am not sure that the Triple struct and the SRDFBasic trait are as optimized as they should be. First, we are extending the SRDFBasic trait here because we want to take advantage of the associated types declared there. However, I am not sure that makes sense. In Rust, if you extend a trait, you require implementors to also implement the other trait, which makes sense in the abstract: a Graph should also implement basic RDF-related operations. However, if you dive into the SRDFBasic trait, you will see that it encapsulates comparisons and conversions of RDF terms. In my view, it would make more sense to separate those from the SRDFBasic trait and reserve it for creating RDF nodes. An example can be found in the snippet below:

```rust
// The idea behind this trait is to define the basic capabilities of any RDF-based
// data structure. From a SPARQL endpoint to a Graph, all the implementors of
// the Rdf trait will have some basic functionality: add or remove a triple,
// add a base, add a prefix...
pub trait Rdf {
    type Subject; // subject
    type IRI; // predicate
    type Term; // object
    type BNode;
    type Literal;
    type Triple; // rdf-star
    type Error;

    fn add_triple(
        &mut self,
        subject: &Self::Subject,
        predicate: &Self::IRI,
        object: &Self::Term,
    ) -> Result<(), Self::Error>;

    fn remove_triple(
        &mut self,
        subject: &Self::Subject,
        predicate: &Self::IRI,
        object: &Self::Term,
    ) -> Result<(), Self::Error>;

    fn add_base(&mut self, base: &Self::IRI) -> Result<(), Self::Error>;

    fn add_prefix(&mut self, alias: &str, iri: &Self::IRI) -> Result<(), Self::Error>;
}

pub trait TermConversion: Rdf {
    fn subject_as_iri(subject: &Self::Subject) -> Option<&Self::IRI>;
    fn subject_as_bnode(subject: &Self::Subject) -> Option<&Self::BNode>;
    fn term_as_iri(object: &Self::Term) -> Option<&Self::IRI>;
    fn term_as_bnode(object: &Self::Term) -> Option<&Self::BNode>;
    fn term_as_literal(object: &Self::Term) -> Option<&Self::Literal>;
    fn term_as_triple(object: &Self::Term) -> Option<&Self::Triple>;
}

pub trait TermComparison: Rdf {
    fn subject_is_iri(subject: &Self::Subject) -> bool;
    fn subject_is_bnode(subject: &Self::Subject) -> bool;
    fn term_is_iri(object: &Self::Term) -> bool;
    fn term_is_bnode(object: &Self::Term) -> bool;
    fn term_is_literal(object: &Self::Term) -> bool;
    fn term_is_triple(object: &Self::Term) -> bool;
}
```
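One possible consequence of this split, sketched below under simplifying assumptions, is that the comparison predicates need not be written by each implementor: each one can default to checking whether the corresponding conversion succeeds. The `SimpleTerm` enum, the trimmed-down `Rdf` stand-in, and `MemoryStore` are hypothetical and cover objects only; this is an illustration of the idea, not rudof code.

```rust
/// Hypothetical object term, trimmed to IRIs, blank nodes and literals.
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleTerm {
    Iri(String),
    BNode(String),
    Literal(String),
}

/// Minimal stand-in for the `Rdf` trait: only the associated types are needed here.
pub trait Rdf {
    type Term;
    type IRI;
    type BNode;
    type Literal;
}

/// Conversions are the only methods an implementor has to write.
pub trait TermConversion: Rdf {
    fn term_as_iri(object: &Self::Term) -> Option<&Self::IRI>;
    fn term_as_bnode(object: &Self::Term) -> Option<&Self::BNode>;
    fn term_as_literal(object: &Self::Term) -> Option<&Self::Literal>;
}

/// Each comparison defaults to "does the corresponding conversion succeed?".
pub trait TermComparison: TermConversion {
    fn term_is_iri(object: &Self::Term) -> bool {
        Self::term_as_iri(object).is_some()
    }
    fn term_is_bnode(object: &Self::Term) -> bool {
        Self::term_as_bnode(object).is_some()
    }
    fn term_is_literal(object: &Self::Term) -> bool {
        Self::term_as_literal(object).is_some()
    }
}

/// Hypothetical implementor.
pub struct MemoryStore;

impl Rdf for MemoryStore {
    type Term = SimpleTerm;
    type IRI = String;
    type BNode = String;
    type Literal = String;
}

impl TermConversion for MemoryStore {
    fn term_as_iri(object: &SimpleTerm) -> Option<&String> {
        match object {
            SimpleTerm::Iri(iri) => Some(iri),
            _ => None,
        }
    }
    fn term_as_bnode(object: &SimpleTerm) -> Option<&String> {
        match object {
            SimpleTerm::BNode(id) => Some(id),
            _ => None,
        }
    }
    fn term_as_literal(object: &SimpleTerm) -> Option<&String> {
        match object {
            SimpleTerm::Literal(lexical) => Some(lexical),
            _ => None,
        }
    }
}

// No code needed: every comparison comes from the default methods.
impl TermComparison for MemoryStore {}
```

With this arrangement, an implementor only maintains the conversions, and the comparisons cannot drift out of sync with them.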
-
One of the remaining pieces is SPARQL support, which can be used, for example, in SHACL validation to implement the SPARQL-based engine. A handful of methods is required for this purpose. The snippet below is an example:

```rust
pub trait Sparql: Rdf {
    type QuerySolution;

    fn execute(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<Vec<Self::QuerySolution>, Self::Error>;

    fn select(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<Vec<Self::QuerySolution>, Self::Error> {
        // TODO: check that it is a SELECT query?
        self.execute(prefixmap, query)
    }

    fn ask(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<bool, Self::Error> {
        // TODO: check that it is an ASK query?
        todo!()
    }

    fn construct(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<(), Self::Error> {
        // TODO: check that it is a CONSTRUCT query?
        todo!()
    }

    fn update(
        &self,
        prefixmap: Vec<Self::IRI>,
        query: &str,
    ) -> Result<(), Self::Error> {
        // TODO: check that it is an update request?
        todo!()
    }
}
```
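The "check that it is a SELECT query?" TODOs could be addressed with a cheap check before dispatching. Below is a deliberately naive sketch, assuming the query form is given by the first keyword after any `PREFIX`/`BASE` declarations. `QueryForm` and `query_form` are hypothetical names; comments and other prologue subtleties are not handled, so a real implementation would rather rely on a proper SPARQL parser.

```rust
/// Hypothetical classification of SPARQL requests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryForm {
    Select,
    Ask,
    Construct,
    Describe,
    Update,
}

/// Naively detects the form of a SPARQL request from its first keyword,
/// skipping `PREFIX`/`BASE` declarations.
pub fn query_form(query: &str) -> Option<QueryForm> {
    let mut in_declaration = false;
    for token in query.split_whitespace() {
        if in_declaration {
            // A declaration ends with its IRI, e.g. `<http://example.org/>`.
            if token.ends_with('>') {
                in_declaration = false;
            }
            continue;
        }
        // Keep only the leading letters, so `SELECT*` or `ASK{` still match.
        let keyword = token
            .chars()
            .take_while(char::is_ascii_alphabetic)
            .collect::<String>()
            .to_ascii_uppercase();
        match keyword.as_str() {
            // The IRI may be glued to the keyword, as in `BASE<http://example.org/>`.
            "PREFIX" | "BASE" => in_declaration = !token.ends_with('>'),
            "SELECT" => return Some(QueryForm::Select),
            "ASK" => return Some(QueryForm::Ask),
            "CONSTRUCT" => return Some(QueryForm::Construct),
            "DESCRIBE" => return Some(QueryForm::Describe),
            "INSERT" | "DELETE" | "LOAD" | "CLEAR" | "CREATE" | "DROP" | "COPY" | "MOVE"
            | "ADD" | "WITH" => return Some(QueryForm::Update),
            _ => return None,
        }
    }
    None
}
```

Inside the `Sparql` trait, `select` could then return an error whenever `query_form(query) != Some(QueryForm::Select)`, and likewise for the other methods.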
-
We have been discussing the possibility of rethinking the internals of rudof to make it a future-proof Rust-based RDF library. The idea would be not only to reimplement the model but also to formalize the steps followed, so that we could design a methodology for implementing RDF in Rust. So far, we have been focusing on delivering, that is, on obtaining a usable tool. However, the scope has changed from fast prototyping to a more stable implementation, so a review of the model is required.

This idea started with the technical debt detected when implementing the `rudof_lib` module (#200 and #201). As an example, in the case of SHACL validation, some benchmarks (#206) showed that, of the total time spent validating a data graph against a shapes graph, the system spent 16.13% cloning and a total of 30.45% compiling the shapes. To put it into perspective, only 44.18% of the time was spent on the validation itself. Refer to Figure 1 for more details.

Figure 1. Flamegraph corresponding to the SHACL validation based on `rudof_lib`.

The components to be changed
The SRDF model
We have detected that the clones come from this part of the codebase. Moreover, even if the trait-based design has proved to be a good idea, both the naming conventions and the functionality provided by those traits are somewhat confusing. We should stick to the single-responsibility principle. What's more, some of the methods defined depend directly on Oxigraph, losing the inherent genericity of the traits. The API of some of the methods also needs to be simplified (see the helper functions defined in the SHACL validation). Refer to Figures 2 and 3 for examples of the proposed design.
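A common Rust technique for making such clones cheap, independently of how the traits end up being named, is to keep immutable term data behind a reference-counted pointer: cloning then copies a pointer and increments a counter instead of duplicating the string. The sketch below assumes terms are immutable once created; the `Iri` type is hypothetical and does not describe how rudof or Oxigraph currently represent terms. `Rc` is used for brevity; a store shared across threads would need `Arc` instead.

```rust
use std::rc::Rc;

/// Hypothetical IRI whose text lives in a shared, immutable buffer.
/// Cloning it copies a pointer, not the string.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Iri(Rc<str>);

impl Iri {
    pub fn new(iri: &str) -> Self {
        Iri(Rc::from(iri))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// Number of `Iri` values currently sharing this buffer.
    pub fn share_count(&self) -> usize {
        Rc::strong_count(&self.0)
    }
}
```

Equality and hashing still compare the IRI text, so such a type can be used as a map key exactly like a `String`.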
The ShEx and SHACL implementation
The idea of introducing functional parser combinators is well-suited to Rust and aligns closely with the modular nature of both ShEx and SHACL, where many components exhibit similar behavior. However, validation requires more than just syntactic analysis (parsing); it also involves creating a native representation of shapes, performing type validation, resolving imports, and more. This additional layer corresponds to semantic analysis in compiler design.
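To illustrate the distinction, here is a minimal sketch of a semantic-analysis pass that runs after parsing: the parser yields shapes that refer to each other by label, and a compile step resolves those labels into indices of a native representation, rejecting duplicate or undefined labels before validation starts. Collecting all labels first also allows forward and recursive references. `AstShape`, `CompiledShape`, and `compile` are hypothetical and ignore everything except shape references.

```rust
use std::collections::HashMap;

/// Output of parsing: shapes refer to other shapes by label.
#[derive(Debug)]
pub struct AstShape {
    pub label: String,
    pub references: Vec<String>,
}

/// Output of semantic analysis: references are resolved to indices.
#[derive(Debug, PartialEq)]
pub struct CompiledShape {
    pub label: String,
    pub references: Vec<usize>,
}

/// Resolves every shape reference, or reports the first duplicate or undefined label.
pub fn compile(ast: &[AstShape]) -> Result<Vec<CompiledShape>, String> {
    // First pass: assign an index to every declared label.
    let mut index: HashMap<&str, usize> = HashMap::new();
    for (i, shape) in ast.iter().enumerate() {
        if index.insert(shape.label.as_str(), i).is_some() {
            return Err(format!("duplicate shape label `{}`", shape.label));
        }
    }
    // Second pass: replace labels by indices, rejecting unknown ones.
    ast.iter()
        .map(|shape| -> Result<CompiledShape, String> {
            let references = shape
                .references
                .iter()
                .map(|r| {
                    index.get(r.as_str()).copied().ok_or_else(|| {
                        format!("shape `{}` references undefined shape `{}`", shape.label, r)
                    })
                })
                .collect::<Result<Vec<usize>, String>>()?;
            Ok(CompiledShape {
                label: shape.label.clone(),
                references,
            })
        })
        .collect()
}
```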
Strong external dependencies
Right now, `sparql_service` depends on Oxigraph. Moreover, `sparql_service` seems to duplicate functionality (SRDFGraph and SRDFQuery). In `shacl_validation`, I implemented the store package to simplify the low-level interface of the stores (Graph and Endpoint). Additionally, I think Oxigraph introduces a very strong external dependency, and I am wondering whether it would be better to implement this at a lower level (perhaps using DuckDB).

Conclusions
As we have said, we believe that rudof has clearly surpassed its initial scope and has proved to be useful. We have also seen that Rust is a fantastic language for building fast tools on top of RDF. It may be worth considering a refactoring of the tool's architecture.