Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query objects are not supported as input to the query method #26

Open
lukasmu opened this issue Apr 21, 2023 · 5 comments
Open

Query objects are not supported as input to the query method #26

lukasmu opened this issue Apr 21, 2023 · 5 comments

Comments

@lukasmu
Copy link

lukasmu commented Apr 21, 2023

The query method of the OxigraphStore does not accept rdflib Query objects but only strings and otherwise fails with a TypeError.

Example:

import rdflib
from rdflib.plugins.sparql import prepareQuery
from rdflib.namespace import DCTERMS

graph = rdflib.Graph(store='Oxigraph')
graph.add((rdflib.URIRef('http://localhost/example'), DCTERMS.title, rdflib.Literal('An arbitrary example')))

query_string = f'SELECT ?title WHERE {{?obj <{DCTERMS.title}> ?title}}'
query_object = prepareQuery(query_string)

# This works:
result = graph.query(query_string)
print(list(result)[0][0])

# This does not work:
result = graph.query(query_object)
print(list(result)[0][0])

Maybe the line

if isinstance(queryGraph, Query) or kwargs:
should rather read if isinstance(query, Query) or kwargs for the time being?

In the long term, it would be great if prepared Query objects could be supported. Maybe this could result in even further speed ups? Probably this is related to #11 as well.

Thanks a lot for your efforts!

@Tpt
Copy link
Contributor

Tpt commented Apr 22, 2023

Thank you for this nice bug report! I have fixed the condition in 85ce7d4 and released v0.3.4 with the fix.

In the long term, it would be great if prepared Query objects could be supported. Maybe this could result in even further speed ups?

Yes, indeed. Not sure about the speed up because there is the conversion cost between Python and Rust and I am not sure if it's much faster than SPARQL parsing. I see two ways of implementing this:

  1. write code to convert the rdflib SPARQ algebra objects to Rust spargebra objects.
  2. Serialize the rdflib SPARQL algebra object back to a SPARQL string and give this string to Oxigraph.

Approach 2 might be slightly slower but might be easier to implement and having a serializer for SPARQL algebra objects might be a relevant feature for rdflib independently of Oxigraph.

@lukasmu
Copy link
Author

lukasmu commented Apr 23, 2023

Some food for thought:

Rdflib is often perceived as rather slow. I think that the main reasons for this are either:
a) Performance limitations of the underlying store
b) Performance limitations of the SPARQL query parsing

For the project I am working on I first switched to the Oxigraph store because of a hint on Stack Overflow (https://stackoverflow.com/a/71008345). I was really surprised by how much faster my program ran.
Only later I found out that I can achieve an even better runtime by using the default rdflib memory store and preparing & caching rdflib Query objects.

I can imagine that quite some people are using Query objects instead of SPARQL strings just for performance reasons.

While a serialization from rdflib Query objects to SPARQL strings might not be very costly computationally, the generation of the rdflib Query objects (e.g. using the rdflib.plugins.sparql.prepareQuery function) in the first place is computationally expensive.

So maybe instead of going with approach 2 it might even be more sensible to not change anything and rather educate people to use only SPARQL strings and to not even prepare Query objects when using the Oxigraph store.

Side note: How efficient is the creation of Rust spargebra objects from SPARQL strings? Maybe it makes sense to cache/memoize this somehow in Oxigraph?

@pchampin
Copy link

pchampin commented Dec 12, 2023

I ran into a similar problem ages ago, and I submitted a PR with a "temporary" patch, which apparently is still there 😆 :

a Query object carries with it the arguments used to build it, i.e. the query text, the namespaces, and the base IRI.

I monkeypatched oxrdflib to take advantage of this, and it seems to work fine. It boils down to the following preprocessing lines:

if isinstance(query, rdflib.plugins.sparql.sparql.Query) and hasattr(query, "_original_args"):
    query, initInitNs, base = query._original_args
    initInitNs.update(initNs)
    initNs = initInitNs
    if base:
        kwargs.setdefault('base', base)
base = kwargs.pop('base', None)
if base != None:
    query = f'BASE <{base}>\n{query}'

@Tpt
Copy link
Contributor

Tpt commented Dec 12, 2023

@pchampin Nice! Thank you!

I am a bit reluctant to use it in oxrdflib. I guess that if the Query object gets mutated after parsing but before being given to the Store.query method, the mutations will get ignored with this code. This looks like a significant footgun to me. Am I correct?

@pchampin
Copy link

I guess that if the Query object gets mutated after parsing but before being given to the Store.query method, the mutations will get ignored with this code.

I would never have considered mutating an instance of Query, but it is true that the attributes are apparently "public" (per the naming convention), so in principle "open for mutation"... but I would consider that (mutating the attribute of a Query instance) as a footgun...

Would you be more comfortable to apply the strategy above if all attributes of Query were "protected" in RDFlib? If so, it might be worth asking what the intention of the authors are, and proposing a patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants