Problems with the Query optimization #1154
Comments
Has the data, i.e., the number of triples in the database, increased since the upgrade to Virtuoso 7.2.9 in April? Has a database integrity check been run to check for possible database corruption, with the command:
Can the RDF_QUAD table indexes be checked with the following queries:
all of which should return a count of "0" if there are no issues. With the database in this state, please provide the output of running the "status();" command to return statistics on the state of the database. When you say |
Thanks for your reply,
I created this database from scratch with 7.2.9. We update the DB each week. This means adding new publications and updating (remove + add new) other publications.
It returned:
Yeah it was to 7.2.10.
The first query returned 0. All the others returned 1. I modified them and removed the count from the select and got this:
out of query 2-4. |
Hi The broken
Then run the previous commands to check the indexes to ensure they now all return a count of "0". Do ensure a FULL backup of your database is in place before performing these operations, which given the size of your database will take some time to complete. |
How much space do I need for this? Does this only save the 1 broken QUAD in a new table and replace the old entries with it? |
Only the data of the one missing index will be copied to the "rq_recov" table, so if there is only one bad index very little additional space will be required. |
I repaired all 5 tables and the result of these queries is now 0. Sadly, the problem with the optimization of these queries still exists. the |
OK, at least the RDF indexes are now clean. Can you provide a query(s) You indicated the database is being updated daily. What is the typical rate, i.e., how many triples per day, week, or month? I note from the If you look at the system memory in use with the Note this documentation on Virtuoso Query Optimization Diagnostics, which provides some options on how the number of plans used to find a good one can be controlled to a degree, possibly limiting memory consumption. Note that when the |
Here is the query
With
and without:
The query without
We update the DB once a week. As a rule of thumb, there are 10M to 15M new triples per week. Most of the work goes into updating old triples. We update 100M to 150M triples per week (delete and add new). I can check the exact number next week.
after 1 week: The memory usage rises after some time. There was a segmentation fault error if I set
That is good to know. |
Based on the profile output provided, the query is taking about 66 secs to run, and only a few msecs are spent compiling the query, both with and without the What methods are being used to perform the weekly updates; i.e., what programming language(s) are being used, and what interface(s) (SQL/HTTP) are the updates being performed against? |
Yes, the plan being used causes the 0.1 s to 66 s difference. Our tools run in Java with JDBC ( Creation of our repo: The removal of pubs and all dependencies goes over this connection in Java. |
Have you reviewed this Customizing Virtuoso SPARQL Query Optimization using Pragmas and Inline Query Options post and tried any of the query optimisation pragmas listed, to influence the query plan used, and possibly improve query performance? |
This helps to push the optimizer in the right direction. Here is the profile for global:
And here in scope of
Sadly, it's still slower than the normal version ( |
We note you have
We recommend it be set to its default of Also, when you said initially that |
Yes, it was created completely from scratch with the In my last checkpoint from today I got some warnings: |
Thanks for confirming the database was rebuilt from scratch, when upgrading to the Are you saying the |
No, it started before. |
The BTW, looking at your log file snippet the Virtuoso instance was last restarted on |
This was only a snippet. You can the |
We have been using Virtuoso for a publication database for several years.
In April, I completely rebuilt it and went from a 3-year-old version to 7.2.9.
I quickly noticed that some query optimizations were causing problems. I then simply used `DEFINE select-option "order"`.
Last Monday, these queries suddenly ran extremely slowly again. Nothing helped.
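For illustration, here is a minimal sketch of how the pragma could be prepended to a query from a client script. It is not the original query from this report: the `fgi:` namespace is a hypothetical placeholder, the `dct:` namespace is assumed to be Dublin Core Terms, and the helper name `with_order_pragma` is made up for this example. The pragma is spelled exactly as quoted in this report; Virtuoso's documentation, as far as I know, writes it as `define sql:select-option "order"`, so the constant may need adjusting.

```python
# Pragma spelled as in this report (assumption: adjust if your server
# expects the documented "sql:select-option" spelling).
ORDER_PRAGMA = 'DEFINE select-option "order"'

# Hypothetical namespace for the fgi: prefix; the real IRI is not given here.
FGI_NS = "http://example.org/fgi#"
SET_IRI = ("http://int.database.de/linkeddata/resources/sets/"
           "8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1")


def with_order_pragma(query: str, pragma: str = ORDER_PRAGMA) -> str:
    """Prepend the pragma unless the query already starts with it."""
    stripped = query.lstrip()
    if stripped.startswith(pragma):
        return stripped
    return f"{pragma}\n{stripped}"


# With the "order" option the patterns are meant to be evaluated as written,
# so the small pattern is listed first.
QUERY = f"""
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX fgi: <{FGI_NS}>
SELECT ?pub ?iri
WHERE {{
  <{SET_IRI}> dct:hasPart ?pub .   # small side (~11000 pubs)
  ?pub fgi:hasTopic ?iri .         # large side (~250 million triples)
}}
"""

if __name__ == "__main__":
    print(with_order_pragma(QUERY))
```

The helper is idempotent, so it can be applied to queries that may already carry the pragma.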
Server:
DB:
Config:
First, I created an explain in ISQL for this query
`<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1>` has ~11000 pubs.
Without `DEFINE select-option "order"`:
With `DEFINE select-option "order"`:
You can see here that `?pub fgi:hasTopic ?iri` is called first without the order option. This makes no sense, as `?pub fgi:hasTopic ?iri` has ~250 million triples and `<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1> dct:hasPart ?pub.` only ~11000.
Now the same with `Explain` and `DEFINE select-option "order"`:
Here you can see that the `hasTopic` triple took 100% of the 100 minutes. The same query only took a maximum of 4 seconds last week.
Now I have updated Virtuoso to version 7.2.10.
This helped to bring the queries with the order option back to their old run time.
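To make the cardinality argument above concrete, here is a rough back-of-the-envelope sketch. It is not Virtuoso's cost model; it assumes a simple nested-loop join in which every row of the first pattern costs one index probe into the second. The figures are the approximate ones quoted in this report, and the function names are made up for this example.

```python
# Approximate figures quoted in this report.
CARDINALITY = {
    "dct:hasPart": 11_000,         # pubs in the set
    "fgi:hasTopic": 250_000_000,   # hasTopic triples in the whole database
}


def outer_rows(join_order):
    """Rows produced by the first pattern; in this simplified model each
    one triggers a single probe into the next pattern."""
    return CARDINALITY[join_order[0]]


def cheapest_order(patterns):
    """Order patterns by ascending cardinality (smallest first)."""
    return sorted(patterns, key=CARDINALITY.__getitem__)


if __name__ == "__main__":
    good = cheapest_order(["fgi:hasTopic", "dct:hasPart"])
    bad = list(reversed(good))
    print("preferred order:", good)
    print("outer rows, preferred:", outer_rows(good))
    print("outer rows, reversed: ", outer_rows(bad))
    print("factor:", outer_rows(bad) // outer_rows(good))
```

Under this simplified model, starting with the `hasTopic` pattern touches several orders of magnitude more rows than starting with the `hasPart` pattern.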
Without `DEFINE select-option "order"`, the `hasTopic` triple is taken first and the query takes 60 seconds.
Now I have a second query that contains a `union`. Here the `order` option is simply ignored.
Without order:
With `DEFINE select-option "order"`:
In both, the `hasTopic` triple is listed first and the query takes just under 3 minutes.
Without `DEFINE select-option "order"`, the query also uses 100% of all cores of the server.
With `DEFINE select-option "order"`, it is only one core.
It gets better when I make a profile without `DEFINE select-option "order"`. Virtuoso then says goodbye with a segmentation fault.
In `Optional`, the `DEFINE select-option "order"` is also ignored.
Why is the Virtuoso Optimizer so wrong with the `hasTopic` triple, and can I fix this without the `DEFINE select-option "order"`?
I am also surprised that last Monday, on version 7.2.9 before the update to 7.2.10, the query took forever (100 min) even with `DEFINE select-option "order"`.
Then there is the fact that the query with the `Union` without `DEFINE select-option "order"` uses 100% of the CPU and kills Virtuoso when the query runs via `profile`. I think this is because it does not respect the timeout configured in `Virtuoso.ini`.
I hope someone can help me with this problem.
Thanks