Indexing common queries #56
Comments
Hi! Yes, this is on the TODO list. But if you give me details (what you do, and with which tools), maybe I can suggest how to optimize the existing code or something else. BTW, I checked how the website renders RCDB tables, and it looks like it takes several times longer than it used to. I could probably look into that too. |
Speaking of plotBrowser, I have gotten load times down to about 1 to 1.5 seconds. Many people have asked me to add a check box that would filter by @is_production and/or @status_approved runs. Currently I am unaware of a JS API to query the DB, so it must be done in Python, which only runs once on page load. The algorithm isn't difficult: pre-query the DB and store the returned runs, then cross-reference that list on demand against the runs the page finds. The issue is that adding one pre-query pushes the load time to almost 10 seconds, and a couple of pre-queries would push it to almost 30 seconds. Simply indexing a few of these results would make the queries return much faster, and these features could then be implemented easily. |
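The pre-query and cross-reference step described above could look roughly like the sketch below. It is only an illustration: the function names and run numbers are hypothetical, it assumes the check box keeps only the matching runs, and it takes the pre-queried run lists as given rather than calling any RCDB API.

```python
def build_prequery_sets(prequery_results):
    """Turn each pre-queried run list into a set for fast membership tests."""
    return {name: set(runs) for name, runs in prequery_results.items()}


def filter_runs(page_runs, prequery_sets, active_filters):
    """Keep the runs that appear in every active pre-queried set, preserving order."""
    required = [prequery_sets[name] for name in active_filters]
    return [run for run in page_runs if all(run in runs for runs in required)]


# Hypothetical run numbers standing in for real query results.
prequeried = build_prequery_sets({
    "@is_production": [30001, 30002, 30005],
    "@status_approved": [30002, 30005, 30007],
})
page_runs = [30001, 30002, 30003, 30005, 30007]

production_only = filter_runs(page_runs, prequeried, ["@is_production"])
both = filter_runs(page_runs, prequeried, ["@is_production", "@status_approved"])
no_filter = filter_runs(page_runs, prequeried, [])
```

The cross-reference itself is cheap once the sets exist, which is consistent with the point above that the cost sits in the pre-queries rather than in the filtering.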
Maybe it's worth rethinking the data structure of how the aliases are stored and adding an admin-level option to build an index out of one or more of them (seems to not be too hard in the context of SQLAlchemy) |
No admin-level options for indices. Should be part of the database schema.
There is a description of the "explain" option in MySQL/MariaDB in the
documentation that is relevant for coming up with good indices.
Basically examine the "where" clause for slow queries.
On 03/06/2017 07:24 PM, Sean Dobbs wrote:
Maybe it's worth rethinking the data structure of how the aliases are
stored and adding an admin-level option to build an index out of one
or more of them (seems to not be too hard in the context of SQLAlchemy)
|
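As a rough illustration of the "explain" advice above, the sketch below inspects the plan of a query before and after indexing the columns in its WHERE clause. It uses SQLite's `EXPLAIN QUERY PLAN` from the Python standard library as a stand-in so it runs without a server; the `conditions` table, its columns, and the index name are assumptions, not the actual RCDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical name-value table; the real RCDB schema may differ.
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, value TEXT)")
names = ("is_production", "status_approved", "beam_current")
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?)",
    [(run, name, "1") for run in range(1000) for name in names],
)

SLOW_QUERY = "SELECT run_number FROM conditions WHERE name = ? AND value = ?"


def query_plan(connection, sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, SLOW_QUERY, ("is_production", "1"))

# Index the columns that appear in the WHERE clause of the slow query.
conn.execute("CREATE INDEX idx_conditions_name_value ON conditions (name, value)")
plan_after = query_plan(conn, SLOW_QUERY, ("is_production", "1"))

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL/MariaDB the equivalent check is to prefix the slow `SELECT` with `EXPLAIN`. In line with the comment above, the `CREATE INDEX` statement would belong in the schema definition rather than behind an admin option.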
So the plan would be to not index particular aliased queries but to build
indices to generally speed up the queries?
This does sound like a good plan of attack to me.
On Mon, Mar 6, 2017 at 7:32 PM Mark M. Ito wrote:
No admin-level options for indices. Should be part of the database schema.
There is a description of the "explain" option in MySQL/MariaDB in the
documentation that is relevant for coming up with good indices.
Basically examine the "where" clause for slow queries.
|
As Sean said, it's worth rethinking the data structure of how the aliases are stored. Moreover, frankly speaking, MySQL (or MariaDB, whatever) is not a good fit for this part of RCDB.
I believe that both the run-name-value part and the run-config_file_name-content part would fit perfectly in one of the document-based NoSQL databases, and there are a zillion of those that would work. Still, I don't think we have to drop MySQL and move RCDB to another database, because I think I see how to make RCDB fast again =)
I hope this will bring the worst RCDB query down to the order of 0.1 s, and I hope I can optimize it well beyond that. |
OK, that sounds like a really good plan for a solution. |
After some experiments with views, temporary tables, indexes and compound queries, I found a pretty simple way to convert RCDB requests to raw SQL, which gives good results on hallddb for MySQL. Profiling shows that complex queries like @is_production with 5-10 returning conditions/columns take:
- Overall 0.18 [s], where
- 0.17 [s] is data sending, which means that
- the actual query takes only ~0.01 [s]
This looks very promising. Hopefully network communication between halldweb and hallddb is faster than with my laptop over wifi. At the same time, the same queries run REALLY slowly on SQLite, at least from the command line, so I have to investigate that first. Adding support for NoSQL databases to RCDB is my backup plan. At least for now, it looks like we can just stay on MySQL without problems. |
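The thread does not show the actual SQL, so the following is only a guess at what a single compound query returning several conditions as columns might look like. It uses conditional aggregation over a hypothetical `conditions` name-value table and runs on SQLite purely for convenience, which the comment above notes behaved differently from MySQL. The function name, table, and values are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical name-value rows; the real RCDB schema may differ.
conn.executescript("""
CREATE TABLE conditions (run_number INTEGER, name TEXT, value TEXT);
INSERT INTO conditions VALUES
    (30001, 'is_production', '1'), (30001, 'beam_current', '150'), (30001, 'event_count', '5000000'),
    (30002, 'is_production', '0'), (30002, 'beam_current', '0'),   (30002, 'event_count', '1200'),
    (30003, 'is_production', '1'), (30003, 'beam_current', '148'), (30003, 'event_count', '7300000');
""")


def build_compound_query(column_names):
    """Build one query that returns each requested condition as its own column."""
    select_parts = ["run_number"] + [
        "MAX(CASE WHEN name = ? THEN value END)" for _ in column_names
    ]
    return (
        "SELECT " + ", ".join(select_parts) + " "
        "FROM conditions "
        "GROUP BY run_number "
        # Keep only runs whose filter condition has the requested value.
        "HAVING MAX(CASE WHEN name = ? THEN value END) = ? "
        "ORDER BY run_number"
    )


columns = ["beam_current", "event_count"]
sql = build_compound_query(columns)
# Placeholders are bound in order: the column names first, then the filter name and value.
rows = conn.execute(sql, columns + ["is_production", "1"]).fetchall()
print(rows)
```

The point of the sketch is that one round trip returns the filtered runs together with all requested columns, instead of one query per condition.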
This is sounding great!
On Mon, Mar 20, 2017 at 2:16 PM Dmitry Romanov wrote:
After some experiments with views, temporary tables, indexes and compound
queries I found pretty simple way to convert RCDB requests to raw SQL which
gives good results on hallddb for MySQL:
Profile shows that complex queries like @is_production with 5-10
returning conditions/columns take:
- Overall *0.18* [s], where
- *0.17* [s] is data sending. Which means that
- Actual query takes only ~*0.01*[s]
Which looks very promising. Hopefully network communication between
halldweb and hallddb is faster than with my laptop over wifi.
At the same time, the same queries work REALLY slow on SQlite. At least
from command line. So I have investigate it first.
Adding support for NoSQL databases for RCDB is my backup plan. At least
now it looks like we may just stay on MySQL without problems
|
I like the sound of those times. 2 seconds on the longer end is great, and some gymnastics will go unnoticed by users.
Thomas Britton
On Mar 20, 2017, at 3:16 PM, Dmitry Romanov wrote:
After some experiments with views, temporary tables, indexes and compound queries I found pretty simple way to convert RCDB requests to raw SQL which gives good results on hallddb for MySQL:
Profile shows that complex queries like @is_production with 5-10 returning conditions/columns take:
Overall 0.18 [s], where
0.17 [s] is data sending. Which means that
Actual query takes only ~0.01[s]
Which looks very promising. Hopefully network communication between halldweb and hallddb is faster than with my laptop over wifi.
At the same time, the same queries work REALLY slow on SQlite. At least from command line. So I have investigate it first.
Adding support for NoSQL databases for RCDB is my backup plan. At least now it looks like we may just stay on MySQL without problems
|
Querying the database is very slow, and I think there are a few very common queries that get used (@is_production, @status_approved, etc.). It would be great if we could do something, like indexing, to speed up these queries.