Indexing common queries #56

T-Britton · 2017-02-21T14:58:40Z

Querying the database is very slow, and I think a few very common queries account for most of the use (@is_production, @status_approved, etc.). It would be great if we could do something, such as indexing, to speed these queries up.

DraTeots · 2017-02-28T22:34:59Z

Hi! Yes, this is on the TODO list. But if you give me details (what you are doing and with which tools), maybe I can suggest how to optimize the existing code.

BTW, I checked how the website renders RCDB tables, and it looks like it takes several times longer than it used to. I could probably look into that too.

T-Britton · 2017-03-06T16:02:45Z

Speaking of plotBrowser: I have gotten load times down to about 1 to 1.5 seconds. Many people have asked for a check box that would filter out @is_production and/or @status_approved runs. As far as I know there is no JS API to query the DB, so it must be done in Python, which only runs once on page load. The algorithm isn't difficult: pre-query the DB and store the returned runs, then on demand cross-reference that list against the runs the page finds. The issue is that adding one pre-query pushes the load time to almost 10 seconds, and a couple of pre-queries would push it to almost 30 seconds. Simply indexing a few of these results would make them return much faster, and these features could then be implemented easily.
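For reference, a minimal sketch of the pre-query and cross-reference step described above. Everything here is a stand-in: `prequery_runs` fakes the slow DB call with an in-memory list, and the function names, row layout, and run numbers are hypothetical rather than part of RCDB or plotBrowser. It also assumes the check box keeps only the flagged runs.

```python
def prequery_runs(db_rows, flag):
    """Stand-in for one slow DB pre-query: the set of runs where `flag` is true."""
    return {row["run"] for row in db_rows if row.get(flag)}


def filter_page_runs(page_runs, cache, required_flags):
    """Cross-reference the runs found by the page against the cached pre-queries."""
    allowed = set(page_runs)
    for flag in required_flags:
        allowed &= cache[flag]  # set intersection, no further DB access
    # Preserve the order in which the page listed the runs.
    return [run for run in page_runs if run in allowed]


# Made-up rows standing in for the database content.
rows = [
    {"run": 30001, "is_production": True, "status_approved": True},
    {"run": 30002, "is_production": True, "status_approved": False},
    {"run": 30003, "is_production": False, "status_approved": True},
]

# Done once on page load: one pre-query per check box.
cache = {flag: prequery_runs(rows, flag)
         for flag in ("is_production", "status_approved")}

page_runs = [30003, 30002, 30001, 29999]
production_only = filter_page_runs(page_runs, cache, ["is_production"])
production_and_approved = filter_page_runs(
    page_runs, cache, ["is_production", "status_approved"])
```

The cross-referencing itself is cheap set arithmetic; the cost sits entirely in the pre-queries, which is why their speed dominates the page load time.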

sdobbs · 2017-03-07T00:24:07Z

Maybe it's worth rethinking the data structure of how the aliases are stored and adding an admin-level option to build an index out of one or more of them (seems to not be too hard in the context of SQLAlchemy)

markito3 · 2017-03-07T00:32:33Z

No admin-level options for indices. Should be part of the database schema. There is a description of the "explain" option in MySQL/MariaDB in the documentation that is relevant for coming up with good indices. Basically examine the "where" clause for slow queries.
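As an illustration of that workflow, here is a small sketch that inspects the plan of a slow query before and after adding an index on the columns in its WHERE clause. It uses SQLite's `EXPLAIN QUERY PLAN` from Python only so that it runs without a server; on MySQL/MariaDB the equivalent is `EXPLAIN SELECT ...`. The table and column names are hypothetical and are not the actual RCDB schema.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of how `sql` would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table, not the real RCDB layout.
conn.execute(
    "CREATE TABLE conditions ("
    "run_number INTEGER, condition_type_id INTEGER, bool_value INTEGER)"
)

slow_query = (
    "SELECT run_number FROM conditions "
    "WHERE condition_type_id = 1 AND bool_value = 1"
)

plan_before = query_plan(conn, slow_query)  # full table scan

# Index the columns that appear in the WHERE clause.
conn.execute(
    "CREATE INDEX ix_type_bool ON conditions (condition_type_id, bool_value)"
)
plan_after = query_plan(conn, slow_query)   # search using ix_type_bool

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between database engines and versions, so the thing to look for is whether the index name shows up in the plan at all.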

sdobbs · 2017-03-07T00:34:44Z

So the plan would be to not index particular aliased queries but to build indices to generally speed up the queries? This does sound like a good plan of attack to me.

DraTeots · 2017-03-14T17:35:03Z

As Sean said, it's worth rethinking the data structure of how the aliases are stored. Moreover, frankly speaking, MySQL (or MariaDB, whatever) is not a good fit for this part of RCDB.

Historical context: as some may remember, when RCDB was initially designed it was called the Trigger database, and it was supposed to contain tables representing board configurations, installations, etc. That type of data fits perfectly in a relational DB schema with rows, columns, etc. But as it turned out, this part of RCDB is almost unused, while the run-name-value part is now de facto the main part. I believe almost nobody among RCDB users knows that, besides the run-name-value DB, there are also a lot of tables about triggers, boards, crates, etc.

I believe that both the run-name-value part and the run-config_file_name-content part would fit perfectly in a document-based NoSQL database, and there are plenty of those to choose from.

Still, I don't think we have to drop MySQL and move RCDB to another database, because I think I see how to make RCDB fast again =)

1. Use views for the existing conditions, so the data can be searched as if it were one table.
2. "Rethinking the data structure of how the aliases are stored" (I really liked that phrase). The idea is to:
   - make it easier to create aliases (through an admin web interface, for example);
   - calculate aliases in the background and use the calculated values for search (currently aliases are recalculated each time);
   - index new records on the fly;
   - have some management scripts for reindexing existing data.
3. Parse queries into an AST and convert them to a SQL WHERE clause, so that 1 and 2 can be used at full speed.
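The last point, parsing a query into an AST and emitting a SQL WHERE clause, could look roughly like the sketch below. It relies on Python's standard `ast` module and assumes that queries are written as Python-like boolean expressions. The column whitelist, the function name `to_where`, and the `?` placeholder style (SQLite's; MySQL drivers typically use `%s`) are assumptions made for illustration, not the RCDB implementation.

```python
import ast

# Hypothetical whitelist of searchable columns; anything else is rejected.
ALLOWED_COLUMNS = {"run_number", "event_count", "beam_current", "run_type"}

_COMPARE_OPS = {
    ast.Eq: "=", ast.NotEq: "<>",
    ast.Lt: "<", ast.LtE: "<=",
    ast.Gt: ">", ast.GtE: ">=",
}


def to_where(expression):
    """Convert a Python-like boolean expression to (sql_fragment, params)."""
    params = []

    def visit(node):
        if isinstance(node, ast.BoolOp):
            joiner = " AND " if isinstance(node.op, ast.And) else " OR "
            return "(" + joiner.join(visit(v) for v in node.values) + ")"
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return "(NOT " + visit(node.operand) + ")"
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _COMPARE_OPS.get(type(node.ops[0]))
            if op is None:
                raise ValueError("unsupported comparison operator")
            return visit(node.left) + " " + op + " " + visit(node.comparators[0])
        if isinstance(node, ast.Name):
            if node.id not in ALLOWED_COLUMNS:
                raise ValueError("unknown column: " + node.id)
            return node.id
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
            # Literals become bound parameters, never inline SQL text.
            params.append(node.value)
            return "?"
        raise ValueError("unsupported syntax in query")

    sql = visit(ast.parse(expression, mode="eval").body)
    return sql, params


where_sql, where_params = to_where(
    "event_count > 1000 and run_type == 'hd_all.tsg'")
print(where_sql, where_params)
```

Because the resulting clause only names real columns and bound parameters, the database can use its indexes directly instead of evaluating each condition in Python.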

I hope this will bring the worst-case RCDB query down to the order of 0.1 s, and I hope I can optimize it well beyond that.

sdobbs · 2017-03-16T16:34:01Z

OK, that sounds like a really good plan for a solution.
Maybe it is worth thinking more about the cost/benefit of moving this to a NoSQL DB, though?

DraTeots · 2017-03-20T19:16:22Z

After some experiments with views, temporary tables, indexes and compound queries, I found a pretty simple way to convert RCDB requests to raw SQL that gives good results on hallddb for MySQL.

Profiling shows that complex queries like @is_production, returning 5-10 conditions/columns, take:

- Overall 0.18 s, of which
- 0.17 s is sending the data, which means that
- the actual query takes only ~0.01 s.

That looks very promising. Hopefully the network connection between halldweb and hallddb is faster than my laptop over Wi-Fi.

At the same time, the same queries run REALLY slowly on SQLite, at least from the command line, so I have to investigate that first.

Adding NoSQL database support to RCDB is my backup plan. For now, at least, it looks like we can stay on MySQL without problems.

sdobbs · 2017-03-20T19:18:12Z

This is sounding great!

T-Britton · 2017-03-20T19:26:13Z

I like the sound of those times. 2 seconds on the longer end is great, and some gymnastics will go unnoticed by users.
