Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing common queries #56

Open
T-Britton opened this issue Feb 21, 2017 · 10 comments
Open

Indexing common queries #56

T-Britton opened this issue Feb 21, 2017 · 10 comments

Comments

@T-Britton
Copy link
Collaborator

Querying the database is very slow and I think there are a few very common queries that get used (@is_production, status_approved etc). It would be great if we could do something, like indexing, to speed up these queries.

@DraTeots
Copy link
Collaborator

Hi! Yes, this is in TODO list. But if you provide me with details (what you do with what tools) maybe I can suggest how to optimize existing code/something.

BTW. I checked how web site renders RCDB tables, and it looks like it takes several times longer than it used to. Probably I could look into it too.

@T-Britton
Copy link
Collaborator Author

Speaking to plotBrowser I have gotten load times to about 1->1.5 seconds. I had a request from many people to put in a button that would filter out @is_production and/or @status_approved runs (check box). Currently I am unaware of a JS API to query the DB so it must be done in python, which only runs once on page load. It isn't a difficult algorithm; Pre-query the DB and store the returned runs, then on-demand cross-reference the list against the runs that the page finds. The issue is that adding one prequery sends the load-time to almost 10 seconds. Doing a couple of pre-queries would put the page load time to almost 30 seconds. Simply indexing a few of these results would allow the return to be a lot faster and these features could be easily implemented.

@sdobbs
Copy link
Collaborator

sdobbs commented Mar 7, 2017

Maybe it's worth rethinking the data structure of how the aliases are stored and adding an admin-level option to build an index out of one or more of them (seems to not be too hard in the context of SQLAlchemy)

@markito3
Copy link
Member

markito3 commented Mar 7, 2017 via email

@sdobbs
Copy link
Collaborator

sdobbs commented Mar 7, 2017 via email

@DraTeots
Copy link
Collaborator

DraTeots commented Mar 14, 2017

As Sean said it's worth rethinking the data structure of how the aliases are stored. Moreover, frankly saying, MySQL (MariaDB, whatever) doesn't fit good for this part of the RCDB.

Historical context. As one could remember when RCDB was under initial design, it was called the Trigger database and it should have contained tables representing boards configurations, installations - etc. That type of data fits perfectly in a relational DB schema with rows, columns, etc. But as it comes out, this part of RCDB is almost not in use. While run-name-value part de facto is the main part now. And I believe, among RCDB users almost nobody knows, that besides run-name-value db there are also a lot of tables about trigger, boards, crates, etc.

I believe that both run-name-value part and run-config_file_name-content part would fit perfectly in one of NoSQL document based databases. There are zillion NoSQL databases that would fit.

Still i don't think we have to drop MySQL and move RCDB to another database. Because I think I see how to make RCDB fast again =)

  1. Use Views for existing conditions to search the data as it is one table
  2. "rethinking the data structure of how the aliases are stored" (I really liked that phrase). The idea is to:
    • make it easier to create aliases (through admin web interface for example)
    • aliases should be calculated in background and the calculated values to be used for search (now aliases are recalculated each time).
    • indexing on fly for new records
    • have some govern scripts for reindexing of existing data
  3. Queries should be parsed to AST tree and converted to SQL WHERE clause so 1 and 2 could be used at the full speed

I hope that doing so could let us have 0.1s as an order of magnitude of the worst RCDB query. And hope I could optimize it much more than this value.

@sdobbs
Copy link
Collaborator

sdobbs commented Mar 16, 2017

OK, that sounds like a really good plan for a solution.
Maybe it is worth thinking more about the cost/benefit of moving this to a NoSQL DB, though?

@DraTeots
Copy link
Collaborator

After some experiments with views, temporary tables, indexes and compound queries I found pretty simple way to convert RCDB requests to raw SQL which gives good results on hallddb for MySQL:

Profile shows that complex queries like @is_production with 5-10 returning conditions/columns take:

  • Overall 0.18 [s], where
  • 0.17 [s] is data sending. Which means that
  • Actual query takes only ~0.01[s]

Which looks very promising. Hopefully network communication between halldweb and hallddb is faster than with my laptop over wifi.

At the same time, the same queries work REALLY slow on SQlite. At least from command line. So I have investigate it first.

Adding support for NoSQL databases for RCDB is my backup plan. At least now it looks like we may just stay on MySQL without problems

@sdobbs
Copy link
Collaborator

sdobbs commented Mar 20, 2017 via email

@T-Britton
Copy link
Collaborator Author

T-Britton commented Mar 20, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants