[24.0] Fix very slow user data table query #17830

Merged
merged 1 commit into from
Mar 25, 2024

@mvdbeek mvdbeek commented Mar 25, 2024

Fixes the very slow user data table query by adding an index on history_dataset_association.extension.
This resolves extremely slow tool form building.

Before (EXPLAIN ANALYZE output, parallel sequential scan on history_dataset_association):

['Sort  (cost=11152.93..11152.94 rows=1 width=575) (actual time=151.608..152.636 rows=0 loops=1)',
 '  Sort Key: history_dataset_association.id',
 '  Sort Method: quicksort  Memory: 25kB',
 '  ->  Nested Loop  (cost=1001.26..11152.92 rows=1 width=575) (actual time=151.564..152.592 rows=0 loops=1)',
 '        ->  Nested Loop  (cost=1000.84..11144.46 rows=1 width=575) (actual time=151.564..152.592 rows=0 loops=1)',
 '              Join Filter: (history_dataset_association.dataset_id = dataset.id)',
 '              ->  Nested Loop  (cost=1000.42..11143.94 rows=1 width=575) (actual time=151.563..152.591 rows=0 loops=1)',
 '                    ->  Gather  (cost=1000.00..11135.50 rows=1 width=363) (actual time=151.563..152.590 rows=0 loops=1)',
 '                          Workers Planned: 2',
 '                          Workers Launched: 2',
 '                          ->  Parallel Seq Scan on history_dataset_association  (cost=0.00..10135.40 rows=1 width=363) (actual time=139.910..139.911 rows=0 loops=3)',
 "                                Filter: ((NOT deleted) AND (metadata ~~ '\\x2522616c6c5f66617374612225'::bytea) AND ((extension)::text = 'data_manager_json'::text))",
 '                                Rows Removed by Filter: 80928',
 '                    ->  Index Scan using dataset_pkey on dataset dataset_1  (cost=0.42..8.44 rows=1 width=212) (never executed)',
 '                          Index Cond: (id = history_dataset_association.dataset_id)',
 '              ->  Index Scan using dataset_pkey on dataset  (cost=0.42..0.51 rows=1 width=4) (never executed)',
 '                    Index Cond: (id = dataset_1.id)',
 "                    Filter: ((total_size <> file_size) AND ((state)::text = 'ok'::text))",
 '        ->  Index Scan using history_pkey on history  (cost=0.42..8.44 rows=1 width=4) (never executed)',
 '              Index Cond: (id = history_dataset_association.history_id)',
 '              Filter: (user_id = 1)',
 'Planning Time: 9.134 ms',
 'Execution Time: 152.984 ms']
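A side note on reading the Filter line: the bytea literal is hex-encoded, so the LIKE pattern applied to the metadata column is not obvious at a glance. A minimal Python sketch for decoding it (the variable names are illustrative and not taken from Galaxy):

```python
# Hex payload copied from the Filter line of the plan (without the \x prefix).
hex_payload = "2522616c6c5f66617374612225"

# PostgreSQL prints bytea values as hex; turn them back into readable text.
like_pattern = bytes.fromhex(hex_payload).decode("ascii")

print(like_pattern)  # %"all_fasta"%
```

So the scan keeps only rows whose metadata contains "all_fasta" in quotes. That this is the data table name is an inference from the plan, not something the plan states.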

After (with the hda_ext index on extension, execution time drops from about 153 ms to about 3.6 ms):

['Sort  (cost=352.72..352.73 rows=1 width=575) (actual time=3.429..3.432 rows=0 loops=1)',
 '  Sort Key: history_dataset_association.id',
 '  Sort Method: quicksort  Memory: 25kB',
 '  ->  Nested Loop  (cost=6.35..352.71 rows=1 width=575) (actual time=3.377..3.379 rows=0 loops=1)',
 '        ->  Nested Loop  (cost=5.93..344.25 rows=1 width=575) (actual time=3.377..3.378 rows=0 loops=1)',
 '              Join Filter: (history_dataset_association.dataset_id = dataset.id)',
 '              ->  Nested Loop  (cost=5.51..343.73 rows=1 width=575) (actual time=3.376..3.377 rows=0 loops=1)',
 '                    ->  Bitmap Heap Scan on history_dataset_association  (cost=5.09..335.29 rows=1 width=363) (actual time=3.376..3.376 rows=0 loops=1)',
 "                          Recheck Cond: ((extension)::text = 'data_manager_json'::text)",
 "                          Filter: ((NOT deleted) AND (metadata ~~ '\\x2522616c6c5f66617374612225'::bytea))",
 '                          Rows Removed by Filter: 126',
 '                          Heap Blocks: exact=55',
 '                          ->  Bitmap Index Scan on hda_ext  (cost=0.00..5.09 rows=89 width=0) (actual time=0.616..0.617 rows=126 loops=1)',
 "                                Index Cond: ((extension)::text = 'data_manager_json'::text)",
 '                    ->  Index Scan using dataset_pkey on dataset dataset_1  (cost=0.42..8.44 rows=1 width=212) (never executed)',
 '                          Index Cond: (id = history_dataset_association.dataset_id)',
 '              ->  Index Scan using dataset_pkey on dataset  (cost=0.42..0.51 rows=1 width=4) (never executed)',
 '                    Index Cond: (id = dataset_1.id)',
 "                    Filter: ((total_size <> file_size) AND ((state)::text = 'ok'::text))",
 '        ->  Index Scan using history_pkey on history  (cost=0.42..8.44 rows=1 width=4) (never executed)',
 '              Index Cond: (id = history_dataset_association.history_id)',
 '              Filter: (user_id = 1)',
 'Planning Time: 23.736 ms',
 'Execution Time: 3.594 ms']
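To illustrate the effect in isolation, here is a rough sketch that uses SQLite from the Python standard library as a stand-in for PostgreSQL. That substitution is an assumption made purely for illustration: the planner, the plan format and the timings all differ from the plans shown here, and the table is reduced to three columns. Only the table name, the column name and the index name hda_ext come from the plans.

```python
import sqlite3

# Assumption: in-memory SQLite replaces PostgreSQL for this demo only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_dataset_association ("
    "id INTEGER PRIMARY KEY, extension TEXT, deleted BOOLEAN)"
)
# Made-up data: 5000 rows, every 500th one is a data_manager_json dataset.
conn.executemany(
    "INSERT INTO history_dataset_association (extension, deleted) VALUES (?, ?)",
    [("data_manager_json" if i % 500 == 0 else "fastq", False) for i in range(5000)],
)

QUERY = (
    "SELECT id FROM history_dataset_association "
    "WHERE extension = 'data_manager_json' AND NOT deleted"
)


def query_plan(connection):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)  # no index available yet: full table scan

# Same index name as the one that appears in the plan after the fix.
conn.execute("CREATE INDEX hda_ext ON history_dataset_association (extension)")

plan_after = query_plan(conn)  # the planner can now look up rows via hda_ext
matching_rows = conn.execute(QUERY).fetchall()

print(plan_before)
print(plan_after)
```

The exact plan wording depends on the SQLite version, so the sketch only relies on the index name showing up in the second plan. As in the PostgreSQL plan, the extension condition is served by the index while the remaining filter (deleted) is checked on the rows the index returns.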

This became important with #17435.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek mvdbeek added kind/bug area/performance area/database Galaxy's database or data access layer labels Mar 25, 2024
@github-actions github-actions bot added this to the 24.1 milestone Mar 25, 2024
@mvdbeek mvdbeek force-pushed the add_extension_index branch from 12aacf5 to e74d078 Compare March 25, 2024 10:25
@bgruening bgruening modified the milestones: 24.1, 24.0 Mar 25, 2024
@nsoranzo nsoranzo requested a review from jdavcs March 25, 2024 12:36
@jdavcs jdavcs merged commit ec1dc96 into galaxyproject:release_24.0 Mar 25, 2024
49 checks passed
wm75 commented Mar 25, 2024

Not directly of concern here, but:
this tool https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/wolma/mimodd_main/mimodd_map/0.1.9 was hit particularly hard by the slow loading. What's special about it is that it has a data table select in multiple branches of a conditional. From this it seems that the query is executed multiple times for the same data table within a single wrapper?
Avoiding this might be another possible optimization.

4 participants