Improve subquery planning #4651

Merged: 2 commits, Dec 22, 2024

andyfengHKU (Contributor) commented Dec 18, 2024

Description

This PR improves subquery planning. Previously, we only unnested subqueries correlated on internal ID, e.g. MATCH (a) WHERE EXISTS { MATCH (a)-[]->(b) }.

This PR generalizes unnesting to any equality join, e.g. MATCH (a) WHERE EXISTS { MATCH (b) WHERE a.age = b.age }.
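
To illustrate why unnesting an equality-correlated EXISTS can help, here is a minimal Python sketch. It is not the planner code from this PR: the function names and the toy rows are hypothetical, and it assumes no null keys. It contrasts evaluating the subquery once per outer row with hashing the inner join key once and probing it, which is one common way to execute an unnested EXISTS as a semi join.

```python
def exists_nested(outer, inner, key):
    """Correlated evaluation: rescan `inner` for every outer row (O(n * m))."""
    return [a for a in outer if any(a[key] == b[key] for b in inner)]


def exists_semi_join(outer, inner, key):
    """Unnested evaluation: hash the inner keys once, then probe (O(n + m)).

    Assumption: keys are never null. Cypher treats null = null as null,
    which plain Python equality does not model.
    """
    inner_keys = {b[key] for b in inner}  # build side
    return [a for a in outer if a[key] in inner_keys]  # probe side


# Hypothetical toy data standing in for the (a) and (b) node tables.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 41},
    {"name": "Carol", "age": 25},
]
others = [
    {"name": "Dan", "age": 30},
    {"name": "Eve", "age": 25},
    {"name": "Finn", "age": 25},
]

nested = exists_nested(people, others, "age")
unnested = exists_semi_join(people, others, "age")
```

Both functions return each outer row at most once, even when several inner rows match, which is the EXISTS semantics a plain inner join would not preserve.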

Sanity check benchmark

MATCH (a:Comment) WHERE EXISTS { MATCH (b:Comment) WHERE a.id=b.id} RETURN COUNT(*);

On LDBC100, the running time of the above query drops from 50s to 12s.

andyfengHKU force-pushed the improve-subquery-planning branch from 95ec3a2 to be6b15a on December 20, 2024 at 02:59
andyfengHKU requested a review from ray6080 on December 20, 2024 at 03:10

Benchmark Result

Master commit hash: bfe46c071c48fcf8fcdaa911238929bf617c53d1
Branch commit hash: 7d9cfa9316482f4bca5292b5f16033a00a9b815d

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 650.66 | 642.92 | 7.74 (1.20%) |
| aggregation | q28 | 11961.94 | 11143.70 | 818.24 (7.34%) |
| filter | q14 | 137.04 | 125.51 | 11.53 (9.19%) |
| filter | q15 | 143.36 | 131.77 | 11.59 (8.79%) |
| filter | q16 | 317.54 | 300.58 | 16.96 (5.64%) |
| filter | q17 | 453.44 | 444.57 | 8.87 (2.00%) |
| filter | q18 | 2016.38 | 1936.40 | 79.97 (4.13%) |
| filter | zonemap-node | 97.14 | N/A | N/A |
| filter | zonemap-node-lhs-cast | 94.82 | N/A | N/A |
| filter | zonemap-rel | 5676.99 | N/A | N/A |
| fixed_size_expr_evaluator | q07 | 590.91 | 547.12 | 43.79 (8.00%) |
| fixed_size_expr_evaluator | q08 | 806.73 | 761.17 | 45.56 (5.99%) |
| fixed_size_expr_evaluator | q09 | 807.75 | 761.27 | 46.48 (6.10%) |
| fixed_size_expr_evaluator | q10 | 244.22 | 241.04 | 3.18 (1.32%) |
| fixed_size_expr_evaluator | q11 | 237.08 | 235.99 | 1.09 (0.46%) |
| fixed_size_expr_evaluator | q12 | 238.68 | 234.97 | 3.71 (1.58%) |
| fixed_size_expr_evaluator | q13 | 1474.71 | 1468.12 | 6.59 (0.45%) |
| fixed_size_seq_scan | q23 | 121.54 | 121.73 | -0.18 (-0.15%) |
| join | q29 | 611.40 | 652.48 | -41.08 (-6.30%) |
| join | q30 | 1517.91 | 1504.33 | 13.58 (0.90%) |
| join | q31 | 4.95 | 8.07 | -3.12 (-38.65%) |
| join | SelectiveTwoHopJoin | 47.98 | N/A | N/A |
| ldbc_snb_ic | q35 | 2631.10 | 388.13 | 2242.97 (577.90%) |
| ldbc_snb_ic | q36 | 538.18 | 35.88 | 502.29 (1399.75%) |
| ldbc_snb_is | q32 | 7.00 | 7.02 | -0.01 (-0.20%) |
| ldbc_snb_is | q33 | 10.64 | 16.35 | -5.71 (-34.93%) |
| ldbc_snb_is | q34 | 1.16 | 4.13 | -2.96 (-71.79%) |
| multi-rel | multi-rel-large-scan | 1363.52 | 1737.67 | -374.15 (-21.53%) |
| multi-rel | multi-rel-lookup | 9.71 | 61.18 | -51.47 (-84.13%) |
| multi-rel | multi-rel-small-scan | 87.14 | 67.45 | 19.68 (29.18%) |
| order_by | q25 | 140.33 | 135.63 | 4.70 (3.47%) |
| order_by | q26 | 454.04 | 459.01 | -4.97 (-1.08%) |
| order_by | q27 | 1467.11 | 1467.03 | 0.08 (0.01%) |
| recursive_join | recursive-join-bidirection | 297.76 | N/A | N/A |
| recursive_join | recursive-join-dense | 7414.21 | N/A | N/A |
| recursive_join | recursive-join-path | 23967.22 | N/A | N/A |
| recursive_join | recursive-join-sparse | 14663.59 | N/A | N/A |
| recursive_join | recursive-join-trail | 7340.34 | N/A | N/A |
| scan_after_filter | q01 | 179.41 | 170.69 | 8.73 (5.11%) |
| scan_after_filter | q02 | 164.03 | 159.40 | 4.62 (2.90%) |
| shortest_path_ldbc100 | q37 | 81.69 | 3336.32 | -3254.62 (-97.55%) |
| shortest_path_ldbc100 | q38 | 334.52 | 68.44 | 266.08 (388.80%) |
| shortest_path_ldbc100 | q39 | 62.42 | 85.71 | -23.30 (-27.18%) |
| shortest_path_ldbc100 | q40 | 415.71 | 74.58 | 341.13 (457.38%) |
| var_size_expr_evaluator | q03 | 2137.27 | 2057.00 | 80.27 (3.90%) |
| var_size_expr_evaluator | q04 | 2325.51 | 2241.83 | 83.68 (3.73%) |
| var_size_expr_evaluator | q05 | 2638.43 | 2625.31 | 13.12 (0.50%) |
| var_size_expr_evaluator | q06 | 1391.13 | 1346.92 | 44.21 (3.28%) |
| var_size_seq_scan | q19 | 1499.55 | 1468.72 | 30.82 (2.10%) |
| var_size_seq_scan | q20 | 2716.03 | 2766.27 | -50.24 (-1.82%) |
| var_size_seq_scan | q21 | 2331.05 | 2263.77 | 67.28 (2.97%) |
| var_size_seq_scan | q22 | 130.99 | 128.98 | 2.01 (1.56%) |

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 93.39623% with 7 lines in your changes missing coverage. Please review.

Project coverage is 86.50%. Comparing base (86f1f43) to head (3530c01).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/planner/plan/plan_subquery.cpp | 95.06% | 4 Missing ⚠️ |
| src/planner/plan/append_join.cpp | 80.00% | 2 Missing ⚠️ |
| src/planner/join_order/cost_model.cpp | 90.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4651      +/-   ##
==========================================
- Coverage   86.51%   86.50%   -0.01%     
==========================================
  Files        1369     1369              
  Lines       57951    58018      +67     
  Branches     7206     7216      +10     
==========================================
+ Hits        50135    50189      +54     
- Misses       7648     7661      +13     
  Partials      168      168              

Benchmark Result

Master commit hash: bfe46c071c48fcf8fcdaa911238929bf617c53d1
Branch commit hash: a5cb74ed2a562f040eac67934ac4bd4ae501a741

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 663.38 | 642.92 | 20.46 (3.18%) |
| aggregation | q28 | 11053.96 | 11143.70 | -89.74 (-0.81%) |
| filter | q14 | 137.77 | 125.51 | 12.27 (9.77%) |
| filter | q15 | 133.70 | 131.77 | 1.92 (1.46%) |
| filter | q16 | 312.07 | 300.58 | 11.49 (3.82%) |
| filter | q17 | 460.45 | 444.57 | 15.88 (3.57%) |
| filter | q18 | 2038.12 | 1936.40 | 101.72 (5.25%) |
| filter | zonemap-node | 96.12 | N/A | N/A |
| filter | zonemap-node-lhs-cast | 94.66 | N/A | N/A |
| filter | zonemap-rel | 5737.01 | N/A | N/A |
| fixed_size_expr_evaluator | q07 | 579.15 | 547.12 | 32.03 (5.85%) |
| fixed_size_expr_evaluator | q08 | 827.07 | 761.17 | 65.91 (8.66%) |
| fixed_size_expr_evaluator | q09 | 822.71 | 761.27 | 61.44 (8.07%) |
| fixed_size_expr_evaluator | q10 | 249.28 | 241.04 | 8.24 (3.42%) |
| fixed_size_expr_evaluator | q11 | 241.74 | 235.99 | 5.75 (2.44%) |
| fixed_size_expr_evaluator | q12 | 233.84 | 234.97 | -1.13 (-0.48%) |
| fixed_size_expr_evaluator | q13 | 1496.92 | 1468.12 | 28.80 (1.96%) |
| fixed_size_seq_scan | q23 | 140.65 | 121.73 | 18.92 (15.55%) |
| join | q29 | 598.01 | 652.48 | -54.47 (-8.35%) |
| join | q30 | 1573.90 | 1504.33 | 69.58 (4.63%) |
| join | q31 | 5.84 | 8.07 | -2.23 (-27.68%) |
| join | SelectiveTwoHopJoin | 52.18 | N/A | N/A |
| ldbc_snb_ic | q35 | 2665.52 | 388.13 | 2277.39 (586.77%) |
| ldbc_snb_ic | q36 | 556.89 | 35.88 | 521.01 (1451.89%) |
| ldbc_snb_is | q32 | 4.92 | 7.02 | -2.10 (-29.88%) |
| ldbc_snb_is | q33 | 9.47 | 16.35 | -6.87 (-42.05%) |
| ldbc_snb_is | q34 | 1.18 | 4.13 | -2.95 (-71.39%) |
| multi-rel | multi-rel-large-scan | 1177.86 | 1737.67 | -559.82 (-32.22%) |
| multi-rel | multi-rel-lookup | 16.45 | 61.18 | -44.73 (-73.11%) |
| multi-rel | multi-rel-small-scan | 52.37 | 67.45 | -15.08 (-22.36%) |
| order_by | q25 | 140.02 | 135.63 | 4.39 (3.24%) |
| order_by | q26 | 461.02 | 459.01 | 2.01 (0.44%) |
| order_by | q27 | 1486.25 | 1467.03 | 19.22 (1.31%) |
| recursive_join | recursive-join-bidirection | 275.04 | N/A | N/A |
| recursive_join | recursive-join-dense | 5192.61 | N/A | N/A |
| recursive_join | recursive-join-path | 23348.78 | N/A | N/A |
| recursive_join | recursive-join-sparse | 14257.24 | N/A | N/A |
| recursive_join | recursive-join-trail | 5586.38 | N/A | N/A |
| scan_after_filter | q01 | 181.23 | 170.69 | 10.55 (6.18%) |
| scan_after_filter | q02 | 166.41 | 159.40 | 7.01 (4.40%) |
| shortest_path_ldbc100 | q37 | 86.11 | 3336.32 | -3250.21 (-97.42%) |
| shortest_path_ldbc100 | q38 | 283.09 | 68.44 | 214.65 (313.65%) |
| shortest_path_ldbc100 | q39 | 63.61 | 85.71 | -22.10 (-25.79%) |
| shortest_path_ldbc100 | q40 | 448.78 | 74.58 | 374.20 (501.72%) |
| var_size_expr_evaluator | q03 | 2145.21 | 2057.00 | 88.21 (4.29%) |
| var_size_expr_evaluator | q04 | 2337.92 | 2241.83 | 96.09 (4.29%) |
| var_size_expr_evaluator | q05 | 2677.28 | 2625.31 | 51.97 (1.98%) |
| var_size_expr_evaluator | q06 | 1344.67 | 1346.92 | -2.26 (-0.17%) |
| var_size_seq_scan | q19 | 1494.86 | 1468.72 | 26.14 (1.78%) |
| var_size_seq_scan | q20 | 2701.09 | 2766.27 | -65.18 (-2.36%) |
| var_size_seq_scan | q21 | 2342.67 | 2263.77 | 78.91 (3.49%) |
| var_size_seq_scan | q22 | 129.62 | 128.98 | 0.65 (0.50%) |

andyfengHKU force-pushed the improve-subquery-planning branch from 84d4edb to 3530c01 on December 20, 2024 at 06:14

Benchmark Result

Master commit hash: 86f1f4310edbd877f8db75800c6d1ac04d1e058d
Branch commit hash: 6b411ca6ad743a0d0cb2147ab5624372c6c8b022

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 652.32 | 647.17 | 5.15 (0.80%) |
| aggregation | q28 | 11803.11 | 11603.98 | 199.12 (1.72%) |
| filter | q14 | 133.17 | 125.41 | 7.75 (6.18%) |
| filter | q15 | 134.49 | 126.68 | 7.80 (6.16%) |
| filter | q16 | 307.68 | 298.79 | 8.89 (2.98%) |
| filter | q17 | 453.97 | 447.59 | 6.38 (1.42%) |
| filter | q18 | 1994.34 | 1944.30 | 50.03 (2.57%) |
| filter | zonemap-node | 94.27 | 87.27 | 7.00 (8.02%) |
| filter | zonemap-node-lhs-cast | 95.11 | 87.44 | 7.67 (8.77%) |
| filter | zonemap-rel | 5750.95 | 5709.27 | 41.68 (0.73%) |
| fixed_size_expr_evaluator | q07 | 592.13 | 581.07 | 11.06 (1.90%) |
| fixed_size_expr_evaluator | q08 | 826.29 | 809.69 | 16.60 (2.05%) |
| fixed_size_expr_evaluator | q09 | 818.58 | 810.57 | 8.01 (0.99%) |
| fixed_size_expr_evaluator | q10 | 245.23 | 244.13 | 1.10 (0.45%) |
| fixed_size_expr_evaluator | q11 | 237.15 | 235.72 | 1.42 (0.60%) |
| fixed_size_expr_evaluator | q12 | 239.50 | 237.23 | 2.26 (0.95%) |
| fixed_size_expr_evaluator | q13 | 1492.00 | 1456.84 | 35.16 (2.41%) |
| fixed_size_seq_scan | q23 | 121.38 | 116.84 | 4.54 (3.89%) |
| join | q29 | 629.30 | 637.69 | -8.39 (-1.32%) |
| join | q30 | 1563.27 | 1560.85 | 2.42 (0.16%) |
| join | q31 | 4.36 | 5.07 | -0.70 (-13.87%) |
| join | SelectiveTwoHopJoin | 52.37 | 56.33 | -3.96 (-7.03%) |
| ldbc_snb_ic | q35 | 2624.26 | 2642.46 | -18.20 (-0.69%) |
| ldbc_snb_ic | q36 | 547.01 | 537.18 | 9.83 (1.83%) |
| ldbc_snb_is | q32 | 5.02 | 5.16 | -0.14 (-2.71%) |
| ldbc_snb_is | q33 | 13.12 | 15.62 | -2.50 (-16.01%) |
| ldbc_snb_is | q34 | 1.09 | 1.08 | 0.01 (0.90%) |
| multi-rel | multi-rel-large-scan | 1337.43 | 1234.16 | 103.27 (8.37%) |
| multi-rel | multi-rel-lookup | 31.14 | 21.75 | 9.39 (43.17%) |
| multi-rel | multi-rel-small-scan | 90.07 | 104.18 | -14.11 (-13.54%) |
| order_by | q25 | 137.51 | 131.89 | 5.63 (4.27%) |
| order_by | q26 | 456.92 | 449.61 | 7.30 (1.62%) |
| order_by | q27 | 1487.46 | 1474.94 | 12.53 (0.85%) |
| recursive_join | recursive-join-bidirection | 277.99 | 298.76 | -20.77 (-6.95%) |
| recursive_join | recursive-join-dense | 7431.43 | 7364.18 | 67.25 (0.91%) |
| recursive_join | recursive-join-path | 23792.67 | 23804.54 | -11.87 (-0.05%) |
| recursive_join | recursive-join-sparse | 14264.78 | 14748.68 | -483.90 (-3.28%) |
| recursive_join | recursive-join-trail | 7373.86 | 7293.80 | 80.05 (1.10%) |
| scan_after_filter | q01 | 179.73 | 169.83 | 9.90 (5.83%) |
| scan_after_filter | q02 | 165.53 | 167.27 | -1.74 (-1.04%) |
| shortest_path_ldbc100 | q37 | 88.23 | 95.98 | -7.76 (-8.08%) |
| shortest_path_ldbc100 | q38 | 337.87 | 348.01 | -10.13 (-2.91%) |
| shortest_path_ldbc100 | q39 | 60.20 | 64.77 | -4.57 (-7.06%) |
| shortest_path_ldbc100 | q40 | 442.57 | 421.22 | 21.35 (5.07%) |
| var_size_expr_evaluator | q03 | 2110.96 | 2078.72 | 32.24 (1.55%) |
| var_size_expr_evaluator | q04 | 2298.00 | 2232.03 | 65.97 (2.96%) |
| var_size_expr_evaluator | q05 | 2634.58 | 2651.46 | -16.88 (-0.64%) |
| var_size_expr_evaluator | q06 | 1327.18 | 1332.11 | -4.93 (-0.37%) |
| var_size_seq_scan | q19 | 1475.67 | 1457.11 | 18.56 (1.27%) |
| var_size_seq_scan | q20 | 2723.65 | 2716.34 | 7.31 (0.27%) |
| var_size_seq_scan | q21 | 2292.81 | 2292.11 | 0.70 (0.03%) |
| var_size_seq_scan | q22 | 127.69 | 129.58 | -1.89 (-1.46%) |

ray6080 (Contributor) left a comment

I've left a few comments; I hope we can discuss them again after you get this in.

src/planner/plan/plan_subquery.cpp (outdated, resolved)
src/planner/plan/plan_subquery.cpp (outdated, resolved)
src/planner/plan/plan_subquery.cpp (outdated, resolved)
src/include/common/data_chunk/sel_vector.h (resolved)
src/planner/plan/plan_subquery.cpp (resolved)
src/planner/plan/plan_subquery.cpp (outdated, resolved)
src/planner/plan/plan_subquery.cpp (resolved)
src/planner/plan/plan_subquery.cpp (outdated, resolved)
src/planner/plan/plan_subquery.cpp (outdated, resolved)
src/planner/plan/plan_subquery.cpp (outdated, resolved)
andyfengHKU force-pushed the improve-subquery-planning branch from 3530c01 to d9797ac on December 21, 2024 at 17:15

Benchmark Result

Master commit hash: 9dd8083966983040768a51acac258996f7d7e931
Branch commit hash: d85d4fbcdbf634866d5da59823c48c75c6c44fc3

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 657.63 | 654.00 | 3.63 (0.56%) |
| aggregation | q28 | 11568.92 | 12164.68 | -595.76 (-4.90%) |
| filter | q14 | 129.22 | 130.23 | -1.01 (-0.78%) |
| filter | q15 | 133.36 | 134.15 | -0.79 (-0.59%) |
| filter | q16 | 309.97 | 312.50 | -2.53 (-0.81%) |
| filter | q17 | 455.04 | 459.22 | -4.18 (-0.91%) |
| filter | q18 | 1982.71 | 1974.78 | 7.93 (0.40%) |
| filter | zonemap-node | 90.12 | 92.09 | -1.97 (-2.14%) |
| filter | zonemap-node-lhs-cast | 90.10 | 91.75 | -1.65 (-1.80%) |
| filter | zonemap-node-null | 89.62 | 87.48 | 2.14 (2.45%) |
| filter | zonemap-rel | 5729.30 | 5780.24 | -50.94 (-0.88%) |
| fixed_size_expr_evaluator | q07 | 572.34 | 589.14 | -16.80 (-2.85%) |
| fixed_size_expr_evaluator | q08 | 803.18 | 824.35 | -21.16 (-2.57%) |
| fixed_size_expr_evaluator | q09 | 803.36 | 824.97 | -21.61 (-2.62%) |
| fixed_size_expr_evaluator | q10 | 237.01 | 247.54 | -10.53 (-4.25%) |
| fixed_size_expr_evaluator | q11 | 230.58 | 248.04 | -17.46 (-7.04%) |
| fixed_size_expr_evaluator | q12 | 225.48 | 237.62 | -12.14 (-5.11%) |
| fixed_size_expr_evaluator | q13 | 1467.43 | 1475.95 | -8.52 (-0.58%) |
| fixed_size_seq_scan | q23 | 113.85 | 121.24 | -7.39 (-6.10%) |
| join | q29 | 603.35 | 590.37 | 12.98 (2.20%) |
| join | q30 | 1475.28 | 1496.22 | -20.94 (-1.40%) |
| join | q31 | 6.37 | 4.68 | 1.69 (36.12%) |
| join | SelectiveTwoHopJoin | 50.10 | 52.83 | -2.73 (-5.17%) |
| ldbc_snb_ic | q35 | 2595.64 | 2641.14 | -45.50 (-1.72%) |
| ldbc_snb_ic | q36 | 543.85 | 544.18 | -0.33 (-0.06%) |
| ldbc_snb_is | q32 | 3.87 | 6.79 | -2.92 (-43.00%) |
| ldbc_snb_is | q33 | 10.76 | 13.10 | -2.34 (-17.88%) |
| ldbc_snb_is | q34 | 1.18 | 1.11 | 0.07 (6.34%) |
| multi-rel | multi-rel-large-scan | 1352.40 | 1256.77 | 95.63 (7.61%) |
| multi-rel | multi-rel-lookup | 17.81 | 34.07 | -16.26 (-47.73%) |
| multi-rel | multi-rel-small-scan | 74.97 | 96.63 | -21.66 (-22.41%) |
| order_by | q25 | 132.78 | 134.27 | -1.50 (-1.11%) |
| order_by | q26 | 449.10 | 453.73 | -4.64 (-1.02%) |
| order_by | q27 | 1496.00 | 1490.43 | 5.57 (0.37%) |
| recursive_join | recursive-join-bidirection | 302.74 | 296.43 | 6.31 (2.13%) |
| recursive_join | recursive-join-dense | 7428.15 | 7448.57 | -20.42 (-0.27%) |
| recursive_join | recursive-join-path | 24076.16 | 24358.15 | -281.99 (-1.16%) |
| recursive_join | recursive-join-sparse | 14559.15 | 14256.72 | 302.43 (2.12%) |
| recursive_join | recursive-join-trail | 7352.34 | 7428.48 | -76.14 (-1.02%) |
| scan_after_filter | q01 | 172.30 | 172.84 | -0.54 (-0.31%) |
| scan_after_filter | q02 | 159.33 | 158.72 | 0.61 (0.38%) |
| shortest_path_ldbc100 | q37 | 94.06 | 80.65 | 13.42 (16.64%) |
| shortest_path_ldbc100 | q38 | 253.96 | 369.19 | -115.23 (-31.21%) |
| shortest_path_ldbc100 | q39 | 62.42 | 64.23 | -1.80 (-2.81%) |
| shortest_path_ldbc100 | q40 | 358.44 | 442.15 | -83.71 (-18.93%) |
| var_size_expr_evaluator | q03 | 2150.37 | 2110.43 | 39.94 (1.89%) |
| var_size_expr_evaluator | q04 | 2300.04 | 2249.98 | 50.06 (2.23%) |
| var_size_expr_evaluator | q05 | 2683.55 | 2635.61 | 47.94 (1.82%) |
| var_size_expr_evaluator | q06 | 1385.60 | 1363.57 | 22.03 (1.62%) |
| var_size_seq_scan | q19 | 1493.88 | 1471.37 | 22.51 (1.53%) |
| var_size_seq_scan | q20 | 2638.42 | 2724.37 | -85.95 (-3.15%) |
| var_size_seq_scan | q21 | 2309.63 | 2317.65 | -8.01 (-0.35%) |
| var_size_seq_scan | q22 | 131.43 | 128.12 | 3.31 (2.58%) |


acquamarin merged commit c93bbd9 into master on Dec 22, 2024
acquamarin deleted the improve-subquery-planning branch on December 22, 2024 at 02:26