
remove graph entry from fts input #4660

Merged: 6 commits were merged into master on Dec 20, 2024.


andyfengHKU
Contributor

Description

Please include a summary of the changes and the related issue (if applicable). Please also include
relevant motivation and context.

Fixes # (issue)

Contributor agreement

```cpp
auto graphEntry = graph::GraphEntry({termsEntry, docsEntry}, {appearsInEntry});
// Bind output node.
auto nodeOutput = bindNodeOutput(input.binder, {tableEntry});
auto qftsBindData = std::make_unique<QFTSGDSBindData>(std::move(graphEntry), nodeOutput);
```
Collaborator


Can we avoid saving the graphEntry in the bindData? I think we can generate the graphEntry during edgeCompute instead.
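The reviewer's suggestion, keeping only the catalog entries in the bind data and constructing the graph entry when the edge computation runs, can be sketched in C++ as follows. Only `termsEntry`, `docsEntry` and `appearsInEntry` come from the snippet above; `TableEntry`, `GraphEntry`, `QFTSBindDataSketch`, `makeGraphEntry` and `runEdgeCompute` are hypothetical simplifications for illustration, not Kuzu's actual classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for Kuzu's catalog and graph types: just enough
// structure to illustrate the idea, not the real class definitions.
struct TableEntry {
    std::string name;
};

struct GraphEntry {
    std::vector<const TableEntry*> nodeEntries;
    std::vector<const TableEntry*> relEntries;
};

// Bind data that keeps only the catalog entries, not a prebuilt GraphEntry.
struct QFTSBindDataSketch {
    const TableEntry* termsEntry;
    const TableEntry* docsEntry;
    const TableEntry* appearsInEntry;

    // Builds the graph entry on demand from the stored catalog entries.
    GraphEntry makeGraphEntry() const {
        assert(termsEntry && docsEntry && appearsInEntry);
        return GraphEntry{{termsEntry, docsEntry}, {appearsInEntry}};
    }
};

// Hypothetical edge-compute entry point: the graph entry is created here and
// lives only for the duration of the call.
std::size_t runEdgeCompute(const QFTSBindDataSketch& bindData) {
    GraphEntry graph = bindData.makeGraphEntry();
    // A real implementation would traverse the graph here; this sketch only
    // reports how many tables the graph spans.
    return graph.nodeEntries.size() + graph.relEntries.size();
}
```

In this shape the bind data holds three pointers, and the graph entry is rebuilt whenever it is needed; in the sketch that costs two small vector allocations per call.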


Benchmark Result

Master commit hash: ea98fb1d3ebd69858317b682e02046974116f56c
Branch commit hash: 6d7efa3327d38ed2406724b796101ceff118ca99

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff (ms, %) |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 652.66 | 655.83 | -3.18 (-0.48%) |
| aggregation | q28 | 11394.80 | 11940.05 | -545.25 (-4.57%) |
| filter | q14 | 136.02 | 128.75 | 7.27 (5.65%) |
| filter | q15 | 137.26 | 129.98 | 7.28 (5.60%) |
| filter | q16 | 307.35 | 304.38 | 2.97 (0.98%) |
| filter | q17 | 456.14 | 454.03 | 2.12 (0.47%) |
| filter | q18 | 1918.71 | 1962.50 | -43.79 (-2.23%) |
| filter | zonemap-node | 96.50 | 90.91 | 5.59 (6.14%) |
| filter | zonemap-node-lhs-cast | 96.66 | 89.18 | 7.48 (8.38%) |
| filter | zonemap-node-null | 93.61 | 85.24 | 8.37 (9.82%) |
| filter | zonemap-rel | 5852.18 | 5804.58 | 47.59 (0.82%) |
| fixed_size_expr_evaluator | q07 | 580.28 | 590.29 | -10.01 (-1.70%) |
| fixed_size_expr_evaluator | q08 | 812.30 | 822.09 | -9.79 (-1.19%) |
| fixed_size_expr_evaluator | q09 | 811.45 | 807.54 | 3.91 (0.48%) |
| fixed_size_expr_evaluator | q10 | 245.14 | 245.48 | -0.34 (-0.14%) |
| fixed_size_expr_evaluator | q11 | 237.20 | 236.87 | 0.33 (0.14%) |
| fixed_size_expr_evaluator | q12 | 234.20 | 238.14 | -3.94 (-1.65%) |
| fixed_size_expr_evaluator | q13 | 1460.27 | 1468.58 | -8.32 (-0.57%) |
| fixed_size_seq_scan | q23 | 127.22 | 120.64 | 6.58 (5.45%) |
| join | q29 | 603.39 | 638.23 | -34.85 (-5.46%) |
| join | q30 | 1572.49 | 1602.93 | -30.44 (-1.90%) |
| join | q31 | 4.36 | 3.53 | 0.83 (23.52%) |
| join | SelectiveTwoHopJoin | 52.15 | 52.94 | -0.79 (-1.50%) |
| ldbc_snb_ic | q35 | 2602.02 | 2637.72 | -35.70 (-1.35%) |
| ldbc_snb_ic | q36 | 553.58 | 531.72 | 21.86 (4.11%) |
| ldbc_snb_is | q32 | 6.93 | 6.04 | 0.89 (14.78%) |
| ldbc_snb_is | q33 | 13.02 | 15.45 | -2.43 (-15.72%) |
| ldbc_snb_is | q34 | 1.03 | 1.04 | -0.01 (-1.25%) |
| multi-rel | multi-rel-large-scan | 1261.29 | 1200.95 | 60.34 (5.02%) |
| multi-rel | multi-rel-lookup | 20.72 | 17.02 | 3.71 (21.80%) |
| multi-rel | multi-rel-small-scan | 87.61 | 103.27 | -15.66 (-15.17%) |
| order_by | q25 | 142.06 | 136.17 | 5.89 (4.32%) |
| order_by | q26 | 473.15 | 456.77 | 16.38 (3.59%) |
| order_by | q27 | 1500.32 | 1489.28 | 11.04 (0.74%) |
| recursive_join | recursive-join-bidirection | 260.89 | 297.74 | -36.85 (-12.38%) |
| recursive_join | recursive-join-dense | 7436.56 | 7439.58 | -3.02 (-0.04%) |
| recursive_join | recursive-join-path | 24036.12 | 24135.27 | -99.15 (-0.41%) |
| recursive_join | recursive-join-sparse | 14295.89 | 14878.48 | -582.59 (-3.92%) |
| recursive_join | recursive-join-trail | 7362.49 | 7387.09 | -24.60 (-0.33%) |
| scan_after_filter | q01 | 181.23 | 171.14 | 10.09 (5.90%) |
| scan_after_filter | q02 | 164.77 | 158.43 | 6.35 (4.01%) |
| shortest_path_ldbc100 | q37 | 90.47 | 98.94 | -8.47 (-8.56%) |
| shortest_path_ldbc100 | q38 | 348.28 | 348.53 | -0.26 (-0.07%) |
| shortest_path_ldbc100 | q39 | 57.63 | 65.64 | -8.01 (-12.20%) |
| shortest_path_ldbc100 | q40 | 342.41 | 446.69 | -104.29 (-23.35%) |
| var_size_expr_evaluator | q03 | 2108.69 | 2095.21 | 13.49 (0.64%) |
| var_size_expr_evaluator | q04 | 2252.93 | 2289.30 | -36.37 (-1.59%) |
| var_size_expr_evaluator | q05 | 2658.53 | 2653.28 | 5.24 (0.20%) |
| var_size_expr_evaluator | q06 | 1366.06 | 1361.55 | 4.51 (0.33%) |
| var_size_seq_scan | q19 | 1499.70 | 1480.01 | 19.69 (1.33%) |
| var_size_seq_scan | q20 | 2803.25 | 2780.40 | 22.85 (0.82%) |
| var_size_seq_scan | q21 | 2332.00 | 2316.18 | 15.81 (0.68%) |
| var_size_seq_scan | q22 | 131.41 | 129.22 | 2.19 (1.69%) |
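The Diff column in these benchmark results appears to be the commit mean minus the master mean, with the percentage taken relative to the master mean. For q24 above, 652.66 - 655.83 = -3.17 ms and -3.17 / 655.83 ≈ -0.48%; the reported -3.18 ms presumably comes from unrounded means. A minimal C++ sketch, assuming that interpretation (`diffMs` and `diffPercent` are illustrative names, not part of the benchmark tooling):

```cpp
#include <cassert>

// Absolute difference in milliseconds; positive means the branch is slower.
double diffMs(double commitMs, double masterMs) {
    return commitMs - masterMs;
}

// Difference expressed as a percentage of the master mean.
double diffPercent(double commitMs, double masterMs) {
    assert(masterMs > 0.0);
    return (commitMs - masterMs) / masterMs * 100.0;
}
```

The relative form also explains why very short queries look volatile: q31 moves by less than 1 ms, yet that is more than 23% of its 3.53 ms master time.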


codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 98.11321% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.52%. Comparing base (86f1f43) to head (fba3a11).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/function/gds/gds.cpp | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #4660      +/-   ##
==========================================
+ Coverage   86.51%   86.52%   +0.01%     
==========================================
  Files        1369     1372       +3     
  Lines       57951    57997      +46     
  Branches     7206     7207       +1     
==========================================
+ Hits        50135    50182      +47     
+ Misses       7648     7647       -1     
  Partials      168      168              
```
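The project-level numbers in the coverage diff are consistent with coverage = hits / (hits + misses + partials): 50182 / 57997 ≈ 86.525% on the head and 50135 / 57951 ≈ 86.513% on master. The head figure is displayed as 86.52%, which suggests the report rounds down rather than to the nearest value. Similarly, 98.11321% patch coverage with exactly one missing line matches 52 of 53 patch lines covered, assuming no partially covered lines in the patch. A small C++ check of this arithmetic (the helper names are hypothetical, not a Codecov API):

```cpp
#include <cassert>
#include <cmath>

// Percentage of tracked lines that are covered. Partially covered lines are
// counted as tracked but not as hits (an assumption about the report).
double coveragePercent(long hits, long misses, long partials) {
    long tracked = hits + misses + partials;
    assert(tracked > 0);
    return 100.0 * static_cast<double>(hits) / static_cast<double>(tracked);
}

// Truncates (rounds down) to a fixed number of decimals for display.
double roundDown(double value, int decimals) {
    double scale = std::pow(10.0, decimals);
    return std::floor(value * scale) / scale;
}
```

As a sanity check, hits + misses + partials reproduces the Lines row on both sides: 50182 + 7647 + 168 = 57997 and 50135 + 7648 + 168 = 57951.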



Benchmark Result

Master commit hash: ea98fb1d3ebd69858317b682e02046974116f56c
Branch commit hash: 228f0441eaccf0fc21aa3f622b162ac7e03936be

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff (ms, %) |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 644.54 | 655.83 | -11.30 (-1.72%) |
| aggregation | q28 | 11441.46 | 11940.05 | -498.59 (-4.18%) |
| filter | q14 | 126.13 | 128.75 | -2.63 (-2.04%) |
| filter | q15 | 127.59 | 129.98 | -2.40 (-1.84%) |
| filter | q16 | 304.29 | 304.38 | -0.09 (-0.03%) |
| filter | q17 | 448.59 | 454.03 | -5.44 (-1.20%) |
| filter | q18 | 1890.40 | 1962.50 | -72.11 (-3.67%) |
| filter | zonemap-node | 88.18 | 90.91 | -2.73 (-3.00%) |
| filter | zonemap-node-lhs-cast | 88.61 | 89.18 | -0.57 (-0.64%) |
| filter | zonemap-node-null | 84.70 | 85.24 | -0.54 (-0.63%) |
| filter | zonemap-rel | 5802.19 | 5804.58 | -2.40 (-0.04%) |
| fixed_size_expr_evaluator | q07 | 576.16 | 590.29 | -14.13 (-2.39%) |
| fixed_size_expr_evaluator | q08 | 804.26 | 822.09 | -17.83 (-2.17%) |
| fixed_size_expr_evaluator | q09 | 806.11 | 807.54 | -1.42 (-0.18%) |
| fixed_size_expr_evaluator | q10 | 243.39 | 245.48 | -2.09 (-0.85%) |
| fixed_size_expr_evaluator | q11 | 236.03 | 236.87 | -0.84 (-0.36%) |
| fixed_size_expr_evaluator | q12 | 229.88 | 238.14 | -8.26 (-3.47%) |
| fixed_size_expr_evaluator | q13 | 1467.70 | 1468.58 | -0.88 (-0.06%) |
| fixed_size_seq_scan | q23 | 119.96 | 120.64 | -0.68 (-0.56%) |
| join | q29 | 626.04 | 638.23 | -12.20 (-1.91%) |
| join | q30 | 1439.53 | 1602.93 | -163.40 (-10.19%) |
| join | q31 | 5.03 | 3.53 | 1.49 (42.29%) |
| join | SelectiveTwoHopJoin | 53.58 | 52.94 | 0.64 (1.20%) |
| ldbc_snb_ic | q35 | 2633.03 | 2637.72 | -4.69 (-0.18%) |
| ldbc_snb_ic | q36 | 565.65 | 531.72 | 33.92 (6.38%) |
| ldbc_snb_is | q32 | 6.61 | 6.04 | 0.58 (9.53%) |
| ldbc_snb_is | q33 | 13.12 | 15.45 | -2.33 (-15.09%) |
| ldbc_snb_is | q34 | 1.09 | 1.04 | 0.05 (4.39%) |
| multi-rel | multi-rel-large-scan | 1224.96 | 1200.95 | 24.00 (2.00%) |
| multi-rel | multi-rel-lookup | 9.99 | 17.02 | -7.02 (-41.27%) |
| multi-rel | multi-rel-small-scan | 81.91 | 103.27 | -21.36 (-20.69%) |
| order_by | q25 | 134.27 | 136.17 | -1.90 (-1.39%) |
| order_by | q26 | 455.09 | 456.77 | -1.68 (-0.37%) |
| order_by | q27 | 1464.23 | 1489.28 | -25.05 (-1.68%) |
| recursive_join | recursive-join-bidirection | 308.36 | 297.74 | 10.62 (3.57%) |
| recursive_join | recursive-join-dense | 7393.14 | 7439.58 | -46.44 (-0.62%) |
| recursive_join | recursive-join-path | 24024.83 | 24135.27 | -110.44 (-0.46%) |
| recursive_join | recursive-join-sparse | 14337.20 | 14878.48 | -541.28 (-3.64%) |
| recursive_join | recursive-join-trail | 7345.12 | 7387.09 | -41.97 (-0.57%) |
| scan_after_filter | q01 | 171.30 | 171.14 | 0.16 (0.09%) |
| scan_after_filter | q02 | 157.50 | 158.43 | -0.93 (-0.59%) |
| shortest_path_ldbc100 | q37 | 83.03 | 98.94 | -15.91 (-16.08%) |
| shortest_path_ldbc100 | q38 | 351.47 | 348.53 | 2.93 (0.84%) |
| shortest_path_ldbc100 | q39 | 60.23 | 65.64 | -5.40 (-8.23%) |
| shortest_path_ldbc100 | q40 | 428.63 | 446.69 | -18.06 (-4.04%) |
| var_size_expr_evaluator | q03 | 2076.21 | 2095.21 | -18.99 (-0.91%) |
| var_size_expr_evaluator | q04 | 2205.20 | 2289.30 | -84.11 (-3.67%) |
| var_size_expr_evaluator | q05 | 2655.91 | 2653.28 | 2.63 (0.10%) |
| var_size_expr_evaluator | q06 | 1333.62 | 1361.55 | -27.93 (-2.05%) |
| var_size_seq_scan | q19 | 1452.84 | 1480.01 | -27.17 (-1.84%) |
| var_size_seq_scan | q20 | 2790.01 | 2780.40 | 9.60 (0.35%) |
| var_size_seq_scan | q21 | 2336.40 | 2316.18 | 20.22 (0.87%) |
| var_size_seq_scan | q22 | 126.77 | 129.22 | -2.46 (-1.90%) |

acquamarin (Collaborator) commented on Dec 20, 2024

Query FTS index benchmark:
ac4, 512 GB RAM, dataset: ms-passage (884M docs)

| Threads | DuckDB | Kuzu |
| --- | --- | --- |
| 1 | 26.22 s | 7.412 s |
| 2 | 13.55 s | 4.564 s |
| 4 | 6.81 s | 3.105 s |
| 8 | 3.5 s | 2.178 s |
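Parallel scaling in this query benchmark can be summarized as speedup (1-thread time divided by N-thread time) and efficiency (speedup divided by N). From the table, Kuzu goes from 7.412 s to 2.178 s at 8 threads (about 3.4x, roughly 43% efficiency), while DuckDB goes from 26.22 s to 3.5 s (about 7.5x, roughly 94% efficiency), although Kuzu remains faster in absolute terms at every thread count measured. A short C++ sketch of these two metrics (`speedup` and `efficiency` are illustrative helper names):

```cpp
#include <cassert>

// Speedup of an N-thread run relative to the single-thread run.
double speedup(double singleThreadSec, double multiThreadSec) {
    assert(multiThreadSec > 0.0);
    return singleThreadSec / multiThreadSec;
}

// Parallel efficiency: 1.0 means perfectly linear scaling with thread count.
double efficiency(double singleThreadSec, double multiThreadSec, int threads) {
    assert(threads > 0);
    return speedup(singleThreadSec, multiThreadSec) / threads;
}
```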

Create FTS index benchmark:

| Threads | DuckDB | Kuzu |
| --- | --- | --- |
| 128 | 1.1 minutes | 21.4 minutes |

acquamarin merged commit 3463783 into master on Dec 20, 2024.
acquamarin deleted the remove-graph-entry-from-fts-input branch on December 20, 2024 at 22:17.