fix: 1.9x improvement to median q-error by fixing multi-equality join selectivity #171

Merged (12 commits) on Apr 28, 2024

wangpatrick57

Summary: Previously, we computed multi-equality join selectivity by building an MST of the join graph and multiplying the selectivities of its edges. However, the correct method is to take the N-1 nodes with the highest n-distinct values and multiply the reciprocals of those values.

Demo:
This fix lets us beat Postgres on both median and p90 q-error for the first time. Overall, it improves our median q-error by 1.9x, p90 q-error by 3.4x, p95 q-error by 42.1x, and p99 q-error by 2.6x, and lets us beat Postgres on 9 queries where we previously did not.

Before (after changing DEFAULT_PRECISION and DEFAULT_K_TO_TRACK but before multi-equality fix):
Screenshot 2024-04-27 at 16 00 18

After:
Screenshot 2024-04-27 at 20 02 00
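
For readers unfamiliar with the metric, here is a minimal sketch of q-error, assuming the conventional definition (the factor by which an estimate is off, in either direction). The function name `q_error` and the clamp to one row are assumptions for illustration; the benchmark behind the numbers above may handle zero cardinalities differently.

```rust
/// Q-error of a cardinality estimate: max(est / actual, actual / est).
/// A perfect estimate has q-error 1.0; larger is worse.
///
/// Hypothetical helper, not taken from this PR.
fn q_error(estimated: f64, actual: f64) -> f64 {
    // Assumption: clamp both sides to at least one row so that an empty
    // result or a zero estimate does not cause a division by zero.
    let est = estimated.max(1.0);
    let act = actual.max(1.0);
    (est / act).max(act / est)
}
```

Under this definition, underestimating by 10x and overestimating by 10x are penalized equally.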

Details:

  • To see the problem, consider joining three tables where T1 has 2 distinct values, T2 has 3, and T3 has 4. Assume all values only occur once per table, and all values in the smaller tables appear in the larger tables. The Cartesian product of all three tables has 24 rows and the join result has 2 rows, so the overall selectivity should be 2/24 = 1/12. However, the old method would sometimes give an overall selectivity of 1/16, because the MST could pick the two edges T1-T3 and T2-T3, which each have a selectivity of 1/4.
  • Added rigorous unit tests covering all possible permutations of three-table joins. Before the fix, some of these tests failed; after the fix, all of them pass.
  • Properly handle cases where a newly added predicate either extends an existing connected component or merges two existing connected components. Added unit tests for both of these cases as well.
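
The arithmetic above can be sketched in a few lines of Rust. This is an illustrative sketch, not the code from this PR: the names `pairwise_selectivity`, `multi_equality_selectivity`, and `join_selectivity` are hypothetical, columns are identified by index, and the pairwise estimate is assumed to be the usual 1/max(n-distinct). Connected components of the equality graph are tracked with a small union-find, so a predicate that extends a component or merges two components is handled the same way.

```rust
use std::collections::HashMap;

/// Assumed per-edge estimate for `a = b`: 1 / max(ndistinct(a), ndistinct(b)).
fn pairwise_selectivity(nd_a: u64, nd_b: u64) -> f64 {
    1.0 / nd_a.max(nd_b).max(1) as f64
}

/// Selectivity of requiring all columns in one equality group to match:
/// multiply 1/ndistinct over the N-1 columns with the highest n-distinct.
fn multi_equality_selectivity(ndistincts: &[u64]) -> f64 {
    if ndistincts.len() < 2 {
        return 1.0; // a lone column is not constrained by any equality
    }
    let mut sorted = ndistincts.to_vec();
    // Sort descending so the N-1 largest values come first.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    sorted[..sorted.len() - 1]
        .iter()
        .map(|&n| 1.0 / n.max(1) as f64)
        .product()
}

/// Find the representative of `x`'s component, halving paths along the way.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        let grandparent = parent[parent[x]];
        parent[x] = grandparent;
        x = grandparent;
    }
    x
}

/// Overall selectivity of a set of column-equality predicates.
/// `ndistinct[i]` is the n-distinct of column `i`; each predicate is a pair
/// of column indices, which must be in bounds.
fn join_selectivity(ndistinct: &[u64], predicates: &[(usize, usize)]) -> f64 {
    let mut parent: Vec<usize> = (0..ndistinct.len()).collect();
    for &(a, b) in predicates {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            parent[ra] = rb; // extends a component or merges two of them
        }
    }
    // Collect the n-distinct values of each connected component.
    let mut groups: HashMap<usize, Vec<u64>> = HashMap::new();
    for (col, &nd) in ndistinct.iter().enumerate() {
        let root = find(&mut parent, col);
        groups.entry(root).or_default().push(nd);
    }
    groups
        .values()
        .map(|group| multi_equality_selectivity(group))
        .product()
}
```

With n-distinct values `[2, 3, 4]`, `join_selectivity` gives 1/12 regardless of which two (or all three) predicates are supplied, whereas multiplying `pairwise_selectivity` over the edges T1-T3 and T2-T3 gives 1/16. Because the result depends only on the components and not on the individual edges, redundant predicates do not change the estimate.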

wangpatrick57 marked this pull request as ready for review on April 28, 2024 00:03
@@ -430,27 +393,43 @@ impl<
/// NOTE: This function modifies `past_eq_columns` by adding `predicate` to it.
fn get_join_selectivity_adjustment_from_redundant_predicates(

This function should be renamed to something like get_join_selectivity_from_col_eq_predicate, and the comment should emphasize the principle of inclusion.

Gun9niR left a comment

Nice catch!

wangpatrick57 merged commit 5958b3d into main on Apr 28, 2024 (1 check passed)
wangpatrick57 deleted the phw2/multi-equality-fix branch on April 28, 2024 17:52