fix: 1.9x improvement to median q-error by fixing multi-equality join selectivity #171
…which_maintains_mst
…oin_selectivity_from_most_selective_columns
@@ -430,27 +393,43 @@ impl<
/// NOTE: This function modifies `past_eq_columns` by adding `predicate` to it.
fn get_join_selectivity_adjustment_from_redundant_predicates(
This function should be renamed to something like get_join_selectivity_from_col_eq_predicate, and the comment should emphasize the principle of inclusion.
Nice catch!
…nt_when_adding_to_multi_equality_graph()
Summary: Previously, we computed multi-equality join selectivity by building an MST of the join graph. However, the correct method is to take the N-1 nodes with the highest n-distinct values.
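A minimal sketch of the N-1 rule, for illustration only. It assumes each equality contributes a factor of 1/n-distinct, which is what the principle of inclusion gives for a column-equality predicate. The function name `multi_equality_selectivity` and its signature are hypothetical and are not the code in this PR.

```rust
/// Hypothetical helper (not the PR's actual implementation).
/// Given the n-distinct value of every column in one multi-equality group
/// (columns that are all constrained to be equal to each other), return the
/// combined join selectivity: the product of 1/n-distinct over the N-1
/// columns with the highest n-distinct values.
fn multi_equality_selectivity(ndistincts: &[u64]) -> f64 {
    // Zero or one column means there is no equality predicate to apply.
    if ndistincts.len() < 2 {
        return 1.0;
    }
    // Sort descending so the N-1 largest n-distinct values come first.
    let mut sorted = ndistincts.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    sorted
        .iter()
        .take(sorted.len() - 1) // drop only the smallest n-distinct
        .map(|&nd| 1.0 / nd.max(1) as f64) // clamp to 1 to avoid dividing by zero
        .product()
}
```

For example, with three mutually equal columns whose n-distinct values are 10, 100, and 1000, this sketch returns 1/1000 * 1/100 = 1e-5; the column with n-distinct 10 is the one left out.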
Demo:
This fix causes us to beat Postgres on median q-error for the first time ever. We also now beat Postgres on p90 q-error for the first time ever. Overall, it improves our median q-error by 1.9x, p90 q-error by 3.4x, p95 q-error by 42.1x, p99 q-error by 2.6x, and lets us beat Postgres on 9 queries we previously didn't beat them on.
Before (after changing DEFAULT_PRECISION and DEFAULT_K_TO_TRACK but before multi-equality fix):
After:
Details: