Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix: 1.9x improvement to median q-error by fixing multi-equality join…
… selectivity (#171) **Summary**: Previously, we computed multi-equality join selectivity by building an MST of the join graph. However, the correct method is to take the N-1 _nodes_ with the highest n-distinct values. **Demo**: This fix causes us to **beat Postgres on median q-error** for the first time ever. We also now beat Postgres on p90 q-error for the first time ever. Overall, it improves our median q-error by 1.9x, p90 q-error by 3.4x, p95 q-error by 42.1x, p99 q-error by 2.6x, and lets us beat Postgres on 9 queries we previously didn't beat them on. Before (after changing `DEFAULT_PRECISION` and `DEFAULT_K_TO_TRACK` but before multi-equality fix): ![Screenshot 2024-04-27 at 16 00 18](https://github.com/cmu-db/optd/assets/20631215/7f53fcbc-a755-42eb-8f23-3216790d5a50) After: ![Screenshot 2024-04-27 at 20 02 00](https://github.com/cmu-db/optd/assets/20631215/5bae0945-9375-4b2d-990b-949a1c3f3317) **Details**: * To see the problem, consider joining three tables where T1 has 2 distinct values, T2 has 3, and T3 has 4. Assume all values only occur once per table, and all values in the smaller tables appear in the larger tables. The cartesian product of all three tables is 24 and the join result is 2, so the overall selectivity should be 1/12. However, the old method would sometimes give an overall selectivity of 1/16 because there are two edges in the join graph (T1-T3 and T2-T3) which both have a selectivity of 1/4. * Added rigorous unit tests to test all possible permutations of three-table joins. Before the fix, some of these tests were failing. After the fix, these tests passed. * Properly handling cases where we are adding a predicate that either _extends_ an existing connected component or _merges_ two existing connected components. Added unit tests for both of these cases as well.
- Loading branch information