[Enhancement] Compare mv rowCount when both mv dimensions contains query dimensions #51511

kaijianding · 2024-09-27T09:39:51Z

Why I'm doing:

-- rows: 100 mv1: 
select sum(v1) from t1 group by a, b, c, f; 

-- rows: 10000 mv2: 
select sum(v1) from t1 group by a, b, d; 

query: select sum(v1) from t1 where b = 'a' group by a;

Before: the MV with fewer dimensions is preferred, so mv2 is chosen, even though mv2 has more rows, which is not expected.
After: when several MVs satisfy the query, the MV with fewer rows is preferred, so mv1 is chosen.

What I'm doing:

MaterializationContext.RewriteOrdering decides which MVs are retained when there are too many MV candidates and the candidate list has to be truncated. It should compare each MV's maxPartitionRowCount rather than its total row count, because an MV may have fewer partitions and a smaller total row count while its maxPartitionRowCount is still large.
BestMvSelector is the place that actually decides which MV is chosen. The comparator in BestMvSelector should consider row count first when the MV's outputRowCount in statistics is not zero.
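
A minimal sketch of the intended ordering, using the mv1/mv2 example from the description. `MvCandidate`, `MvOrderingSketch`, `extraDimensions` and both comparators are hypothetical names made up for illustration; this is not the actual code in MaterializationContext or BestMvSelector, only an approximation of the rule "compare row counts first when statistics are present, otherwise fall back to dimensions".

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class MvOrderingSketch {
    // "Before": the candidate with fewer extra dimensions wins.
    static final Comparator<MvCandidate> BY_DIMENSIONS =
            Comparator.comparingInt((MvCandidate c) -> c.extraDimensions);

    // "After": when both row counts are known (non-zero) and differ, fewer rows wins;
    // otherwise fall back to the dimension comparison.
    static final Comparator<MvCandidate> BY_ROWS_THEN_DIMENSIONS = (a, b) -> {
        if (a.rowCount != 0 && b.rowCount != 0 && a.rowCount != b.rowCount) {
            return Long.compare(a.rowCount, b.rowCount);
        }
        return BY_DIMENSIONS.compare(a, b);
    };

    // Returns the candidate that sorts first under the given ordering.
    static MvCandidate best(List<MvCandidate> candidates, Comparator<MvCandidate> order) {
        return Collections.min(candidates, order);
    }

    public static void main(String[] args) {
        List<MvCandidate> candidates = List.of(
                new MvCandidate("mv1", 100, 2),    // group by a, b, c, f
                new MvCandidate("mv2", 10000, 1)); // group by a, b, d
        System.out.println(best(candidates, BY_DIMENSIONS).name);           // mv2
        System.out.println(best(candidates, BY_ROWS_THEN_DIMENSIONS).name); // mv1
    }
}

// Hypothetical stand-in for an MV candidate; NOT the real MaterializationContext.
final class MvCandidate {
    final String name;
    final long rowCount;       // 0 means no row-count statistics are available
    final int extraDimensions; // MV group-by columns the query does not need

    MvCandidate(String name, long rowCount, int extraDimensions) {
        this.name = name;
        this.rowCount = rowCount;
        this.extraDimensions = extraDimensions;
    }
}
```

The sketch picks the best candidate with `Collections.min` rather than sorting, because a comparator with a "fall back when statistics are missing" branch is not guaranteed to be transitive.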

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
This is a backport pr

Bugfix cherry-pick branch check:

LiShuMing · 2024-10-12T02:10:26Z

fe/fe-core/src/main/java/com/starrocks/sql/optimizer/MaterializationContext.java

+ return orderingRowCount(mv);
+ }
+ return (long) r;
+ })


e.g.:

-- rows: 100 mv1:
select sum(v1) from t1 group by a, b, c;

-- rows: 10000 mv2:
select sum(v1) from t1 group by a, b;

query: select sum(v1) from t1 where b = 'a' group by a;

Before: the MV with fewer dimensions is preferred, so mv2 is chosen, even though mv2 has more rows, which is not expected.
After: when several MVs satisfy the query, the MV with fewer rows is preferred, so mv1 is chosen.

Am I right? Can you add more comments about this?

And can you move this code into orderingAggregation to make it clearer?

Yes, this is what I expect, but the MV definitions are a little different:

-- rows: 100 mv1:
select sum(v1) from t1 group by a, b, c, f;

-- rows: 10000 mv2:
select sum(v1) from t1 group by a, b, d;

query: select sum(v1) from t1 where b = 'a' group by a;

orderingAggregation deals with a single MV, while this code compares two MVs, so I did not change the logic in orderingAggregation.

LiShuMing · 2024-10-17T02:43:02Z

fe/fe-core/src/main/java/com/starrocks/sql/optimizer/MaterializationContext.java

+ return orderingRowCount(mv);
+ }
+ return (long) r;
+ })


Seems solid.
Can you add this example into comments?

satanson · 2024-10-17T02:55:52Z

fe/fe-core/src/main/java/com/starrocks/sql/optimizer/MaterializationContext.java

@@ -370,8 +371,16 @@ public int compare(MaterializationContext o1, MaterializationContext o2) {
 OperatorType o2Type = o2.getMvExpression().getOp().getOpType();

 if (o1Type == o2Type && (o1Type == OperatorType.LOGICAL_AGGR)) {
+ boolean mvHasDifferentRows = orderingRowCount(o1) != 0 && orderingRowCount(o2) != 0


Is this policy too sensitive? For example, suppose MV0 and MV1 have N and N+1 rows respectively; MV1 may still be the more promising one for a given query. So mvHasDifferentRows should introduce an epsilon (ε): only when |MV0-MV1|/max(|MV0|,|MV1|) > ε should we treat the two row counts as significantly different and let the row count decide.

But we don't have enough clues here to tell whether MV1 is actually better than MV0.
If mvHasDifferentRows is set to false when MV0 and MV1 have similar row counts, the later code still compares the dimension diff number and then the row count.

My point is that we can't tell exactly which MV is better for a particular query; we can only assume that comparing row counts is more promising than comparing the dimension diff number.

Row count is promising only when there is a significant gap between the two MVs' row counts. If the row counts are almost the same, we cannot tell which MV is better either. Here is an example:

MV0: [d0,d1,d2, sum(m0)];

MV1: [d1,d0,d3, sum(m0)];
Suppose |MV0| = N, |MV1| = M, N > M, and N-M is small.

For the query: select d1, sum(m0) from t where predicate(d0) group by d1;
if MV0 has a short key on d0, this query fetches data from the IO layer quickly.
So I think the MV with the smaller row count is promising only when the two MVs' row counts differ drastically. If they are close, we should not use the row counts.
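
The relative-gap check suggested in this thread could be sketched roughly as follows. `RowCountGapSketch`, `differsSignificantly` and the `EPSILON` value of 0.1 are assumptions for illustration only; the thread does not propose a concrete threshold, and this is not the code that was merged.

```java
final class RowCountGapSketch {
    // Hypothetical threshold; the review thread does not name a concrete value.
    static final double EPSILON = 0.1;

    // True only when both row counts are known (non-zero) and
    // |rows0 - rows1| / max(rows0, rows1) exceeds epsilon.
    static boolean differsSignificantly(long rows0, long rows1, double epsilon) {
        if (rows0 <= 0 || rows1 <= 0) {
            return false; // no usable statistics for at least one MV
        }
        long max = Math.max(rows0, rows1);
        double relativeGap = Math.abs(rows0 - rows1) / (double) max;
        return relativeGap > epsilon;
    }

    public static void main(String[] args) {
        System.out.println(differsSignificantly(100, 10000, EPSILON)); // true
        System.out.println(differsSignificantly(1000, 1001, EPSILON)); // false
        System.out.println(differsSignificantly(0, 500, EPSILON));     // false
    }
}
```

With such a check, row counts of N and N+1 would not count as different, so the comparison would fall through to whatever tie-breaking rule comes next.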

I understand your scenario and the limitations of comparing MV row counts exactly.

What I do in this PR is prefer comparing MV row counts over comparing the dimension diff number. I get that this is not a perfect solution, but it is much better than the current logic.

Currently we don't have enough information (e.g. the key columns used in the query's predicate, how many rows that predicate filters out, etc.) to be sure that MV1 is a better choice than MV0 when MV1 has a larger row count, so there is no basis for rejecting MV0 when MV0 has the smaller row count.

My point is that, with the information available from the context at this stage, we can't tell which MV is better when the two MVs have similar row counts; comparing row counts is still better than comparing the dimension diff number.

I have changed the logic in BestMvSelector, which is the class that actually chooses the MV, according to your comment.

sonarcloud · 2024-10-18T11:22:32Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

github-actions · 2024-10-18T13:01:49Z

[Java-Extensions Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2024-10-18T13:02:31Z

[FE Incremental Coverage Report]

✅ pass : 21 / 22 (95.45%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/BestMvSelector.java	8	9	88.89%	[155]
🔵	com/starrocks/catalog/MaterializedView.java	5	5	100.00%	[]
🔵	com/starrocks/sql/optimizer/MaterializationContext.java	8	8	100.00%	[]

github-actions · 2024-10-18T13:02:36Z

[BE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

kaijianding requested a review from a team as a code owner September 27, 2024 09:39

github-actions bot added the 3.3 label Sep 27, 2024

mergify bot assigned kaijianding Sep 27, 2024

starrocks-cr bot reviewed Sep 27, 2024

LiShuMing reviewed Oct 12, 2024

kaijianding force-pushed the mv-ordering branch from a46c4da to 1e56272 Compare October 16, 2024 09:48

kaijianding requested a review from a team as a code owner October 16, 2024 12:15

kaijianding force-pushed the mv-ordering branch from 638d680 to 5402e74 Compare October 16, 2024 13:49

[Enhancement] Compare mv rowCount when both mv dimensions contains qu…

018fc05

…ery dimensions Signed-off-by: kaijian.ding <[email protected]>

kaijianding force-pushed the mv-ordering branch from 5402e74 to 018fc05 Compare October 17, 2024 02:33

LiShuMing reviewed Oct 17, 2024

satanson reviewed Oct 17, 2024

add some comment

8519842

Signed-off-by: kaijian.ding <[email protected]>

LiShuMing previously approved these changes Oct 17, 2024

improve compare in BestMvSelector

8a9c656

Signed-off-by: kaijian.ding <[email protected]>

kaijianding dismissed LiShuMing’s stale review via 8a9c656 October 18, 2024 11:15
