cost-model: group-by column ref property + cost model design? #202

skyzh · 2024-10-29T22:13:07Z

Currently, the logical property of an aggregation's group-by looks like this:

select v1 from t1 group by v1;

Agg group=v1 <- schema=[v1], column_ref=[v1]
  Scan t1

But actually, group by can change the distribution of the column, so probably we should set it to derived, or find a way to represent it. If a later join refers to this column, the cost model should treat it differently from the original base-table column.
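To make the concern concrete, here is a small sketch that compares per-value frequencies of a column before and after a group-by. The helper names (`value_frequencies`, `group_by`) and the sample data are made up for illustration and are not part of the codebase.

```rust
use std::collections::BTreeMap;

/// Count how often each value occurs in a column.
fn value_frequencies(column: &[i64]) -> BTreeMap<i64, usize> {
    let mut freq = BTreeMap::new();
    for &v in column {
        *freq.entry(v).or_insert(0) += 1;
    }
    freq
}

/// Emulate `SELECT v1 FROM t1 GROUP BY v1`:
/// the output has exactly one row per distinct input value.
fn group_by(column: &[i64]) -> Vec<i64> {
    value_frequencies(column).into_keys().collect()
}
```

With a hypothetical `t1.v1 = [1, 1, 1, 2]`, the value `1` occurs three times in the scan output but only once after the aggregation. Statistics collected on `t1.v1` therefore no longer describe the column that a later join would see.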


jurplel · 2024-10-30T00:58:02Z

so probably we should set it to derived

What do you mean? It looks like it's just storing the group-by column. I'm not sure I follow where the distribution of a column is stored here.

skyzh · 2024-10-30T03:47:00Z

I think it's probably better to store it as Distinct(v1) in the column ref logical property, so that the cost model can take that information into account.
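One possible shape for this, as a rough sketch only: the `ColumnRef` enum, its variants, and both functions below are hypothetical and do not reflect the project's actual types. The sketch also assumes a single group-by column, since with several group-by columns only the combination is distinct, not each column on its own.

```rust
/// Hypothetical column-ref property; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum ColumnRef {
    /// Column read straight from a base table, so table statistics apply.
    Base { table: String, col_idx: usize },
    /// Output of a group-by on the wrapped column: each value occurs once.
    Distinct(Box<ColumnRef>),
    /// Computed expression with no known statistics.
    Derived,
}

/// Derive the column refs of an aggregation from its group-by columns.
/// Only a single group-by column is wrapped in `Distinct`; with several
/// columns the refs are passed through unchanged (a conservative choice).
fn derive_agg_column_refs(group_by: &[ColumnRef]) -> Vec<ColumnRef> {
    if group_by.len() != 1 {
        return group_by.to_vec();
    }
    group_by
        .iter()
        .map(|c| match c {
            ColumnRef::Distinct(_) => c.clone(),
            other => ColumnRef::Distinct(Box::new(other.clone())),
        })
        .collect()
}

/// Estimate how many rows carry this column, given the base table's
/// row count and number of distinct values (assumed inputs).
fn estimated_rows(col: &ColumnRef, base_rows: f64, base_ndv: f64) -> Option<f64> {
    match col {
        ColumnRef::Base { .. } => Some(base_rows),
        ColumnRef::Distinct(inner) => match inner.as_ref() {
            ColumnRef::Base { .. } => Some(base_ndv),
            _ => None,
        },
        ColumnRef::Derived => None,
    }
}
```

The wrapper keeps the link to the base column, so a cost model could still look up its statistics while knowing that duplicates have been removed. A plain "derived" marker would lose that link.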

skyzh mentioned this issue Oct 29, 2024

Extend schema to contain table name, and migrate cost model's ColumnRef property usage to schema #126

Closed
