aggregation: allow specifying none as default in aggregations
nonibansal committed Dec 11, 2024
1 parent af3c39e commit 9150c77
Showing 23 changed files with 399 additions and 125 deletions.
2 changes: 2 additions & 0 deletions docs/examples/api-reference/aggregations/lastk.py
@@ -41,6 +41,7 @@ def lastk_pipeline(cls, ds: Dataset):
of="amount",
limit=10,
dedup=False,
+dropnull=False,
window=Continuous("1d"),
),
# docsnip-highlight end
@@ -142,6 +143,7 @@ def bad_pipeline(cls, ds: Dataset):
of="amount",
limit=10,
dedup=False,
+dropnull=False,
window=Continuous("1d"),
),
# docsnip-highlight end
18 changes: 10 additions & 8 deletions docs/pages/api-reference/aggregations/average.md
@@ -22,28 +22,30 @@ The name of the field in the output dataset that should store the result of this
aggregation. This field is expected to be of type `float`.
</Expandable>

-<Expandable title="default" type="float">
+<Expandable title="default" type="Optional[float]">
Average over an empty set of rows isn't well defined - Fennel returns `default`
-in such cases.
+in such cases. If the default is not set or is None, Fennel returns None and
+in that case, the expected type of `into_field` must be `Optional[float]`.
</Expandable>
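
The new `default` behaviour can be pictured with a small pure-Python model. This is only a sketch: `average` is a hypothetical helper written for illustration, not Fennel's API, and it assumes that an unset default behaves exactly like `None`.

```python
from typing import Optional, Sequence


def average(values: Sequence[float], default: Optional[float] = None) -> Optional[float]:
    """Mean of `values`, or `default` when there are no rows (hypothetical helper)."""
    if not values:
        # An empty window has no well-defined mean, so fall back to the default.
        # With no default this is None, which is why the output type must then
        # be Optional[float] rather than float.
        return default
    return sum(values) / len(values)


print(average([10.0, 20.0]))     # 15.0
print(average([], default=0.0))  # 0.0
print(average([]))               # None
```

Callers that leave the default unset have to handle `None` downstream, which mirrors the requirement that `into_field` be `Optional[float]` in that case.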

<pre snippet="api-reference/aggregations/avg#basic" status="success"
message="Average in rolling window of 1 day & 1 week">
</pre>

#### Returns
-<Expandable type="float">
+<Expandable type="Union[float, Optional[float]]">
Stores the result of the aggregation in the appropriate field of the output
dataset. If there are no rows in the aggregation window, `default` is used.
</Expandable>


#### Errors
-<Expandable title="Average on non int/float types">
-The input column denoted by `of` must either be of `int` or `float` types.
+<Expandable title="Average on non int/float/decimal types">
+The input column denoted by `of` must either be of `int` or `float` or
+`decimal` types.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>
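
For reference, the SQL behaviour that the note above compares against can be checked with Python's built-in `sqlite3` module. This sketch demonstrates SQL semantics only: aggregates skip NULL inputs and, for AVG, MIN and MAX, return NULL over an empty set. That Fennel matches SQL in every detail is an assumption, and the table and column names are made up for illustration.

```python
import sqlite3

# In-memory database with a nullable integer column (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (amount INTEGER)")
conn.executemany("INSERT INTO txns (amount) VALUES (?)", [(1,), (None,), (3,)])

# NULL rows are ignored by the aggregates rather than causing an error.
non_empty = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM txns"
).fetchone()

# Over an empty set of rows, AVG/MIN/MAX yield NULL (None in Python).
empty = conn.execute(
    "SELECT AVG(amount), MIN(amount), MAX(amount) FROM txns WHERE amount > 100"
).fetchone()
conn.close()

print(non_empty)  # (4, 2.0, 1, 3)
print(empty)      # (None, None, None)
```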

<Expandable title="Output and/or default aren't float">
@@ -52,7 +54,7 @@ The type of the field denoted by `into_field` in the output dataset and that of
</Expandable>

<pre snippet="api-reference/aggregations/avg#incorrect_type" status="error"
-message="Can not take average over string, only int or float">
+message="Can not take average over string, only int or float or decimal">
</pre>
<pre snippet="api-reference/aggregations/avg#non_matching_types" status="error"
message="Invalid type: ret is int but should be float">
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ Aggregation to compute a rolling exponential decay for each group within a windo
#### Parameters
<Expandable title="of" type="str">
Name of the field in the input dataset over which the decayed sum should be computed.
-This field can only either be `int` or `float.
+This field can only either be `int` or `float` or `decimal`.
</Expandable>

<Expandable title="window" type="Window">
@@ -46,8 +46,8 @@ are no rows to count, by default, it returns 0.0
The input column denoted by `of` must either be of `int` or `float` types.
The output field denoted by `into_field` must always be of type `float`.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>

<pre snippet="api-reference/aggregations/exp-decay-sum#incorrect_type_exp_decay" status="error"
5 changes: 5 additions & 0 deletions docs/pages/api-reference/aggregations/lastk.md
@@ -35,6 +35,11 @@ If set to True, only distinct values are stored else values stored in the last
can have duplicates too.
</Expandable>

+<Expandable title="dropnull" type="bool">
+If set to True, None values are dropped from the result. It expects `of` field
+to be of type `Optional[T]` and `into_field` gets the type `List[T]`.
+</Expandable>
+
<pre snippet="api-reference/aggregations/lastk#basic" status="success"
message="LastK in window of 1 day">
</pre>
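
The `dropnull` option described above can be pictured with a small pure-Python model. `last_k` below is a hypothetical helper, not Fennel's implementation; the ordering of the result, and whether `None` values are dropped before or after `limit` is applied, are assumptions made for illustration.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def last_k(values: List[Optional[T]], limit: int, dropnull: bool = False) -> List[Optional[T]]:
    """Return the most recent `limit` values, oldest first (hypothetical helper).

    Assumption: when `dropnull` is True, None values are removed before
    the limit is applied.
    """
    if limit <= 0:
        return []
    kept = [v for v in values if v is not None] if dropnull else list(values)
    return kept[-limit:]


amounts = [10, None, 30, None, 50]  # ordered oldest to newest
print(last_k(amounts, limit=3))                 # [30, None, 50]
print(last_k(amounts, limit=3, dropnull=True))  # [10, 30, 50]
```

With `dropnull=True` the result can no longer contain `None`, which is consistent with `into_field` being typed `List[T]` rather than `List[Optional[T]]`.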
13 changes: 7 additions & 6 deletions docs/pages/api-reference/aggregations/max.md
@@ -24,10 +24,11 @@ aggregation. This field is expected to be of type `int`, `float`, `date` or
`of`.
</Expandable>

-<Expandable title="default" type="Union[int, float]">
+<Expandable title="default" type="Optional[Union[int, float, Decimal, datetime, date]]">
Max over an empty set of rows isn't well defined - Fennel returns `default`
in such cases. The type of `default` must be same as that of `of` in the input
-dataset.
+dataset. If the default is not set or is None, Fennel returns None and in that case,
+the expected type of `into_field` must be `Optional[T]`.
</Expandable>
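
As a rough model of this rule, Python's built-in `max` already accepts a `default` keyword that is returned for an empty iterable. The wrapper below is a hypothetical helper for illustration, not Fennel's API; it assumes an unset default is equivalent to `None`. The same reasoning applies to `min`.

```python
from datetime import date
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def max_or_default(values: Sequence[T], default: Optional[T] = None) -> Optional[T]:
    """Largest value, or `default` when `values` is empty (hypothetical helper)."""
    # The built-in max() returns `default` instead of raising on an empty iterable.
    return max(values, default=default)


print(max_or_default([3, 7, 5], default=0))  # 7
print(max_or_default([], default=0))         # 0
print(max_or_default([]))                    # None
print(max_or_default([date(2024, 12, 1), date(2024, 12, 10)]))  # 2024-12-10
```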

<pre snippet="api-reference/aggregations/max#basic" status="success"
@@ -43,11 +44,11 @@ dataset. If there are no rows in the aggregation window, `default` is used.

#### Errors
<Expandable title="Max on other types">
-The input column denoted by `of` must be of `int`, `float`, `date` or `datetime`
-types.
+The input column denoted by `of` must be of `int`, `float`, `decimal`,
+`date` or `datetime` types.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>

<Expandable title="Types of input, output & default don't match">
13 changes: 7 additions & 6 deletions docs/pages/api-reference/aggregations/min.md
@@ -24,10 +24,11 @@ aggregation. This field is expected to be of type `int`, `float`, `date` or
`of`.
</Expandable>

-<Expandable title="default" type="Union[int, float]">
+<Expandable title="default" type="Optional[Union[int, float, Decimal, datetime, date]]">
Min over an empty set of rows isn't well defined - Fennel returns `default`
in such cases. The type of `default` must be same as that of `of` in the input
-dataset.
+dataset. If the default is not set or is None, Fennel returns None and in that case,
+the expected type of `into_field` must be `Optional[T]`.
</Expandable>

<pre snippet="api-reference/aggregations/min#basic" status="success"
@@ -43,11 +44,11 @@ dataset. If there are no rows in the aggregation window, `default` is used.

#### Errors
<Expandable title="Min on other types">
-The input column denoted by `of` must be of `int`, `float`, `date` or `datetime`
-types.
+The input column denoted by `of` must be of `int`, `float`, `decimal`,
+`date` or `datetime` types.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>

<Expandable title="Types of input, output & default don't match">
9 changes: 5 additions & 4 deletions docs/pages/api-reference/aggregations/quantile.md
@@ -58,10 +58,11 @@ dataset. If there are no rows in the aggregation window, `default` is used.

#### Errors
<Expandable title="Quantile on non int/float types">
-The input column denoted by `of` must either be of `int` or `float` types.
+The input column denoted by `of` must either be of `int` or `float` or
+`decimal` types.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>

<Expandable title="Types of output & default don't match">
@@ -81,7 +82,7 @@ right expectations and be compatible with future addition of exact quantiles.
</Expandable>

<pre snippet="api-reference/aggregations/quantile#incorrect_type" status="error"
-message="Can not take quantile over string, only int or float">
+message="Can not take quantile over string, only int or float or decimal">
</pre>
<pre snippet="api-reference/aggregations/quantile#invalid_default" status="error"
message="Default is not specified, so the output field should be Optional[float]">
17 changes: 10 additions & 7 deletions docs/pages/api-reference/aggregations/stddev.md
@@ -22,28 +22,31 @@ The name of the field in the output dataset that should store the result of this
aggregation. This field is expected to be of type `float`.
</Expandable>

-<Expandable title="default" type="float">
+<Expandable title="default" type="Optional[float]">
Standard deviation over an empty set of rows isn't well defined - Fennel
-returns `default` in such cases.
+returns `default` in such cases. If the default is not set or is None,
+Fennel returns None and in that case, the expected type of `into_field`
+must be `Optional[float]`.
</Expandable>

<pre snippet="api-reference/aggregations/stddev#basic" status="success"
message="Standard deviation in window of 1 day & week">
</pre>

#### Returns
-<Expandable type="float">
+<Expandable type="Union[float, Optional[float]]">
Stores the result of the aggregation in the appropriate field of the output
dataset. If there are no rows in the aggregation window, `default` is used.
</Expandable>


#### Errors
<Expandable title="Stddev on non int/float types">
-The input column denoted by `of` must either be of `int` or `float` types.
+The input column denoted by `of` must either be of `int` or `float` or
+`decimal` types.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>

<Expandable title="Output and/or default aren't float">
@@ -52,7 +55,7 @@ The type of the field denoted by `into_field` in the output dataset and that of
</Expandable>

<pre snippet="api-reference/aggregations/stddev#incorrect_type" status="error"
-message="Can not take stddev over string, only int or float">
+message="Can not take stddev over string, only int or float or decimal">
</pre>

<pre snippet="api-reference/aggregations/stddev#non_matching_types" status="error"
9 changes: 5 additions & 4 deletions docs/pages/api-reference/aggregations/sum.md
@@ -28,18 +28,19 @@ type of the field in the input dataset corresponding to `of`.
</pre>

#### Returns
-<Expandable type="Union[int, float]">
+<Expandable type="Union[int, float, Decimal]">
Accumulates the count in the appropriate field of the output dataset. If there
are no rows to count, by default, it returns 0 (or 0.0 if `of` is float).
</Expandable>


#### Errors
<Expandable title="Sum on non int/float types">
-The input column denoted by `of` must either be of `int` or `float` types.
+The input column denoted by `of` must either be of `int` or `float`
+or `decimal` types.

-Note that unlike SQL, even aggregations over `Optional[int]` or `Optional[float]`
-aren't allowed.
+Note that like SQL, aggregations over `Optional[int]` or `Optional[float]`
+are allowed.
</Expandable>

<pre snippet="api-reference/aggregations/sum#incorrect_type" status="error"
3 changes: 3 additions & 0 deletions fennel/CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

+## [1.5.59] - 2024-12-10
+- Allow None as default value for min/max/avg/stddev aggregations.
+
## [1.5.58] - 2024-11-24
- Allow min/max aggregation on date, datetime and decimal dtypes
