-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert ntile
builtIn function to UDWF
#13040
Conversation
/// Create a new `ntile` function | ||
pub fn new() -> Self { | ||
Self { | ||
signature: Signature::any(0, Volatility::Immutable), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ntile(expression)
window function takes a single argument of the integer type.
signature: Signature::any(0, Volatility::Immutable), | |
signature: Signature::uniform( | |
1, | |
vec![ | |
DataType::UInt64, | |
DataType::UInt32, | |
DataType::UInt16, | |
DataType::UInt8, | |
DataType::Int64, | |
DataType::Int32, | |
DataType::Int16, | |
DataType::Int8, | |
], | |
Volatility::Immutable, | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed this is a draft PR 😅. Let me know once it's ready for review.
let nullable = false; | ||
let return_type: &DataType = field_args.input_types().first().unwrap(); | ||
|
||
Ok(Field::new(self.name(), return_type.clone(), nullable)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When constructing the result field (column) we need to pass the fully qualified name in the schema which is available in WindowUDFFieldArgs
.
datafusion/datafusion/expr/src/udwf.rs
Lines 387 to 390 in 818ce3f
/// Call `field_args.name()` to get the fully qualified name for defining | |
/// the [`Field`]. For a complete example see the implementation in the | |
/// [Basic Example](WindowUDFImpl#basic-example) section. | |
fn field(&self, field_args: WindowUDFFieldArgs) -> Result<Field>; |
It is easy to get confused with the name of the udwf 😅.
External error: query failed: DataFusion error: Internal error: Input field name ntile does not match with the projection expression ntile(Int64(8)) ORDER BY [aggregate_test_100.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker [SQL] SELECT NTILE(8) OVER (ORDER BY C4) as ntile1, NTILE(12) OVER (ORDER BY C12 DESC) as ntile2 FROM aggregate_test_100 ORDER BY c7 LIMIT 5 at test_files/window.slt:743
Ok(Field::new(self.name(), return_type.clone(), nullable)) | |
Ok(Field::new(field_args.name(), return_type.clone(), nullable)) |
should i remove this duplicate code from : datafusion/physical-plan/src/windows/mod.rs:
|
Yes, please. |
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a inner doc comment:
//! `ntile` window function implementation | |
pub fn ntile(arg: i64) -> datafusion_expr::Expr { | ||
ntile_udwf().call(vec![arg.lit()]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression API changed here. The arg
type is Expr
in the existing API.
https://docs.rs/datafusion/latest/datafusion/logical_expr/window_function/fn.ntile.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Revert to existing API contract
- Add a logical plan roundtrip test
Marked as completed.
fn expressions(&self, expr_args: ExpressionArgs) -> Vec<Arc<dyn PhysicalExpr>> { | ||
parse_expr(expr_args.input_exprs(), expr_args.input_types()) | ||
.into_iter() | ||
.collect::<Vec<_>>() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i took the reference from to be consistent while taking input
lead_lag.rs
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this completely and rely on the default implementation for ntile
. From what I see ntile
doesn't depend on the input expression in partition evaluator.
This implementation is required in lead/lag
because it is possible that it can be called with NULL
(input expression), and we try to deduce the type for that NULL
if a default value argument exists. For lead/lag
the types of the input expression and the default value needs to be the same in partition evaluator.
let n = get_unsigned_integer(scalar_n)?; | ||
if n == 0 { | ||
return exec_err!("NTILE requires a positive integer"); | ||
} | ||
|
||
Ok(Box::new(NtileEvaluator { n })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the original code from create_built_in_window_expr
for parsing n
unmodified?
let n = get_unsigned_integer(scalar_n)?; | |
if n == 0 { | |
return exec_err!("NTILE requires a positive integer"); | |
} | |
Ok(Box::new(NtileEvaluator { n })) | |
if scalar_n.is_unsigned() { | |
let n = get_unsigned_integer(scalar_n)?; | |
Ok(Box::new(NtileEvaluator { n })) | |
} else { | |
let n: i64 = get_signed_integer(scalar_n)?; | |
if n <= 0 { | |
return exec_err!("NTILE requires a positive integer"); | |
} | |
Ok(Box::new(NtileEvaluator { n: n as u64 })) | |
} |
fn documentation(&self) -> Option<&Documentation> { | ||
Some(get_ntile_doc()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🫶
@@ -51,3 +51,18 @@ pub(crate) fn get_scalar_value_from_args( | |||
None | |||
}) | |||
} | |||
|
|||
pub(crate) fn get_unsigned_integer(value: ScalarValue) -> Result<u64> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to extract this function inside ntile
.
@@ -165,25 +163,6 @@ fn window_expr_from_aggregate_expr( | |||
} | |||
} | |||
|
|||
fn get_scalar_value_from_args( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
## Ranking functions | ||
|
||
- [rank](#rank) | ||
- [dense_rank](#dense_rank) | ||
- [ntile](#ntile) | ||
|
||
### `rank` | ||
|
||
Rank of the current row with gaps; same as row_number of its first peer. | ||
|
||
```sql | ||
rank() | ||
``` | ||
|
||
### `dense_rank` | ||
|
||
Rank of the current row without gaps; this function counts peer groups. | ||
|
||
```sql | ||
dense_rank() | ||
``` | ||
|
||
### `ntile` | ||
|
||
Integer ranging from 1 to the argument value, dividing the partition as equally as possible. | ||
|
||
```sql | ||
ntile(expression) | ||
``` | ||
|
||
#### Arguments | ||
|
||
- **expression**: An integer describing the number groups the partition should be split into | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These docs should not be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have moved them to docs/source/user-guide/sql/window_functions_new.md
Should I still keep them here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me now. You are removing the {rank, dense_rank}
functions here because they were already migrated to new documentation.
Let this be 👍
@@ -179,6 +180,18 @@ Returns the rank of the current row without gaps. This function ranks rows in a | |||
dense_rank() | |||
``` | |||
|
|||
### `ntile` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@jatin510 Can you please mark this PR as draft so that it is not accidentally merged into main while you are working on changes. You can change it back when you are ready, so the committers will know it is ready to merge. |
@@ -940,6 +940,7 @@ async fn roundtrip_expr_api() -> Result<()> { | |||
vec![lit(1), lit(2), lit(3)], | |||
vec![lit(10), lit(20), lit(30)], | |||
), | |||
cume_dist(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pub fn ntile(arg: Expr) -> Expr { | ||
ntile_udwf().call(vec![arg]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@jatin510 Thanks 🙌. The udwf epic is almost complete! |
@jcsherin Oops ! Should I delete this remote branch and update the branch name and create a PR again ! |
It is alright, that is not a problem at all 😂. You just need to edit the The |
Thank you so much ! @jcsherin |
Is this PR ready to go? It looks so to me. Thank you @jatin510 and @jcsherin and @jayzhan211 |
@alamb This PR is ready. |
Thanks again @jatin510 for this PR, @jcsherin for the guidance and @jayzhan211 for the review 🚀 |
Which issue does this PR close?
Closes #12694.
Rationale for this change
Same as described in #8709.
What changes are included in this PR?
Converts BuiltinWindowFunction::Ntile to user-defined window function.
Export a fluent-style API for creating ntile expressions.
Are these changes tested?
Yes
Are there any user-facing changes?