-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Generate docs from macros. #12822
base: main
Are you sure you want to change the base?
Conversation
@@ -48,6 +48,8 @@ members = [ | |||
"datafusion-examples", | |||
"test-utils", | |||
"benchmarks", | |||
"datafusion/macros", | |||
"datafusion/pre-macros" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre macros needed to have Documentation structure already precreated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation
rather than something generic like pre-macro
s
datafusion/functions/src/math/log.rs
Outdated
@@ -37,6 +38,7 @@ use datafusion_expr::{ | |||
}; | |||
use datafusion_expr::{ScalarUDFImpl, Signature, Volatility}; | |||
|
|||
#[udf_doc(description = "log_description", example = "log_example")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we put the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite cool ❤️
One thing I would be interested in seeing is if/how a multi line description and example would look. I have had trouble with multi-line macro / docs in the past
datafusion/functions/src/math/log.rs
Outdated
@@ -472,4 +471,16 @@ mod tests { | |||
SortProperties::Unordered | |||
); | |||
} | |||
|
|||
#[test] | |||
fn test_doc() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the test shows it applied
WDYT should we move in this direction? |
I find this intriguing and technically very interesting! I do have a few questions:
To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns. |
Thanks @Omega359
I'm planning to play with to_date function as it covers also multiline comments as @alamb mentioned |
@alamb @Omega359 please have a look on real example Btw multiline seems to be working |
// under the License. | ||
|
||
#[derive(Debug, Clone)] | ||
pub struct DocumentationTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied all Documentation structures here, added Test so as not to break everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great personally
As long as the macro has enough documentation itself, I think this would help reduce a bunch of the boiler plate (though perhaps not all)
@@ -48,6 +48,8 @@ members = [ | |||
"datafusion-examples", | |||
"test-utils", | |||
"benchmarks", | |||
"datafusion/macros", | |||
"datafusion/pre-macros" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation
rather than something generic like pre-macro
s
use std::any::Any; | ||
use std::sync::OnceLock; | ||
|
||
#[udf_doc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very cool
@Omega359 what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dumped the example #[udf_doc ...] into my editor to do a direct comparison to the existing solution and to be honest it definitely is nicer to visually parse.
There are situations currently where this approach will not work, at least as it currently seems to be implemented. For example: bit_and_or_xor, some of the regex udfs, etc.
|
||
[package] | ||
name = "datafusion-macros" | ||
description = "Proc macros for DataFusion query engine" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description = "Proc macros for DataFusion query engine" | |
description = "Procedural macros for DataFusion query engine" |
I can't think of a reason why not off the top of my head. I'm hoping to have time to look over your latest updates in this PR later tonight |
use std::any::Any; | ||
use std::sync::OnceLock; | ||
|
||
#[udf_doc( | ||
doc_section(include = "true", label = "Time and Date Functions"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry about this since it's almost certainly going to drift at some point in the future via typos, etc. Is it possible to reference the const's in scalar_doc_sections/window_doc_sections/aggregate_doc_sections ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if enum/constant can be used directly within the attribute, however there is a possibility to check the value in macro. So if the value is not satisfied to existing enum variants the macro can throw an exception with explaining what section names are supported
+---------------------------------------------------------------+\n\ | ||
```\n\n\ | ||
Additional examples can be found [here](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/to_date.rs)", | ||
standard_argument(name = "expression", expression_type = "String"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely better as expression_type
-> prefix
let mut syntax_example: Option<LitStr> = None; | ||
let mut sql_example: Option<LitStr> = None; | ||
let mut standard_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![]; | ||
let mut udf_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be missing related udfs here.
Thanks @Omega359 there is bunch of things needed to be added, its not a final PR. I'll add some minor missing pieces and the PR will be ready for the review, most likely next week. Because I need to make a cleanup, remove test structures, polish crates and everything. |
I'll get back on it little later as need to finish some more urgent work with joins |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?