Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CALCITE-6465: Rework code generator to use Flink code splitter #3901

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

jduo
Copy link
Member

@jduo jduo commented Aug 2, 2024

No description provided.

@jduo
Copy link
Member Author

jduo commented Aug 2, 2024

This builds but introduces a dependency on flink-core. Looking for a way around this.

Integrate Flink Java code splitter and run it on all final Expressions when converting to Java code.

Next step is to run it based on configuration options
JavaCodeSplitter needs it for Preconditions
@jduo jduo marked this pull request as ready for review September 20, 2024 21:08
@@ -51,7 +51,7 @@ public Type getType() {
}

@Override public String toString() {
ExpressionWriter writer = new ExpressionWriter(true);
ExpressionWriter writer = new ExpressionWriter(true, true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would read from the CalciteSystemProperty but calcite-core depends on linq4j so it can't be looked up here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this have any practical impact on the actual code generation? (like splitting in some circumstances where the CalciteSystemProperty has not been set to true). If not, I'd say this is acceptable.
It's true that it is a bit unfortunate that we cannot check the CalciteSystemProperty here, but I can't think of a better solution at this point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would read from the CalciteSystemProperty but calcite-core depends on linq4j so it can't be looked up here.

I do not know this code, but wonder if the split configuration values be passed by value into the linq4j layer. If possible , this would avoid the dependancy recursion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this toString() method is used in code generation, just logging/debugging.
This option is passed in to Expressions static functions, which is really where ExpressionWriters are instantiated most of the time.

@@ -57,20 +60,39 @@ public MethodDeclaration(int modifier, String name, Type resultType,
}

@Override public void accept(ExpressionWriter writer) {
String modifiers = Modifier.toString(modifier);
writer.append(modifiers);
final ExpressionWriter writerForUnsplitMethod = writer.usesMethodSplitting()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression the intent of this line is not very obvious, perhaps a comment would help to clarify it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in dc1c66d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jduo , just a small comment, because I find the comment a bit confusing (maybe I'm getting it wrong), where it says:

... serialize the method declaration directly to the supplied writer if method splitting is enabled or ...

shouldn't it say "disabled" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct. I've updated this comment, but note that it is now in ClassDeclaration instead of MethodDeclaration. The method splitter requires an enclosing class to write some of its output too (it can extract variable names to instance variables and move blocks of code to generated private methods to reduce the size of a method).

- Add comments about method splitting
- Add overloads with default values
- Clarify condition about using a temporary writer
Method splitting has to happen at the class level so that there is a place to
write fields and methods generated by reducing the size of a method
Change the property ENABLE_METHOD_SPLITTING to METHOD_SPLITTING_THRESHOLD.
It is now an integer specifying the method length limit before splitting instead
of a boolean to turn on a fixed limit.
A value of zero indicates that method splitting is off.

Change tests unrelated to method splitting to set method splitting off.
* Use this setting to automatically detect and split methods larger than the specified
* limit.
*/
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: proposal (to make the new property robust in case of negative values)

public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD =
      intProperty("calcite.linq.method_splitting_threshold", 0, v -> v >= 0);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 0 is a valid method length, 0 doesn't seem to be a good value to indicate "don't split".

I think that any negative value should mean "don't split", and the default should be -1.

@@ -111,7 +111,10 @@ public static Bindable toBindable(Map<String, Object> parameters,
parameters);

final ClassDeclaration expr = relImplementor.implementRoot(rel, prefer);
String s = Expressions.toString(expr.memberDeclarations, "\n", false);
String s =
Expressions.toString(expr.memberDeclarations,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put "\n", false, on previous line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Use this setting to automatically detect and split methods larger than the specified
* limit.
*/
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 0 is a valid method length, 0 doesn't seem to be a good value to indicate "don't split".

I think that any negative value should mean "don't split", and the default should be -1.

@@ -420,6 +420,18 @@ public final class CalciteSystemProperty<T> {
public static final CalciteSystemProperty<Integer> FUNCTION_LEVEL_CACHE_MAX_SIZE =
intProperty("calcite.function.cache.maxSize", 0, v -> v >= 0);

/**
* The length of code before automatic method splitting is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name/description of the property isn't great. It doesn't include the units (characters, lines, number of statements). The javadoc doesn't say how the threshold is used - whether splitting occurs if a method is greater than or greater than or equal .

Maybe a better name can be found, I don't know.

But the doc should say something like 'A method is split if its size in characters is greater than or equal to the threshold and the threshold is non-negative.'

The field in ExpressionWriter needs better doc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed this to be MAX_METHOD_LENGTH_IN_CHARS_BEFORE_SPLITTING and clarified that splitting starts happening if the code is longer than the limit and not when equal to the limit. This is a long name though so I'm open to better names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the topic of the constant to indicate no limit, I've changed this to be -1 instead of 0. I don't see a good place to hold the magic number though -- CalciteSystemProperty would be a reasonable user-facing location, but ExpressionWriter can't reference it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the -1 magic number a constant in Expressions

@@ -51,7 +51,8 @@ public Type getType() {
}

@Override public String toString() {
ExpressionWriter writer = new ExpressionWriter(true);
// Use the JVM limit for method size (4K) as the method splitting threshold.
ExpressionWriter writer = new ExpressionWriter(true, 4000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that you can't use the system property. But why should the default behavior be to split? I think it should be to not split.

If people want to split they should invoke by other means than toString().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1719,6 +1724,60 @@ public void checkBlockBuilder(boolean optimizing, String expected) {
Expressions.toString(Expressions.constant(set)));
}

@Test void testLargeMethod() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test needs good javadoc.

What is MyClass and why is it commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified the meaning of the MyClass comment block and added to the header public javadoc about the test + JIRA.

@@ -111,6 +111,7 @@ dependencies {
apiv("org.apache.commons:commons-pool2")
apiv("org.apache.commons:commons-collections4")
apiv("org.apache.commons:commons-text")
apiv("org.apache.flink:flink-table-code-splitter")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a note above this list of declarations that we import flink-table-code-splitter but we must not import core flink modules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

System.out.println("Innodb: " + Expressions.toString(enumerable));
System.out.println(
"Innodb: " + Expressions.toString(enumerable,
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

ExpressionWriter(boolean generics) {
ExpressionWriter(boolean generics, int methodSplittingThreshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need javadoc for this constructor.

You don't say the units of the threshold (bytes, lines), that it applies to source code length, what values disable splitting, and whether it's > or >=.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded comments and renamed this parameter.

this.methodSplittingThreshold = methodSplittingThreshold;
}

public ExpressionWriter duplicateState() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public method needs javadoc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this package-private rather than adding javadoc -- it really seems like most of the methods in ExpressionWriter can be package-private.


/**
* Converts an expression to Java source code.
* Optionally splits the generated method if the method is too large.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"too large" is vague. can do better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to be more specific about relating to the limit.

Copy link

sonarcloud bot commented Oct 4, 2024

@jduo
Copy link
Member Author

jduo commented Oct 9, 2024

@rubenada @julianhyde , just wondering if there was some more feedback on this and would like some clarification on where system tests reside within Calcite. Thanks.

@rubenada
Copy link
Contributor

rubenada commented Oct 16, 2024

@jduo , not sure what you mean by "system tests". If you want to add a test with a large dynamic code that fails when this feature is turn off / passes when it is turned on; maybe you could check and adapt the test that was done for the slightly related CALCITE-3094 (and then refined in CALCITE-6593 ): LargeGeneratedJoinTest. The query in that test, with the JOIN_SELECTOR_COMPACT_CODE_THRESHOLD disabled, should throw the "code grows beyond 64 KB InternalCompilerException". At that point, turning on the code splitter from this PR, should make the test pass.

@@ -111,7 +111,9 @@ public static Bindable toBindable(Map<String, Object> parameters,
parameters);

final ClassDeclaration expr = relImplementor.implementRoot(rel, prefer);
String s = Expressions.toString(expr.memberDeclarations, "\n", false);
String s =
Expressions.toString(expr.memberDeclarations, "\n", false,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that code split is not effective here. This is because the code splitting logic in this pr is implemented in ClassDeclaration, and the expr.memberDeclarations here is usually MethodDeclaration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asdfgh19 , thanks for your comments. Would you be able to add tests for your scenarios to the PR?

For this one, it seems like we a can pass expr to Expressions.toString(), but I'm not sure what effect that has on the calls that depend on s ( Hook.JAVA_PLAN.run(s);)

@@ -186,7 +186,9 @@ static Scalar.Producer baz(ParameterExpression context_,
final ClassDeclaration classDeclaration =
Expressions.classDecl(Modifier.PUBLIC, "Buzz", null,
ImmutableList.of(Scalar.Producer.class), declarations);
String s = Expressions.toString(declarations, "\n", false);
String s =
Expressions.toString(declarations, "\n", false,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The declarations here usually do not look like ClassDeclarations either.

@@ -100,7 +100,9 @@ private static String compile(RexBuilder rexBuilder, List<RexNode> constExps,
Expressions.methodDecl(Modifier.PUBLIC, Object[].class,
BuiltInMethod.FUNCTION1_APPLY.method.getName(),
ImmutableList.of(root0_), blockBuilder.toBlock());
String code = Expressions.toString(methodDecl);
String code =
Expressions.toString(methodDecl,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methodDecl here is a MethodDeclaration, not a ClassDeclaration, so it seems that the code splitting logic implemented in the ClassDeclaration does not work here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this code relates to RexExecutable. The method body is generated here and written to String, then gets attached as the body of a class in RexExecutable that's run directly in Janino's IClassBodyEvaluator. This will be quite complicated to rework.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants