Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-36047] Add CompiledPlan annotations to BatchExecOverAggregate #25635

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jnh5y
Copy link
Contributor

@jnh5y jnh5y commented Nov 8, 2024

What is the purpose of the change

  • Adds Compiled Plan annotations to BatchExecOverAggregate.
  • Tests the new annotations with the existing TestPrograms.
  • Moves the existing OverAggregateTestPrograms and makes it clear which ones have out-of-order data.

Verifying this change

This change adds a BatchRestoreTest to cover the new annotations and show that the batch compiled plan can be restored and executed correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (yes)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Collaborator

flinkbot commented Nov 8, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

* Moves the existing tests to the common folder and updates names to indicate that the tests contain out of order data.
@jnh5y
Copy link
Contributor Author

jnh5y commented Nov 12, 2024

@flinkbot run azure

@snuyanzin
Copy link
Contributor

LGTM
will merge it later today

Row.of(50L, 5L, 5, "Hello"),
Row.of(60L, 6L, 6, "Hello"),
Row.of(65L, 6L, 65, "Hello"),
Row.of(51L, 19L, 15, "Hello"), // Late?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get these comments here. Isn't late and out of order the same thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are in the existing source file: https://github.com/apache/flink/pull/25635/files#diff-f33715f2b44a0a40a24ffba687e4848ea142d978f67492ec85f1be64d853e216L44

If I recall, I believe I was hitting a situation where the streaming OverAggregate operator was doing some unexpected things relative cleaning up keys. I got side tracked from filing a bug for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an existing comment, I'd like to suggest we clean it up separately from this PR.

Not having it merged is blocking other PRs. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine we can merge it. But it would be great if you can explain code that we have written. It doesn't make sense to let batch mode tests run on the same incorrect behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OverAggregateBatchRestoreTests do not use any of the test programs with data which would be dropped by the streaming engine.

To achieve this, this PR basically labels the 4 existing test programs as having out of order data, and creates a copy of each without out-of-order/dropped data.

At that point, the only (known) difference between streaming and batch should be FLINK-25082 which is noted in OverAggregateBatchRestoreTest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants