-
Notifications
You must be signed in to change notification settings - Fork 13.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-3154][runtime] Upgrade from Kryo v2 + Chill 0.7.6 to Kryo v5 w… #22660
base: master
Are you sure you want to change the base?
Conversation
67f2abf
to
cda7fe1
Compare
I appreciate the work, but really this change has to go through a proper design process. There are open questions as to whether we really want to expose Kryo in the same way we did before (because the presence of Kryo serializers in the execution config is a problem in general). Please create a FLIP and publish it on the dev mailing list. |
@zentol ok, I created a Confluence login with username "kurto", however I don't seem to have permissions to create a new FLIP. I don't see a "Create" button as the docs say:
Could I get permissions to do this? Thank you :) |
You should have permissions by now |
da6e3ea
to
092d49b
Compare
a102ef9
to
f6d020a
Compare
d47bd58
to
501c6b9
Compare
d088f25
to
19a0a40
Compare
FYI, there is a pull request in the works to upgrade Twitter Chill to use Kryo 5.5.0. I don't think this pull request is blocked by the Chill pull request, but might be relevant, so I'm sharing the link. twitter/chill#747 |
…ith backward compatibility for existing savepoints and checkpoints.
I added my comment on the Jira ticket, but also posting it here for better visibility. I wanted to revisit the status of this ticket. There was some discussion in the mailing list about merging this into Flink 2.0. If the master branch now is targeting 2.0, maybe we can finally move forward with this FLIP? |
…ith backward compatibility for existing savepoints and checkpoints.
What is the purpose of the change
To upgrade the primary Kryo library used by Flink from v2.x to v5.x, while providing backwards compatibility with existing savepoints and checkpoints. This PR adds a new Kryo v5 dependency that is namespaced so that it can coexist with the legacy dependencies that would be kept for compatibility purposes. Flink also depends on the Twitter chill (Scala) and chill-java libraries for additions + enhancements to Kryo v2.x. This would also be deprecated as most functionality is included in Kryo v5.x. Some future version of Flink could eventually drop Kryo v2 and the Twitter chill dependencies when backwards compatibility with Kryo v2 based state is no longer needed.
Why upgrade Kryo? One reason is support for Java 17 and 21. The existing Kryo 2.x is not compatible with Java 17/21, while Kryo 5.x is. When running in a JDK 17 runtime, I notice that ArraysAsListSerializer from chill-java fails under Java 17. Fixes can be back ported for that relatively easily, but there are more issues. Kryo 2.x doesn't support Java records at all.
A more broader reason is Flink should be using the new, actively maintained version of Kryo rather than the 10+ year old 2.x branch that stopped getting any updates almost ten years ago. Kryo v2.x was released before the release of Java 8. So it's quite old. Kryo is a maintained project, there have been lots of improvements over the past ten years. Kryo 5.x has faster runtime performance, more memory efficient serialization, fixed lots of bugs, added functionality, and improved compatibility with newer versions of Java. Kryo 5.x will also get more improvements in the future that will be fully compatible with existing Kryo 5.x serialized data.
Brief change log
This is a large PR with a lot of surface area and risk. I tried to keep the scope of these changes as narrow and simple as possible. I copied all the Kryo v2 code to Kryo v5 equivalents and made necessary adjustments to get everything working. Some highlights as to what was done:
All existing serialization class names and package names are unmodified
Some Flink serialization code references serialization classes by full package name, so the package and class names of all existing serialization classes are unmodified.
Added new version
7
to KeyedBackendSerializationProxyIn production Flink, version 6 is the current version. This PR adds version 7. The only difference between 6 and 7 is the Kryo upgrade. With version 6 serialized data, all Kryo state is Kryo 2.x. With version 7 serialized data, all Kryo state is Kryo 5.x
Added
deserializeWithKeyedBackendVersion
toTypeSerializer<T>
By default, this just calls the regular
deserialize
method. The Kryo 5.x version of Flink KryoSerializer will check the version number. For versions older than 7, this will call the Kryo 2.x version of the Flink KryoSerializer class.Kryo 2.x Code Is not used outside of backwards compatibility scenarios
New serialized state will not be written with Kryo 2.x code. Some unit tests are still using the Kryo 2.x code, but the main code base is only using Kryo 2.x code for reading legacy state.
Verifying this change
This is passing the full test suite of automated tests in the CI system which covers lots of backwards compatibility scenarios.
Additionally, I wrote a Flink application to do a more thorough test of the Kryo upgrade that was difficult to convert into unit test form.
https://github.com/kurtostfeld/flink-kryo-upgrade-demo
If the Flink project is seriously considering accepting this PR, I plan to write more test scenarios for thorough backwards compatibility.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: yes. Kryo v2 APIs are deprecated. Parallel Kryo v5 APIs are created with PublicEvolving