-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict Python Version Mismatch between Pickled Object and Remote Envrionment #2848
Restrict Python Version Mismatch between Pickled Object and Remote Envrionment #2848
Conversation
Signed-off-by: Mecoli1219 <[email protected]>
cc @thomasjpfan |
Signed-off-by: Mecoli1219 <[email protected]>
Signed-off-by: Mecoli1219 <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2848 +/- ##
==========================================
- Coverage 40.56% 38.49% -2.07%
==========================================
Files 196 199 +3
Lines 20526 20781 +255
Branches 2642 2671 +29
==========================================
- Hits 8326 8000 -326
- Misses 12096 12567 +471
- Partials 104 214 +110 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Mecoli1219 <[email protected]>
Signed-off-by: Mecoli1219 <[email protected]>
Signed-off-by: Mecoli1219 <[email protected]>
flytekit/remote/remote.py
Outdated
queue: typing.List[typing.Union[WorkflowBase, PythonTask, CoreNode]] = [root_entity] | ||
pickled_target_dict = {} | ||
pickled_target_dict = { | ||
"metadata": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this shares a namespace with task names right? so if someone makes a task where entity.name
is "metadata" then this metadata will be lost and when we go to look for it we'll get a KeyError or something.
I don't think it needs to be here though, this is a pretty standard thing for all tasks. There's a couple places I think we can fit this information.
- RuntimeMetadata's
flavor
field. I don't think this is being used right now. The name "runtime" is not correct. It's more like "compile-time", but that's a different problem. - Something in these tags. To my knowledge these tags aren't really being used right now.
I think i have a slight preference for the first one. what do you think @eapolinario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the first approach will be enough though. It is possible that we have to extend it to more fields (like maybe the package version?) in the future, and the first approach will have only 1 string that can store this value.
I think that this metadata is primarily used for interactive mode since only pickling objects will potentially cause the version mismatch problem. When creating a pickled object, the versions should follow with this object. I am thinking of moving the entities dictionary one layer down, so that user could have define function with name metadata
{
"metadata": {
"python_version": "3.12",
...
},
"entity_dict": {
"task1": ...
...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your call. I like the idea of adding the entity_dict
key in general. I think we should do that anyways.
Your call on the metadata. The issue with just adding fields is that this risks becoming an undocumented API, so I was thinking we'd just use the existing flavor until we needed more than one field. If you go the pickled dict route though, just make sure it's well documented/commented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your concern. How about creating a dataclass to define the pickled object? In that way, people could know what is inside this object.
Regarding the metadata, the pickling only affects the Python side and doesn't need to be stored in the backend. The main issue this PR addresses is the Python version mismatch between pickled files and remote execution. By sending the Python version inside the pickled file, we can ensure the correctness of the version. Also as far as I researched, I think that runtimeMetadata
is currently used for flytekit's language and version information rather than the Python version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good! just make sure using dataclass doesn't break the pickling.
Signed-off-by: Mecoli1219 <[email protected]>
@wild-endeavor I push the |
Signed-off-by: Mecoli1219 <[email protected]>
Tracking Issues
flyteorg/flyte#5907
Why are the changes needed?
In the Jupyter Notebook environment, we are using
cloudpickle
to package the entity at the binary level. When we load the pickled object with a different Python version, it will throw an unexpected error. Therefore, we should restrict loading pickled objects with the mismatched Python version.What changes were proposed in this pull request?
Let's assume the Python version for the pickled object is X.
How was this patch tested?
I set the Python version of Jupyter Notebook to 3.11 and built 3 images with different Python versions for remote execution, including 3.9, 3.11, and 3.12.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link