Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing 3rd party package ruamel.yaml anywhere causes failure to validate workflow #638

Open
adamh-oai opened this issue Sep 8, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@adamh-oai
Copy link

To reproduce:

  1. Checkout https://github.com/temporalio/samples-python
  2. poetry add ruamel.yaml
  3. Add import ruamel.yaml to the top of hello/hello_activity.py
  4. poetry run hello/hello_activity.py

This produces the error here: https://gist.github.com/adamh-oai/dfdb9b07b89bf3cee10da34ba2582805

Important parts:

  File "/Users/adamh/.virtualenvs/temporalio-samples-G-Ux_4V2-py3.11/lib/python3.11/site-packages/ruamel/yaml/representer.py", line 1132, in <module>
    RoundTripRepresenter.add_representer(TimeStamp, RoundTripRepresenter.represent_datetime)
  File "/Users/adamh/.virtualenvs/temporalio-samples-G-Ux_4V2-py3.11/lib/python3.11/site-packages/ruamel/yaml/representer.py", line 135, in add_representer
    cls.yaml_representers[data_type] = representer
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
TypeError: unhashable type: '_RestrictedProxy'

The above exception was the direct cause of the following exception:
...
  File "/Users/adamh/.virtualenvs/temporalio-samples-G-Ux_4V2-py3.11/lib/python3.11/site-packages/temporalio/worker/_workflow.py", line 118, in __init__
    raise RuntimeError(f"Failed validating workflow {defn.name}") from err
RuntimeError: Failed validating workflow GreetingWorkflow

This behavior is surprising to me because the workflow/activity is not actually using the ruamel package. The error persists if I move the workflow/activity to a separate module and wrap in a imports_passed_through block. Wrapping the ruamel import in this same block does resolve the error. It seems like the ruamel import may have side effects on other modules. So, I have a workaround, but (short of disabling sandboxing entirely) I'm concerned that in a large codebase with unpredictable 3rd party libraries this will be a recurring issue.

@adamh-oai adamh-oai added the bug Something isn't working label Sep 8, 2024
@cretz
Copy link
Member

cretz commented Sep 9, 2024

This is a sandbox issue. Despite those short "hello" samples being in the same file, we actually recommend workflows be in their own file. Can you replicate if the workflows are in a separate file and import the activities via "passthrough" similar to what's shown in the quickstart of the README: https://github.com/temporalio/sdk-python?tab=readme-ov-file#implementing-a-workflow?

@adamh-oai
Copy link
Author

Yes, it happens if the workflows are in their own module and imported via an unsafe block:

The error persists if I move the workflow/activity to a separate module and wrap in a imports_passed_through block. Wrapping the ruamel import in this same block does resolve the error. It seems like the ruamel import may have side effects on other modules. So, I have a workaround, but (short of disabling sandboxing entirely) I'm concerned that in a large codebase with unpredictable 3rd party libraries this will be a recurring issue.

@adamh-oai
Copy link
Author

A related issue: #301

@cretz
Copy link
Member

cretz commented Sep 9, 2024

Yes, it happens if the workflows are in their own module and imported via an unsafe block:

Ah, I see the bug already opened for making the proxy item hashable. I wonder then if this is a duplicate of that. In this case our wrapping of datetime causes this issue.

Wrapping the ruamel import in this same block does resolve the error. It seems like the ruamel import may have side effects on other modules.

To confirm, are you using this library inside the workflow and want the import reloaded every time the workflow is run? We recommend passing through all imports you know you'll use deterministically and/or you don't want completely reloaded every time. It's going to be a performance issue to reload third party libraries for every workflow run, but I understand the safety benefit.

I'm concerned that in a large codebase with unpredictable 3rd party libraries this will be a recurring issue.

Yes, most do not use many third party libraries inside their workflows, mostly only in activities and other code. But yes, our goal is to try our best to make third party libraries usable inside workflows but these hiccups happen here and there and we need to fix the linked bug.

@adamh-oai
Copy link
Author

To confirm, are you using this library inside the workflow and want the import reloaded every time the workflow is run?

This import is not used inside the workflow; in the real case where I hit this I was using it to load a yaml file for configuring the temporal client, but the workflows themselves don't import it or use it.

@cretz
Copy link
Member

cretz commented Sep 9, 2024

but the workflows themselves don't import it or use it.

This error only occurs in workflow code when loading it in a sandbox, so somehow this yaml code is getting executed/loaded in a workflow. Maybe the activity imports weren't being passed through when you split the workflow file off onto its own?

@creatorrr
Copy link

Same issue. I ended up removing ruamel entirely but I found that adding the following to the very first __init__.py file seems to help:

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    import ruamel.yaml

    # also- msgpack if installed
    import msgpack

@creatorrr
Copy link

@cretz it would be really helpful to add this suggestion / explanation to the error though. It took up hours to track this down and fix :(

@creatorrr
Copy link

or maybe a whitelist of common libraries in the temporal python sdk itself? that's probably not super wise but would save so much confusion

@cretz
Copy link
Member

cretz commented Oct 16, 2024

We don't really control the error here unfortunately. We do have docs at https://github.com/temporalio/sdk-python?tab=readme-ov-file#workflow-sandbox that say in bold:

For performance and behavior reasons, users are encouraged to pass through all third party modules whose calls will be deterministic

@adamh-oai
Copy link
Author

This error only occurs in workflow code when loading it in a sandbox, so somehow this yaml code is getting executed/loaded in a workflow.

The way I interpreted it is that because ruamel patches datetime, it's implicitly used in the the workflow even though it's not imported. FWIW, I can't repro this right now but it was weirdly inconsistent earlier as well so not sure what's going on; since someone else ran into it I assume this is a real issue. Ideally ruamel would just Not Do That, but I'm also not sure why it does or what other options there are.

The main issue is that an innocuous change in another part of the codebase that doesn't touch anything temporal related can cause temporal sandbox failures.

@cretz
Copy link
Member

cretz commented Oct 16, 2024

because ruamel patches datetime

Ah, I was not aware of this (unfamiliar with the library). So we proxy datetime to avoid people doing things like calling now(). But that proxying can be disabled. We have to do this for Pydantic because of an issue in Python subclass checking. So you can create a new sandbox runner without datetime proxied like https://github.com/temporalio/samples-python/blob/171b5e5b205167fdff4231978857c4efe1cd6225/pydantic_converter/worker.py#L45-L65 and pass that runner to the worker and datetime will not be affected usually.

However, you may have to use with temporalio.workflow.unsafe.sandbox_unrestricted(): for any ruamel-using code (even if it uses it indirectly, like with datetime) to avoid our other checks.

The main issue is that an innocuous change in another part of the codebase that doesn't touch anything temporal related can cause temporal sandbox failures.

I am not sure any Python library can help this if patching is occurring. If you patch Python stdlib and some unrelated code expects it to work the stdlib way, it may fail. But between passing through imports, unrestricting sandbox for certain code, and disabling proxying for the datetime module, you should be able to remove all Temporal checks and wrappings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants