Polish "rough edges" of type engine #4544
-
Linking #1337 and #4358 as these issues certainly fall under the "rough edges" discussed here.
-
Currently, the right type transformer is determined as follows (see here):

class TypeEngine(...):
    def get_transformer(cls, python_type: Type) -> TypeTransformer[T]:
        """
        The TypeEngine hierarchy for flyteKit. This method looksup and selects the type transformer. The algorithm is
        as follows

          d = dictionary of registered transformers, where is a python `type`
          v = lookup type

        Step 1:
            If the type is annotated with a TypeTransformer instance, use that.

        Step 2:
            find a transformer that matches v exactly

        Step 3:
            find a transformer that matches the generic type of v. e.g List[int], Dict[str, int] etc

        Step 4:
            Walk the inheritance hierarchy of v and find a transformer that matches the first base class.
            This is potentially non-deterministic - will depend on the registration pattern.

            Special case:
                If v inherits from Enum, use the Enum transformer even if Enum is not the first base class.

            TODO lets make this deterministic by using an ordered dict

        Step 5:
            if v is of type data class, use the dataclass transformer
        """

We could add another step between steps 1 and 2 that checks whether

In one of the recent contributors' syncs, it was proposed to transport these objects "without a type" as pickle.
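To make the lookup order in the docstring above concrete, here is a minimal sketch in plain Python. It is not flytekit's implementation: `Transformer`, `REGISTRY`, `DATACLASS_TRANSFORMER` and `find_transformer` are hypothetical names invented for illustration, and step 4 walks the method resolution order, which is my simplification of "matches the first base class".

```python
import dataclasses
import enum
import typing
from typing import Any, Dict


@dataclasses.dataclass(frozen=True)
class Transformer:
    """Hypothetical stand-in for a type transformer; it only carries a name."""
    name: str


# Toy registry (assumption): which Python type each transformer handles.
REGISTRY: Dict[type, Transformer] = {
    int: Transformer("int"),
    str: Transformer("str"),
    list: Transformer("list"),
    enum.Enum: Transformer("enum"),
}
DATACLASS_TRANSFORMER = Transformer("dataclass")


def find_transformer(python_type: Any) -> Transformer:
    # Step 1: a Transformer instance attached via typing.Annotated wins.
    if typing.get_origin(python_type) is typing.Annotated:
        base, *metadata = typing.get_args(python_type)
        for item in metadata:
            if isinstance(item, Transformer):
                return item
        python_type = base  # no transformer in the metadata: continue with the bare type

    # Step 2: exact match.
    if python_type in REGISTRY:
        return REGISTRY[python_type]

    # Step 3: match on the generic origin, e.g. List[int] -> list.
    origin = typing.get_origin(python_type)
    if origin in REGISTRY:
        return REGISTRY[origin]

    if isinstance(python_type, type):
        # Special case: Enum subclasses use the Enum transformer, whatever the base order.
        if issubclass(python_type, enum.Enum):
            return REGISTRY[enum.Enum]
        # Step 4: walk the method resolution order and take the first registered base.
        for base in python_type.__mro__[1:]:
            if base in REGISTRY:
                return REGISTRY[base]
        # Step 5: fall back to the dataclass transformer.
        if dataclasses.is_dataclass(python_type):
            return DATACLASS_TRANSFORMER

    raise ValueError(f"no transformer registered for {python_type!r}")
```

Walking `__mro__` makes step 4 independent of registration order in this toy version, which is roughly what the TODO in the docstring asks for. Any extra step between steps 1 and 2 would slot in right after the `Annotated` check.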
-
@wild-endeavor, @kumare3 for comment
Variants
Concerns
-
Hey @fg91, in the first example, we expect users to capture the output like this:
I do not understand the distinction between tuple[str] & typing.Tuple[str]; somebody with a better understanding of Python should weigh in, @wild-endeavor?
How is the type for these dicts declared?
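On the `tuple[str]` versus `typing.Tuple[str]` question, a small sketch that may help (plain Python 3.9+, nothing flytekit-specific; `describe` is a hypothetical helper). Both spellings describe a one-element tuple holding a `str`. They are different alias objects, but `typing.get_origin` and `typing.get_args` report the same origin and arguments for both.

```python
import types
import typing
from typing import Any, Tuple


def describe(alias: Any) -> Tuple[Any, Tuple[Any, ...]]:
    """Return the (origin, args) pair that typing introspection reports for an alias."""
    return typing.get_origin(alias), typing.get_args(alias)


builtin_alias = tuple[str]        # PEP 585 spelling, available since Python 3.9
typing_alias = typing.Tuple[str]  # older spelling from the typing module

# The built-in spelling produces a types.GenericAlias object.
print(type(builtin_alias) is types.GenericAlias)

# Both spellings share the same origin and the same type arguments.
print(describe(builtin_alias) == describe(typing_alias))
```

A tuple of arbitrary length would be spelled `tuple[str, ...]` or `typing.Tuple[str, ...]` instead.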
-
For visibility:
Because both of these topics are now being discussed in formal RFCs, I am closing this discussion here.
-
Thanks to the Flyte type system/flytekit type engine, users can seamlessly pass Python objects between tasks without handling serialization themselves, which most other workflow orchestration engines require.
This works so well that many users don't even realize how much logic is running beneath the surface; they often forget that they aren't writing Python.
Having run a Flyte deployment for ML teams at two companies, I have seen two "rough edges" of the type system surprise or frustrate users and raise questions more times than I can count:
There is no support for Python tuples in the type engine:
Gives
RestrictedTypeError: Transformer for type <class 'tuple'> is restricted currently
Due to protobuf limitations, ints in untyped dictionaries are turned into floats:
Logs
{'foo': 1.0}
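To illustrate where the float comes from: protobuf's `Struct` well-known type stores every number in a single `number_value` field of type double, so there is no integer kind to round-trip through. The sketch below simulates that in pure Python, assuming the untyped dictionary travels as a `Struct`-like value. `to_struct_like` is a hypothetical helper, not a flytekit or protobuf function.

```python
from typing import Any


def to_struct_like(value: Any) -> Any:
    """Mimic a round trip through a value model with a single numeric kind (double)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but has its own kind, so it is kept.
        return value
    if isinstance(value, (int, float)):
        return float(value)  # the only numeric representation available
    if isinstance(value, dict):
        return {str(k): to_struct_like(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_struct_like(v) for v in value]
    return value  # strings and None pass through unchanged


print(to_struct_like({"foo": 1}))  # {'foo': 1.0}
```

The same conversion also loses precision for integers above 2**53, since a double cannot represent all of them exactly.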
While this might be an unfair assessment given how much heavy lifting the type engine does for users, I have repeatedly heard frustration such as "Why the h*** is my batch size all of a sudden a float???" or "wait, flyte doesn't even support tuples?"
Users tend to show little understanding for the fact that "such a basic python thing isn't supported".
Here in Slack, we discussed this issue before; the underlying reasons are limitations in protobuf and the difficulty of distinguishing tuples from named tuples.
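For readers wondering why tuples and named tuples are hard to tell apart: a named tuple class is an ordinary subclass of `tuple`, so a plain `issubclass` check cannot separate the two. A minimal sketch of the usual heuristic, checking for the `_fields` attribute, follows; `Point`, `Legacy` and `is_namedtuple_type` are hypothetical names for illustration.

```python
import collections
from typing import Any, NamedTuple


class Point(NamedTuple):
    x: int
    y: int


Legacy = collections.namedtuple("Legacy", ["a", "b"])


def is_namedtuple_type(t: Any) -> bool:
    # Heuristic: named tuples are tuple subclasses that carry a `_fields` tuple of names.
    return (
        isinstance(t, type)
        and issubclass(t, tuple)
        and isinstance(getattr(t, "_fields", None), tuple)
    )


print(issubclass(Point, tuple))   # True: a named tuple is a tuple
print(is_namedtuple_type(Point))  # True
print(is_namedtuple_type(tuple))  # False
```

This is only a heuristic, since any tuple subclass could define `_fields` itself.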
Given how often I personally have been approached about these two points, I would like to discuss ways to work around the underlying technical limitations so that these issues can be hidden from users.