Polish "rough edges" of type engine #4544

fg91 · 2023-12-07T12:40:20Z

fg91
Dec 7, 2023
Collaborator

Thanks to the Flyte type system and the flytekit type engine, users can seamlessly pass Python objects between tasks without handling serialization themselves, as is required in most other workflow orchestration engines.
This works so well that many users don't even realize how much logic runs below the surface; they often forget that they aren't writing plain Python.

Having run Flyte deployments for ML teams at two companies, I have seen two "rough edges" of the type system surprise or frustrate users and raise questions more times than I can count:

There is no support for Python tuples in the type engine:

from flytekit import task, workflow


@task
def test(inp: tuple) -> tuple:
    return inp

@workflow
def wf():
    test(inp=(1, 2, 3))

if __name__ == "__main__":
    wf()

Gives RestrictedTypeError: Transformer for type <class 'tuple'> is restricted currently

Due to protobuf limitations, ints in untyped dictionaries are turned into floats:

from flytekit import task, workflow


@task
def foo(inp: dict):
    print(inp)


@workflow
def wf():
    foo(inp={"foo": 1})

Logs {'foo': 1.0}
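
The conversion can be reproduced without Flyte. An untyped dict travels as a protobuf Struct, whose only numeric kind is a double, so the int/float distinction is lost on the way in. The sketch below imitates that behaviour in plain Python; `to_struct_like` is a hypothetical helper for illustration, not a flytekit or protobuf function.

```python
from typing import Any


def to_struct_like(value: Any) -> Any:
    """Imitate storing a value in a protobuf Struct, which has a single
    numeric kind (a double). Hypothetical helper, for illustration only."""
    if isinstance(value, bool):
        return value  # bools have their own kind and survive unchanged
    if isinstance(value, (int, float)):
        return float(value)  # every number becomes a double
    if isinstance(value, dict):
        return {str(k): to_struct_like(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_struct_like(v) for v in value]
    return value  # strings and None pass through


hyperparams = {"lr": 0.01, "bs": 64}
received = to_struct_like(hyperparams)
print(received, type(received["bs"]).__name__)
```

The batch size goes in as an int and comes out as a float, which is the behaviour users trip over.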

While this might be an unfair assessment given how much heavy lifting the type engine does for users, I have repeatedly heard frustration such as "Why the h*** is my batch size all of a sudden a float???" or "wait, flyte doesn't even support tuples?"
Users tend to have little sympathy when "such a basic python thing isn't supported".

We discussed this issue before, here in Slack: the underlying reasons are limitations in protobuf and the difficulty of distinguishing tuples from named tuples.

Given how often I personally have been approached about these two points, I would like to discuss ways to work around the underlying technical limitations so that these issues can be hidden from users.

fg91 · 2024-01-18T18:00:20Z

fg91
Jan 18, 2024
Collaborator Author

Linking #1337 and #4358 as these issues certainly fall under the "rough edges" discussed here.

0 replies

fg91 · 2024-01-18T18:09:24Z

fg91
Jan 18, 2024
Collaborator Author

Currently, the right type transformer is determined as follows (see here):

class TypeEngine(...):

    @classmethod
    def get_transformer(cls, python_type: Type) -> TypeTransformer[T]:
        """
        The TypeEngine hierarchy for flyteKit. This method looksup and selects the type transformer. The algorithm is
        as follows

          d = dictionary of registered transformers, where is a python `type`
          v = lookup type
        Step 1:
            If the type is annotated with a TypeTransformer instance, use that.

        Step 2:
            find a transformer that matches v exactly

        Step 3:
            find a transformer that matches the generic type of v. e.g List[int], Dict[str, int] etc

        Step 4:
            Walk the inheritance hierarchy of v and find a transformer that matches the first base class.
            This is potentially non-deterministic - will depend on the registration pattern.

            Special case:
                If v inherits from Enum, use the Enum transformer even if Enum is not the first base class.

            TODO lets make this deterministic by using an ordered dict

        Step 5:
            if v is of type data class, use the dataclass transformer
        """

We could add another step between steps 1 and 2 that checks whether v is "untyped", i.e. a multivariate dict, list, or tuple (one holding values of mixed types). (Currently only univariate lists are supported, and multivariate dicts are transported as JSON, which causes the int -> float conversion issue described above.)
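
To make the proposal concrete, here is a minimal sketch of the lookup order with the extra "untyped" check placed in front of the exact match. Everything in it is an assumption for illustration: the registry, the transformer names, and the `PickleFallback` label are placeholders rather than flytekit classes, and steps 1 (annotated transformer) and 5 (dataclass) are left out.

```python
import typing
from typing import Any, Dict, List

# Toy registry mapping a Python type to the name of its transformer.
# The names are placeholders, not flytekit classes.
REGISTRY: Dict[type, str] = {
    int: "IntTransformer",
    str: "StrTransformer",
    list: "ListTransformer",
    dict: "DictTransformer",
}


def is_untyped_container(python_type: Any) -> bool:
    """Proposed extra step: a bare dict/list/tuple, or a generic without arguments."""
    if python_type in (dict, list, tuple):
        return True
    origin = typing.get_origin(python_type)
    return origin in (dict, list, tuple) and not typing.get_args(python_type)


def get_transformer(python_type: Any) -> str:
    # Proposed step: route untyped containers to a fallback (e.g. pickle).
    if is_untyped_container(python_type):
        return "PickleFallback"
    # Step 2: exact match.
    if python_type in REGISTRY:
        return REGISTRY[python_type]
    # Step 3: match on the generic origin, e.g. List[int] -> list.
    origin = typing.get_origin(python_type)
    if origin in REGISTRY:
        return REGISTRY[origin]
    # Step 4: walk the inheritance hierarchy.
    if isinstance(python_type, type):
        for base in python_type.__mro__[1:]:
            if base in REGISTRY:
                return REGISTRY[base]
    raise ValueError(f"no transformer for {python_type!r}")


print(get_transformer(dict), get_transformer(List[int]))
```

Because the check runs before the exact match, a bare `dict` never reaches the regular dict transformer, while `Dict[str, int]` still does.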

In one of the recent contributors' syncs, it was proposed to transport these objects "without a type" as pickle.

0 replies

EngHabu · 2024-01-22T21:54:17Z

EngHabu
Jan 22, 2024
Maintainer

@wild-endeavor, @kumare3 for comment

Variants

Using generic tuple

from flytekit import task, workflow


@task
def test(inp: tuple) -> tuple:
    return inp

@workflow
def my_wf() -> tuple:
    a = test(inp=(1, "hello"))
    return a

Using typed Tuple

import typing

from flytekit import task, workflow


@task
def test(inp: typing.Tuple[int, str]) -> typing.Tuple[int, str]:
    return inp

@workflow
def my_wf() -> typing.Tuple[int, str]:
    a = test(inp=(1, "hello"))
    return a

Concerns

1. Flyte already uses typing.Tuple to introspect/represent the promises returned by a task. This makes it hard to distinguish between the outputs of the following two tasks:
```
    @task
    def my_task() -> typing.Tuple[int, str]:
        ...

    @task
    def my_task() -> (int, str):
        ...
```
   The solution to this is: we should always return a new PromiseTuple and let usage determine how it's handled. If the user captures it in one variable and passes it on as one variable, we can infer that they are returning a single Tuple and treat it that way.
2. Transformation layer: we should add a new LiteralType (i.e. Tuple) and use a LiteralMap as the storage layer.
3. The untyped tuple is challenging; can we do without it for now?
4. Depending on #3, can a task returning Tuple[int, str] be received in another task that accepts tuple?

2 replies

fg91 Jan 23, 2024
Collaborator Author

This makes it hard to distinguish between the outputs of the following two tasks

@EngHabu do you mean differentiating between

return (1, "foo")

and

return 2, "bar"

?

fg91 Jan 24, 2024
Collaborator Author

Another question:
Does it matter to the type engine whether the user specifies tuple[str] or typing.Tuple[str]?

After talking to some of our users, I'd say that not allowing untyped tuples is ok if the error message gives clear instructions on what to change. It would be important to them, though, that variable-length tuples are supported, e.g. tuple[str, ...], as well as typed tuples with multiple types, e.g. tuple[str, int].
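
On the question of tuple[str] versus typing.Tuple[str]: as far as standard-library introspection goes, the two spellings expose the same information (the built-in form needs Python 3.9+). Whether the type engine treats them identically is a separate question that this small check doesn't answer.

```python
import typing

builtin = tuple[str, int]
legacy = typing.Tuple[str, int]

# Both spellings expose the same origin and arguments to introspection.
print(typing.get_origin(builtin), typing.get_args(builtin))
print(typing.get_origin(legacy), typing.get_args(legacy))

# A variable-length tuple is marked by a trailing Ellipsis.
variadic = typing.get_args(tuple[str, ...])
print(variadic)

# A bare, untyped tuple has no origin and no arguments.
print(typing.get_origin(tuple), typing.get_args(tuple))
```

A transformer that relies on `typing.get_origin` and `typing.get_args` could therefore handle both spellings, and could detect the untyped case by the missing arguments.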

When it comes to dicts, users can currently pass e.g. hyperparams = {"lr": 0.01, "bs": 64} and receive bs=64.0 in the task. So we already allow untyped dicts, and restricting this would be a breaking change. I feel it's important that we prevent this unintended conversion somehow, for instance by detecting in to_literal that the dict is untyped/multivariate and "transporting it in any other way ™️" that doesn't lead to this conversion.

EngHabu · 2024-01-29T18:27:02Z

EngHabu
Jan 29, 2024
Maintainer

Hey @fg91,

In the first example, we expect users to capture the output like this:
o = my_task()... where o will be the full typing.Tuple
In the second case, we expect users to capture the output like this:
a, b = my_task()
We can't really know that at the time of creating the promise to return. I'm just saying we don't have to know: if we always return a TuplePromise of Promises, we can infer the user's intention later (will they capture it in one variable or explode the tuple?) when we see which promises are being used (the enclosing TuplePromise vs. the exploded promises).
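
A rough sketch of that idea, under the assumption that unpacking can be observed through `__iter__`. The `Promise` and `TuplePromise` classes here are hypothetical stand-ins, not the flytekit classes, and a real implementation would track usage of the individual promises rather than a single flag.

```python
from typing import Iterator, List


class Promise:
    """Placeholder for a single task output (hypothetical, not flytekit's Promise)."""

    def __init__(self, name: str) -> None:
        self.name = name


class TuplePromise:
    """Wraps several promises and records how the caller consumed them."""

    def __init__(self, promises: List[Promise]) -> None:
        self._promises = promises
        self.exploded = False  # flips to True once the tuple is unpacked

    def __iter__(self) -> Iterator[Promise]:
        # Unpacking (`a, b = result`) goes through __iter__.
        self.exploded = True
        return iter(self._promises)


def my_task() -> TuplePromise:
    return TuplePromise([Promise("o0"), Promise("o1")])


whole = my_task()      # captured in one variable
unpacked = my_task()
a, b = unpacked        # exploded into individual promises

print(whole.exploded, unpacked.exploded)
```

The object returned is the same in both cases; only the later usage tells the two intentions apart.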

I do not understand the distinction between tuple[str] & typing.Tuple[str]; somebody with a better understanding of Python should weigh in, @wild-endeavor?

How is the type for these dicts declared? def my_task(a: dict): ... ?

--
RE: ints in dicts...
Short of creating our own protos and parsers (which is doable) that prioritize encoding ints, there doesn't seem to be a great way (protocolbuffers/protobuf#10255). I think we should allow users to force pickling (which would work for Python-only tasks) for such cases... do you know if that's supported in flytekit?
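
For reference, a plain standard-library round trip shows why pickling sidesteps the conversion: pickle records the concrete Python type of each value. This only illustrates the serialization format, not how flytekit would wire it in.

```python
import pickle

hyperparams = {"lr": 0.01, "bs": 64}

# Serialize to bytes and back; the int stays an int.
restored = pickle.loads(pickle.dumps(hyperparams))
print(restored, type(restored["bs"]).__name__)
```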

1 reply

fg91 Jan 30, 2024
Collaborator Author

Hey @EngHabu

I don't think I understand well enough yet how TuplePromise would work.

That being said, would it simplify things if we don't touch how the "main" tuple returned from a task with multiple return values is treated, but introduce "special handling" for additional tuples contained in the "main" tuple? Basically, we would not handle the main tuple with a potential new TupleTransformer but keep the existing logic for it.

def foo() -> tuple[str, tuple[int, ...], str]:
    return "foo", (1, 2, 3), "bar"

This "inner" tuple wouldn't have to be exploded.
We could enforce that these inner tuples are typed so that one cannot pass a tuple[str] to a task expecting tuple[int].

With a meaningful error message that Flyte is type safe and doesn't accept untyped tuples, I don't think users would be unhappy :)
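
As a sketch of what such enforcement and error message could look like: `check_tuple` below is a hypothetical helper, not part of flytekit, and it only handles flat element types such as int or str.

```python
import typing
from typing import Any


def check_tuple(value: Any, annotation: Any) -> None:
    """Raise TypeError unless `value` matches a typed tuple annotation."""
    args = typing.get_args(annotation)
    if typing.get_origin(annotation) is not tuple or not args:
        raise TypeError(
            "Untyped tuples are not supported; annotate the element types, "
            "e.g. tuple[int, ...] or tuple[str, int]."
        )
    if not isinstance(value, tuple):
        raise TypeError(f"expected a tuple, got {type(value).__name__}")
    if len(args) == 2 and args[1] is Ellipsis:
        expected = [args[0]] * len(value)  # variable length, one element type
    elif len(args) != len(value):
        raise TypeError(f"expected {len(args)} elements, got {len(value)}")
    else:
        expected = list(args)
    for index, (item, item_type) in enumerate(zip(value, expected)):
        if not isinstance(item, item_type):
            raise TypeError(
                f"element {index} should be {item_type.__name__}, "
                f"got {type(item).__name__}"
            )


check_tuple((1, 2, 3), tuple[int, ...])  # passes silently
check_tuple(("a", 1), tuple[str, int])   # passes silently
```

Passing a tuple of strings where tuple[int, ...] is expected, or annotating with a bare tuple, raises a TypeError that tells the user what to change.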

RE: ints in dicts...

One can annotate a type with a specific type transformer to ensure this one is used, see here. This would allow users to force pickling dicts.

I don't think this solves the problem though. For users who understand what is happening, the "int in dict" issue is not an issue, they will just use e.g. a dataclass instead ...

I bring this up because I've seen many users who are just starting with Flyte and aren't aware of the type engine run into this and be utterly confused and also rather frustrated.

If there isn't a good solution to this problem (or if it would be a lot of effort), maybe we can instead just catch this potential issue and warn users to use typed dicts.
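
Such a check could look roughly like the sketch below. Both functions are hypothetical and only cover a bare dict annotation; where exactly this would hook into the type engine is left open.

```python
import warnings
from typing import Any


def contains_int(value: Any) -> bool:
    """True if an int (but not a bool) appears anywhere in a nested dict/list."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, dict):
        return any(contains_int(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_int(v) for v in value)
    return False


def warn_if_lossy(python_type: Any, value: Any) -> None:
    # Only bare `dict` annotations are affected in this sketch.
    if python_type is dict and contains_int(value):
        warnings.warn(
            "Untyped dict contains ints, which will arrive as floats. "
            "Use a typed dict (e.g. Dict[str, int]) or a dataclass instead.",
            UserWarning,
            stacklevel=2,
        )


warn_if_lossy(dict, {"lr": 0.01, "bs": 64})  # emits a UserWarning
```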

fg91 · 2024-08-29T16:56:51Z

fg91
Aug 29, 2024
Collaborator Author

For visibility:

There is an RFC about tuple support now: [RFC] Tuple IDL #5699
There is also an RFC that proposes using MessagePack as the transport for dicts (and dataclasses, ...), which would solve the "int -> float" conversion issue.

Because both of these topics are now being discussed in formal RFCs, I'm closing this discussion.

0 replies
