Strange behaviour with streaming events from nested workflow #16177
-
I'm having trouble getting a nested workflow working properly with event streaming. The workflow runs fine, but the event streaming is behaving strangely. The workflow I'm testing is a variation on the Joke example from the documentation: a `MainWorkflow` runs a nested `TopicWorkflow` or `JokeWorkflow` depending on whether a topic is present in the context, and forwards the nested workflow's streamed events to its own stream.

The first time I run the workflow without a topic, the events stream through fine. The second time, with a topic set in the context, I get no events (other than the last event from the previous run repeated). If I run the workflow a third time with no changes, the events come through, but if I run it a fourth time with no changes, the issue reappears. I've logged the context object and made sure it's the exact same object for each run. I also checked that I'm awaiting the result properly each time. Am I doing something wrong, or is there an async bug somewhere here?

My workflow code:

```python
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Context,
    Event,
)


class MessageEvent(Event):
    message: str


class TopicWorkflow(Workflow):
    @step
    async def start(self, ctx: Context, event: StartEvent) -> StopEvent:
        print("TopicWorkflow: Asking for topic")
        ctx.write_event_to_stream(MessageEvent(message="MESSAGE FROM TOPIC WORKFLOW"))
        return StopEvent(result="Give me a topic and I'll tell you a joke!")


class JokeWorkflow(Workflow):
    @step
    async def start(self, ctx: Context, event: StartEvent) -> StopEvent:
        print(f"JokeWorkflow: Telling joke about {event.topic}")
        ctx.write_event_to_stream(MessageEvent(message="MESSAGE FROM JOKE WORKFLOW"))
        return StopEvent(result=f"Here is a funny joke about {event.topic}")


class MainWorkflow(Workflow):
    @step
    async def start(
        self,
        context: Context,
        event: StartEvent,
        topic_workflow: Workflow,
        joke_workflow: Workflow,
    ) -> StopEvent:
        topic = await context.get("topic", "")
        print(context)
        if topic == "":
            # Context has no topic; we need to ask for it
            print("MainWorkflow: Asking for topic")
            handler = topic_workflow.run()
            async for nested_event in handler.stream_events():
                context.write_event_to_stream(nested_event)
            res = await handler
        else:
            # We have a topic, tell a joke about it
            print(f"MainWorkflow: Already have topic {topic}")
            handler = joke_workflow.run(topic=topic)
            async for nested_event in handler.stream_events():
                context.write_event_to_stream(nested_event)
            res = await handler
        return StopEvent(result=res)
```

Main:

```python
import asyncio
from project_context_bootstrapping.workflows.test import (
    MainWorkflow,
    TopicWorkflow,
    JokeWorkflow,
)
from llama_index.core.workflow import Context


async def main():
    w = MainWorkflow(timeout=60, verbose=True)
    w.add_workflows(
        topic_workflow=TopicWorkflow(),
        joke_workflow=JokeWorkflow(),
    )
    context = Context(workflow=w)

    # First workflow run with no topic
    print("Running workflow with no topic")
    handler = w.run(ctx=context)
    async for event in handler.stream_events():
        print(f">>>>> EVENT (1): {event}")
    result = await handler
    print("FIRST RESULT: ", result)
    print("")

    # Set the joke topic
    print("Running workflow second time")
    await context.set("topic", "pirates")

    # Second workflow run with a topic in the context
    # Issue: I get no events this time (except for the last event again)
    handler = w.run(ctx=context)
    async for event in handler.stream_events():
        print(f">>>>> EVENT (2): {event}")
    result = await handler
    print("SECOND RESULT: ", result)

    # Third workflow run, no changes
    # This time, events come through fine
    print("")
    handler = w.run(ctx=context)
    async for event in handler.stream_events():
        print(f">>>>> EVENT (3): {event}")
    result = await handler
    print("THIRD RESULT: ", result)

    # Fourth workflow run, no changes
    # Previous issue reappears :(
    print("")
    handler = w.run(ctx=context)
    async for event in handler.stream_events():
        print(f">>>>> EVENT (4): {event}")
    result = await handler
    print("FOURTH RESULT: ", result)


if __name__ == "__main__":
    asyncio.run(main())
```

Output:
-
Hey there, @rohanliston! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

The inconsistent behavior in your nested workflow's event streaming, where it sometimes repeats the last event from the previous run and then stops streaming further events, is likely due to the context (`Context`) not being fully cleaned up between runs.

Here's the relevant part of the code that handles context cleanup:

```python
def _start(self, stepwise: bool = False, ctx: Optional[Context] = None) -> Context:
    """Sets up the queues and tasks for each declared step.

    This method also launches each step as an async task.
    """
    if ctx is None:
        ctx = Context(self, stepwise=stepwise)
        self._contexts.add(ctx)
    else:
        # clean up the context from the previous run
        ctx._tasks = set()
        ctx._queues = {}
        ctx._step_flags = {}
        ctx._retval = None
```

Ensure that the context is fully reset before starting a new run. Additionally, make sure that the `stream_events` method is consuming events from the streaming queue and removing the context afterwards as expected:

```python
async def stream_events(self) -> AsyncGenerator[Event, None]:
    """Returns an async generator to consume any event that workflow steps decide to stream."""
    await asyncio.sleep(0)

    if len(self._contexts) > 1:
        msg = (
            "This workflow has multiple concurrent runs in progress and cannot stream events. "
            "To be able to stream events, make sure you call `run()` on this workflow only once."
        )
        raise WorkflowRuntimeError(msg)

    ctx = next(iter(self._contexts))
    while True:
        ev = await ctx.streaming_queue.get()
        if type(ev) is StopEvent:
            break
        yield ev

    self._contexts.remove(ctx)
```

To avoid the issue, ensure that the context is fully reset before each run and that only one run is in progress while you stream events.
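One possible workaround, a minimal sketch rather than a confirmed fix: instead of passing the same `Context` object into every run, snapshot its state and rebuild a fresh context per run. This assumes your llama-index version exposes `Context.to_dict()` and `Context.from_dict()` for context serialization; check the workflow docs for your version if they are not available.

```python
# Workaround sketch (assumes Context.to_dict() / Context.from_dict()
# exist in your llama-index version): carry stored state like "topic"
# across runs without handing the same Context object to run() twice.
ctx_snapshot = context.to_dict()                    # serialize stored state
fresh_context = Context.from_dict(w, ctx_snapshot)  # brand-new Context, same state
handler = w.run(ctx=fresh_context)
async for event in handler.stream_events():
    print(event)
result = await handler
```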
If the problem persists, consider adding more detailed logging to trace the state of the context and the events being processed. Additionally, you can refer to the existing tests covering sequential and resumed streams:

```python
@pytest.mark.asyncio()
async def test_multiple_sequential_streams():
    wf = StreamingWorkflow()
    r = wf.run()

    # stream 1
    async for _ in r.stream_events():
        pass
    await r

    # stream 2 -- should not raise an error
    r = wf.run()
    async for _ in r.stream_events():
        pass
    await r


@pytest.mark.asyncio()
async def test_resume_streams():
    class CounterWorkflow(Workflow):
        @step
        async def count(self, ctx: Context, ev: StartEvent) -> StopEvent:
            ctx.write_event_to_stream(Event(msg="hello!"))
            cur_count = await ctx.get("cur_count", default=0)
            await ctx.set("cur_count", cur_count + 1)
            return StopEvent(result="done")

    wf = CounterWorkflow()
    handler_1 = wf.run()
    async for _ in handler_1.stream_events():
        pass
    await handler_1

    handler_2 = wf.run(ctx=handler_1.ctx)
    async for _ in handler_2.stream_events():
        pass
    await handler_2
    assert await handler_2.ctx.get("cur_count") == 2
```

These tests can help you ensure that your workflow handles event streaming correctly across multiple runs [1][2][3].
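Note the pattern in `test_resume_streams`: the second run resumes from `handler_1.ctx`, the context attached to the previous run's handler, rather than from a `Context` constructed up front. Applied to the script from the question, a minimal sketch of that pattern (an assumption to try, not a confirmed fix) would be:

```python
# Sketch of the resume pattern from test_resume_streams above,
# applied to the question's MainWorkflow; not verified against this bug.
handler = w.run()  # first run: let the workflow create a fresh Context
async for event in handler.stream_events():
    print(f">>>>> EVENT (1): {event}")
await handler

await handler.ctx.set("topic", "pirates")  # state lives on the handler's ctx

handler = w.run(ctx=handler.ctx)  # later runs resume that context
async for event in handler.stream_events():
    print(f">>>>> EVENT (2): {event}")
await handler
```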
-
@dosu I've discussed this with a maintainer on Discord and it looks like there may be a bug, so I'll close this discussion.
Hey @rohanliston! It sounds like you've got it covered with the maintainer. I'm a bot that helps with bugs, technical questions, and engineering knowledge, so this one's a bit out of my scope. If you think I should have answered this request, feel free to reach out to the Dosu Team.
To continue talking to Dosu, mention @dosu.