Can we stream responses? #149
Comments
@mneedham I need to add streaming to the model client. Let me try to add it.
@mneedham It's updated, if you update pip to
Awesome - it works :D Thanks!
I am testing it out with my usual ridiculous prompt!
model_client = OllamaClient(host="http://localhost:11434")
model_kwargs = {"model": "llama3.1", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
output = generator({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})
for chunk in output.data:
    print(chunk, end='', flush=True)
What an interesting scenario! If a lion and an elephant met three dogs and four hyenas, I think it's likely that the outcome would be quite dramatic.
Firstly, the lion would probably take charge of the situation, being the apex predator in the savannah. The
However, the presence of the three dogs could potentially cause a commotion. They might bark excitedly at the sight of the big cats, which could distract the lion and give the elephant an opportunity to intervene.
The four hyenas, on the other hand, would likely be more interested in scavenging for food than engaging in
But if all else fails, I imagine the lion would assert its dominance by chasing after one of the smaller animals (perhaps the dogs?) to show who's boss. The elephant, being a gentle giant, might try to calm everyone down by using its size and presence to intimidate the hyenas into backing off.
Of course, this is all just hypothetical – in reality, each animal would behave according to their natural instincts and survival strategies! What do you think?
It's kinda neat that this code also works if I set stream to False because then
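A small sketch of how one loop can serve both modes, assuming (this is not confirmed in the thread) that `output.data` is a plain `str` when `stream` is `False` and an iterator of string chunks when it is `True`. Iterating over a `str` directly yields one character at a time, so the same `for chunk in ...` loop would print the same text either way. The helper `iter_text` and the stand-in `fake_stream` are hypothetical names for illustration, not part of the library.

```python
from typing import Iterable, Iterator, Optional, Union


def iter_text(data: Union[str, Iterable[Optional[str]], None]) -> Iterator[str]:
    """Yield text pieces from either a full string or a stream of chunks.

    Hypothetical helper: assumes `data` is a str (non-streaming),
    an iterable of str chunks (streaming), or None (no output).
    """
    if data is None:
        return
    if isinstance(data, str):
        # Non-streaming case: hand back the whole response as one piece.
        yield data
        return
    for chunk in data:
        # Skip missing or empty chunks so they are never printed.
        if chunk:
            yield chunk


def fake_stream() -> Iterator[str]:
    """Stand-in for a streamed model response (no model server needed)."""
    yield from ["Hello", ", ", "world"]


# The same consuming loop works for both kinds of input.
for piece in iter_text(fake_stream()):
    print(piece, end="", flush=True)
print()
```

With a helper like this, the calling code does not need to know whether streaming was enabled.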
It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior.) 😆
Do we need some sort of await in
def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    for chunk in completion:
        log.debug(f"Raw chunk: {chunk}")
        yield chunk["response"] if "response" in chunk else None
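On the `await` question: a plain `for` loop only works when `completion` is a synchronous generator. The sketch below assumes that an async client would return an async iterator of dict chunks shaped like the ones above, and shows what an async counterpart might look like. `parse_stream_response_async`, `fake_async_completion`, and `collect` are hypothetical names for illustration, not library functions. This variant also skips chunks that lack a `"response"` key instead of yielding `None`, since a `None` chunk would be printed as the literal text `None` by a `print(chunk, end='')` loop.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def parse_stream_response_async(
    completion: AsyncIterator[Dict[str, Any]],
) -> AsyncIterator[str]:
    """Hypothetical async variant: yield the text of each streamed chunk."""
    async for chunk in completion:
        # Skip chunks without a "response" key rather than yielding None.
        if "response" in chunk:
            yield chunk["response"]


async def fake_async_completion() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in for an async streaming client (assumed chunk shape)."""
    for chunk in ({"response": "Hi"}, {"done": True}, {"response": " there"}):
        await asyncio.sleep(0)  # give control back to the event loop
        yield chunk


async def collect() -> List[str]:
    """Gather all text pieces from the parsed stream into a list."""
    return [
        text
        async for text in parse_stream_response_async(fake_async_completion())
    ]


print(asyncio.run(collect()))
```

The synchronous version needs no `await` as long as the underlying client call itself returns a regular generator.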
I am using 0.1.0.b6!
Good, then there is no bug.
@liyin2015 does this function also need to check for
At the moment I get this error when using
@liyin2015 I tried a fix here, but I only did it for Ollama Client so far
Describe the bug
Not sure if this is a bug or if it's not supposed to work this way, but I can't figure out how to stream the response from the LLM.
To Reproduce
Returns:
Expected behavior
I want to be able to iterate over the response and render it as it's produced.
Desktop (please complete the following information):
Mac OS Sonoma 14.5