Update on "update llama runner to decode single token"
Right now, the eager runner does not print the generated response until all tokens have been generated. This is not a good experience, since the user has to wait for generation to finish before seeing any of the response.

This PR updates the runner to decode each new token immediately after it is generated.
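For illustration, here is a minimal sketch of the streaming-decode pattern this change moves toward, written in plain Python. It is not the ExecuTorch runner code; `generate_stream`, `step`, `decode`, and the default `eos_id` are hypothetical stand-ins for the runner's model forward pass, sampling, and tokenizer.

```python
from typing import Callable, List

def generate_stream(
    step: Callable[[List[int]], int],    # hypothetical: forward pass + sampling, returns next token id
    decode: Callable[[List[int]], str],  # hypothetical: tokenizer decode for a list of token ids
    prompt_tokens: List[int],
    max_new_tokens: int = 128,
    eos_id: int = 2,                     # assumed EOS token id for this sketch
) -> List[int]:
    """Generate up to max_new_tokens tokens, printing each one as soon as it is sampled."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = step(tokens)
        tokens.append(next_token)
        # Decode just the new token and flush it to stdout immediately,
        # instead of decoding the whole sequence after generation finishes.
        print(decode([next_token]), end="", flush=True)
        if next_token == eos_id:
            break
    print()
    return tokens
```

The key difference from the previous behavior is that `decode` is called once per newly sampled token rather than once on the full sequence at the end, so output appears as it is produced.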

Differential Revision: [D65578306](https://our.internmc.facebook.com/intern/diff/D65578306/)

[ghstack-poisoned]
helunwencser committed Nov 11, 2024
2 parents 967eb29 + df7be71 commit 0e7432e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion examples/models/llama/runner/generation.py
@@ -125,7 +125,7 @@ def text_completion(
echo (bool, optional): Flag indicating whether to include prompt tokens in the generated output. Defaults to False.
Returns:
CompletionPrediction: Completion prediction, which contains the generated text completion.
Generated list of tokens.
Note:
This method generates text completion for the provided prompt, employing nucleus sampling to introduce controlled randomness.
