Add exclude_input_in_output option to vllm backend #35
Conversation
Makes sense. Thank you so much @oandreeva-nv! Looking forward to the release.
@mkhludnev This will most likely be part of the 24.03 release, but you don't need to wait that long.
Nice work!
A little bit of context: triton-inference-server/server#6867
This PR adds an exclude_input_in_output flag to the vllm backend inputs. It only affects the non-streaming case.
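For reference, a minimal client-side sketch of setting the new flag on a non-streaming request (not part of this PR; the tensor names text_input, exclude_input_in_output, text_output and the model name "vllm_model" are assumptions based on the backend's example model configuration, so adjust them to your deployment):

```python
# Hedged sketch: send a non-streaming request with the new flag enabled.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Prompt tensor (BYTES) holding a single prompt string.
text_input = httpclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(
    np.array(["The most dangerous animal is"], dtype=object)
)

# Optional BOOL input introduced by this PR: strip the prompt from the response.
exclude_input = httpclient.InferInput("exclude_input_in_output", [1], "BOOL")
exclude_input.set_data_from_numpy(np.array([True], dtype=bool))

result = client.infer(
    model_name="vllm_model",
    inputs=[text_input, exclude_input],
    outputs=[httpclient.InferRequestedOutput("text_output")],
)

# With exclude_input_in_output=True, text_output should contain only the
# generated continuation, not the echoed prompt.
print(result.as_numpy("text_output"))
```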
For the streaming case, I refactored the code to return only diffs, e.g.:

Prompt = "The most dangerous animal is"

Response:

The above case will be the only possible response in streaming mode. Open to discussions.
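To illustrate the diff-only streaming behavior, here is a hedged sketch of how a client might reassemble the full text from the streamed responses (the chunk values below are hypothetical, not actual model output):

```python
# Each streamed response carries only the newly generated text, so a client
# simply concatenates the chunks to rebuild the full generation.
def assemble_stream(chunks):
    """Concatenate per-response diffs into the full generated text."""
    full_text = ""
    for chunk in chunks:
        full_text += chunk  # each chunk is only the new piece, not prompt + text so far
    return full_text

# Hypothetical diffs streamed for the prompt "The most dangerous animal is":
print(assemble_stream([" the", " mosquito", "."]))  # -> " the mosquito."
```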
cc @mkhludnev