Add exclude_input_in_output option to vllm backend #35

Merged
merged 5 commits into main from oandreeva_echo_option on Mar 1, 2024

Conversation

@oandreeva-nv (Contributor) commented on Feb 27, 2024

This PR adds an exclude_input_in_output flag to the vllm backend inputs.
It only affects the non-streaming case.
For the streaming case, I refactored the code so that each response returns only the diff, e.g.:
Prompt = "The most dangerous animal is"
Response:

" the",
            " one",
            " that",
            " is",
            " most",
            " likely",
            " to",
            " be",
            " killed",
            " by",
            " a",
            " car",
            ".",

In streaming mode, this diff-only form will be the only possible response shape. Open to discussion.
cc @mkhludnev
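
For reference, here is a minimal client-side sketch of using the new flag. It is not part of this PR's diff; the model name "vllm_model", the endpoint "localhost:8001", and the exact tensor shapes are assumptions, and it assumes the flag is exposed as a BOOL input tensor named exclude_input_in_output next to the existing text_input and stream inputs:

    # Hypothetical client sketch: model name, endpoint, and shapes are
    # illustrative assumptions, not taken from this PR.
    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Prompt to complete.
    text_input = grpcclient.InferInput("text_input", [1], "BYTES")
    text_input.set_data_from_numpy(
        np.array(["The most dangerous animal is".encode("utf-8")], dtype=np.object_))

    # Non-streaming request.
    stream = grpcclient.InferInput("stream", [1], "BOOL")
    stream.set_data_from_numpy(np.array([False], dtype=bool))

    # The flag added by this PR: drop the echoed prompt from the output.
    exclude = grpcclient.InferInput("exclude_input_in_output", [1], "BOOL")
    exclude.set_data_from_numpy(np.array([True], dtype=bool))

    result = client.infer(model_name="vllm_model",
                          inputs=[text_input, stream, exclude])

    # With the flag set, text_output holds only the generated completion
    # (e.g. " the one that is most likely ..."), without the prompt prefix.
    print(result.as_numpy("text_output"))

In streaming mode (stream set to True with a streaming-capable client), each response now carries only the newly generated tokens, matching the diff example above.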

@mkhludnev (Contributor) commented:

Makes sense. Thank you so much @oandreeva-nv! Looking forward to the release.

@oandreeva-nv (Contributor, Author) commented:

@mkhludnev This will most likely be part of the 24.03 release, but you don't need to wait that long.
For this PR, you can simply replace /opt/tritonserver/backends/vllm/model.py with the updated version in the 24.01 or 24.02 (soon to be released) container, and it should work, since no changes in tritonserver itself are needed to handle this flag.

src/model.py: 7 review comments (outdated, resolved)
@oandreeva-nv requested a review from @nnshah1 on February 29, 2024
@nnshah1 (Contributor) left a review comment:


Nice work!

@oandreeva-nv merged commit 5c03411 into main on Mar 1, 2024
3 checks passed
@oandreeva-nv deleted the oandreeva_echo_option branch on March 1, 2024
@mkhludnev (Contributor) commented:

A little bit of context: triton-inference-server/server#6867
Thanks so much, @oandreeva-nv!
