perf: Check for cancellation on response thread #54

Merged on Aug 7, 2024 (7 commits)

@kthui (Contributor) commented Jul 30, 2024

What does the PR do?

The cancellation check is now performed in the response loop on the response thread. The cancellation status is written back to an object shared between the response loop and the generate loop for a given request. After a response is sent, the cancellation status is checked, and the generate loop cancels generation if the request has been cancelled. Combined with releasing the GIL while checking for cancellation, this maximizes the time vLLM generation holds the GIL, which has yielded a noticeable performance improvement.
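
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeResponseSender`, `FINAL_FLAG`, the `state` dictionary layout, and all function names are hypothetical stand-ins, and the "client" cancels after a fixed number of responses.

```python
import queue
import threading
import time

FINAL_FLAG = 1  # hypothetical stand-in for a "final response" flag


class FakeResponseSender:
    """Hypothetical stand-in for a backend response sender."""

    def __init__(self, cancel_after):
        self.sent = []
        self._cancel_after = cancel_after

    def send(self, response, flags=0):
        self.sent.append((response, flags))

    def is_cancelled(self):
        # Pretend the client cancels once `cancel_after` responses went out.
        return len(self.sent) >= self._cancel_after


def response_loop(responses):
    """Runs on the response thread: send, then record the cancellation status."""
    while True:
        item = responses.get()
        if item is None:  # shutdown signal
            break
        state, response, flags = item
        sender = state["sender"]
        sender.send(response, flags)
        if flags != FINAL_FLAG:
            # The cancellation check happens here, off the generate loop.
            state["is_cancelled"] = sender.is_cancelled()


def generate(responses, state, total_tokens):
    """Generate loop: only reads the shared flag, never polls the sender."""
    produced = 0
    for i in range(total_tokens):
        if state["is_cancelled"]:
            break
        time.sleep(0.005)  # simulated generation work
        produced += 1
        responses.put((state, f"token-{i}", 0))
    # Always close the stream with a final flag, cancelled or not.
    responses.put((state, None, FINAL_FLAG))
    return produced


def run_demo(total_tokens=200, cancel_after=3):
    responses = queue.Queue()
    sender = FakeResponseSender(cancel_after)
    # Object shared between the response loop and the generate loop.
    state = {"sender": sender, "is_cancelled": False}
    worker = threading.Thread(target=response_loop, args=(responses,))
    worker.start()
    produced = generate(responses, state, total_tokens)
    responses.put(None)
    worker.join()
    return produced, state["is_cancelled"], sender.sent
```

In this sketch the generate loop only reads a boolean from the shared object, so the cost of asking the sender about cancellation is paid on the response thread. Because the two loops run concurrently, the generate loop may produce a few extra tokens before it observes the flag.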

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

triton-inference-server/python_backend#372

Where should the reviewer start?

Start with the case where a cancellation is issued while responses are being actively generated: does the generate loop capture the cancellation signal? Is the final flag sent (in both streaming and non-streaming modes) after cancelling? Is the response sender deleted and garbage collected after cancelling?
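
One way a reviewer might probe the last question is with a weak reference, which observes an object without keeping it alive. The sketch below is only an illustration under assumptions: `FakeResponseSender` and the `state` dictionary are hypothetical and are not taken from `src/model.py`.

```python
import gc
import weakref


class FakeResponseSender:
    """Hypothetical stand-in; not the backend's real sender class."""


def sender_is_collected_after_release():
    state = {"sender": FakeResponseSender(), "is_cancelled": True}
    probe = weakref.ref(state["sender"])  # does not keep the sender alive
    assert probe() is not None
    # Drop the only strong reference, as a loop might after the final flag.
    del state["sender"]
    gc.collect()  # also clears reference cycles, if any
    return probe() is None
```

If any other object still held a strong reference to the sender (for example, an item left in a queue), `probe()` would keep returning the sender and the function would return `False`.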

Test plan:

This is a performance improvement, so any issue should be covered by existing test cases.

  • CI Pipeline ID: 17061062

Caveats:

N/A

Background

The performance of the vLLM backend needs to improve.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

@kthui added the "PR: perf" label (a code change that improves performance) Jul 31, 2024
@kthui kthui marked this pull request as ready for review July 31, 2024 17:07
src/model.py: review thread (outdated, resolved)
@kthui kthui requested a review from oandreeva-nv July 31, 2024 23:56
oandreeva-nv previously approved these changes Aug 1, 2024

@oandreeva-nv (Collaborator) left a comment

LGTM! Please let @Tabrizian look at it as well.

Tabrizian previously approved these changes Aug 6, 2024

@Tabrizian (Member) left a comment

One minor comment, otherwise looks good.

src/model.py: review thread (outdated, resolved)
@kthui kthui dismissed stale reviews from Tabrizian and oandreeva-nv via 6b7e241 August 6, 2024 23:26
Co-authored-by: Iman Tabrizian <[email protected]>
@kthui kthui merged commit 843cbdd into main Aug 7, 2024
3 checks passed
@kthui kthui deleted the jacky-cancel-thread branch August 7, 2024 17:57
3 participants