
Feat: decode request in threadpool #290

Conversation

@deependujha commented Sep 23, 2024

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

⚠️ How does this PR impact the user? ⚠️

Faster request decoding when the request contains binary payloads (images, audio, etc.).

GOOD:

Faster decoding in the batched loop (see the sketch below).

BAD:

May add overhead for simple requests.
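A minimal sketch of the idea (hypothetical helper name, not the exact PR diff): instead of decoding requests one by one in the batched loop, map decode_request over the batch with a thread pool.

```python
import concurrent.futures

def decode_batch(lit_api, requests, max_workers=None):
    # Hypothetical helper: decode a batch of raw requests concurrently.
    # Helps most when decode_request is I/O-bound or spends its time in
    # C extensions (e.g. base64-decoding large audio/image payloads).
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, matching the sequential loop's output
        return list(pool.map(lit_api.decode_request, requests))
```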


What does this PR do?

Fixes #166

Speed up reference here

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@deependujha (Author)
Is this what was intended, or have I got something wrong?

@aniketmaurya (Collaborator) left a comment

Thank you for opening the PR @deependujha. Looks great so far 🚀

We should give users control over whether decoding runs in a threadpool. That could be useful when decode_request is CPU-intensive or not thread-safe.

Possible API:

class LitServer:
    def __init__(..., concurrent_decode: bool = True)

cc: @lantiga
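A usage sketch of the proposed flag (illustrative only: concurrent_decode was never merged, and AudioAPI is a made-up example API):

```python
import litserve as ls

class AudioAPI(ls.LitAPI):
    def setup(self, device):
        self.model = lambda batch: [len(x) for x in batch]  # stand-in model

    def decode_request(self, request):
        # imagine CPU- or I/O-heavy work here, e.g. decoding audio bytes
        return request["audio"]

    def predict(self, batch):
        return self.model(batch)

    def encode_response(self, output):
        return {"length": output}

# concurrent_decode is the flag proposed above (hypothetical); users would
# disable it when decode_request is not thread-safe or too cheap to benefit.
server = ls.LitServer(AudioAPI(), max_batch_size=8, concurrent_decode=True)
server.run(port=8000)
```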

src/litserve/loops.py: three outdated review threads (resolved)
codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95%. Comparing base (757525e) to head (be2c204).
Report is 24 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #290   +/-   ##
===================================
- Coverage    95%    95%   -0%     
===================================
  Files        19     19           
  Lines      1244   1260   +16     
===================================
+ Hits       1182   1196   +14     
- Misses       62     64    +2     

@williamFalcon (Contributor)

@deependujha nice! love the "faster" claim haha. do you have a benchmark or something showing that it is indeed faster and by how much?

@deependujha (Author) commented Sep 24, 2024

> @deependujha nice! love the "faster" claim haha. do you have a benchmark or something showing that it is indeed faster and by how much?

Hi @williamFalcon, I stated that based on what was claimed in the original issue. Testing on a 4-GPU machine does not show the same result.

Throughput stays in a similar range (300-380), averaging around 330 (using Aniket's benchmark code).

I'll try testing it with an audio model. If performance doesn't differ much, I'll close the PR.

):
    if concurrent_decode:
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count())
A review comment from a Collaborator on this hunk:

We can limit the number of threads to the batch size:

Suggested change:

- executor = concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count())
+ executor = concurrent.futures.ThreadPoolExecutor(max_workers=min(max_batch_size, os.cpu_count()))
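A note on the design choice: capping the pool at min(max_batch_size, os.cpu_count()) is a sensible bound, since a batch of N requests can keep at most N decode threads busy, and threads beyond the CPU count add scheduling overhead without extra parallelism.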

@aniketmaurya added the enhancement (New feature or request) label on Sep 26, 2024.
@deependujha (Author)

Tough luck: I tested with the audio model, but there was no significant difference in performance. I'll go ahead and close the PR.

@deependujha closed this on Oct 4, 2024.
@deependujha deleted the feat/decode-request-in-threadpool branch on Oct 4, 2024 at 17:08.
@aniketmaurya (Collaborator)

@deependujha what does your decode_request logic look like? Does it do a CPU-intensive operation or I/O?
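For context on why this question matters (an illustration added here, not from the thread): under CPython's GIL, a thread pool speeds up decode_request only when the work releases the GIL, as blocking I/O and many C extensions do; pure-Python CPU work serializes.

```python
import concurrent.futures
import time

def io_bound_decode(req):
    time.sleep(0.01)  # stands in for disk/network reads; sleep releases the GIL
    return req

def cpu_bound_decode(req):
    return sum(i * i for i in range(100_000))  # pure-Python work; holds the GIL

with concurrent.futures.ThreadPoolExecutor() as pool:
    # The sleeps overlap, so the I/O-bound map gets a near-linear speedup.
    list(pool.map(io_bound_decode, range(8)))
    # These calls serialize on the GIL, so there is little to no speedup.
    list(pool.map(cpu_bound_decode, range(8)))
```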

Labels: enhancement (New feature or request)
Linked issue: Map decode_request during dynamic batching using a threadpool (#166)
3 participants