Some questions about the parameter --chunked-prefill-size #2815
Unanswered
yuki252111 asked this question in Q&A
Replies: 0 comments
The code in question is part of scheduler.py. As far as I can tell, self.rem_chunk_tokens plays the same role as self.rem_input_tokens: both limit the total number of prefill tokens, and both are modified in the same function.
Of course, self.rem_chunk_tokens is also used to decide whether a request's prompt needs to be truncated.
What confuses me is this: I would expect self.rem_chunk_tokens to be used only for splitting a prompt, either that of the last request or that of each request. Instead, every request that is added decrements self.rem_chunk_tokens, and the check self.rem_chunk_tokens > 0 then decides whether more requests can be added to the batch. So I don't understand what self.rem_chunk_tokens is actually for.
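To make my reading concrete, here is a toy sketch of the behavior as I understand it. It is not the actual scheduler.py code: the class ToyAdder, the method add_request, and the budget values are made up for illustration, and only the two attribute names come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAdder:
    """Toy model of the two budgets as I understand them (hypothetical, not SGLang code)."""

    rem_input_tokens: int  # total prefill tokens still allowed in this batch
    rem_chunk_tokens: int  # tokens left before the chunked-prefill limit is reached
    batch: list = field(default_factory=list)  # (request_id, tokens_scheduled, truncated)

    def add_request(self, req_id: str, prompt_len: int) -> bool:
        """Try to add one request; return True if further requests may still be added."""
        if self.rem_input_tokens <= 0 or self.rem_chunk_tokens <= 0:
            return False
        # Schedule as much of the prompt as both budgets allow.
        take = min(prompt_len, self.rem_chunk_tokens, self.rem_input_tokens)
        truncated = take < prompt_len
        self.batch.append((req_id, take, truncated))
        # Every request decrements BOTH counters by the same amount.
        self.rem_input_tokens -= take
        self.rem_chunk_tokens -= take
        return self.rem_input_tokens > 0 and self.rem_chunk_tokens > 0


# Illustrative budgets only; not taken from any real configuration.
adder = ToyAdder(rem_input_tokens=16384, rem_chunk_tokens=8192)
for req_id, prompt_len in [("a", 3000), ("b", 4000), ("c", 5000), ("d", 100)]:
    if not adder.add_request(req_id, prompt_len):
        break

print(adder.batch)
```

In this model, request c is truncated to the 1192 tokens left in the chunk budget and request d is never considered. Because both counters fall by the same amount for every request, only the smaller one ever matters, which is exactly the part I don't understand.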
Hope to receive your feedback, thank you!