[bug] forwardAsync assertion failed #2494
Comments
Hi @akhoroshev, thank you for taking the time to report the issue. From just looking at the code, the logic seems correct to me. I see no way how Could you share a reproducer, please?
It happens under load. For example, it's possible to have two (or more) requests:
They are both valid ( But the assertion fails
I can't because it's a closed model.
Hi! Any updates here?
I hit the "Encountered an error in forwardAsync function: std::bad_cast" error when running BERT/RoBERTa.
My version
Assertion fails under load
I don't know how it's possible because
input_length <= 7168
max_new_tokens=min(4096, 8192 - input_length)
Moreover, Executor additionally checks this invariant.
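To make the arithmetic behind this claim concrete, here is a minimal sketch that checks the bounds quoted above for every allowed input length. It assumes that 8192 is the maximum sequence length and that the failing assertion compares input_length + max_new_tokens against it; the constant and function names are hypothetical and are not part of TensorRT-LLM.

```python
# Hypothetical constants taken from the numbers in this report.
MAX_SEQ_LEN = 8192         # assumption: the limit the assertion enforces
MAX_INPUT_LEN = 7168       # client-side bound on input_length
MAX_NEW_TOKENS_CAP = 4096  # client-side cap on max_new_tokens


def clamp_max_new_tokens(input_length: int) -> int:
    """Return max_new_tokens as computed on the client side."""
    if not 1 <= input_length <= MAX_INPUT_LEN:
        raise ValueError(f"input_length out of range: {input_length}")
    return min(MAX_NEW_TOKENS_CAP, MAX_SEQ_LEN - input_length)


def fits(input_length: int) -> bool:
    """Check the assumed invariant: prompt plus generation fits the limit."""
    return input_length + clamp_max_new_tokens(input_length) <= MAX_SEQ_LEN


# Exhaustively check every allowed input length.
violations = [n for n in range(1, MAX_INPUT_LEN + 1) if not fits(n)]
print(len(violations))  # prints 0
```

Under these assumptions no valid request can violate the limit on its own, which is consistent with the suspicion that the value is changed somewhere after the request is submitted.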
The only idea I have is that
tensorrt_llm::batch_manager::TrtGptModelInflightBatching::setupDecoderStep
is setting the wrong max_new_tokens for decoder_batch::Request (under certain conditions).