Hi,

I'd like to better understand how to set this flag's value correctly.
In the example from the TensorRT-LLM backend project, `max_encoder_input_len` is set to 8200, whereas on the TensorRT-LLM example page it is set to 4100. The batch size on the example page is half the one in the backend example, so I can understand the drop from 8200 to 4100; what I don't really understand is how the number of features per image is calculated.
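To make my reading of the flag concrete, here is the relationship I'm assuming (the function name, the per-image feature count of 1025, and the batch sizes 8 and 4 are all my own guesses that merely happen to be consistent with 8200 vs 4100; none of them come from the docs):

```python
# My assumed reading of the flag (not confirmed anywhere): the decoder
# engine's max_encoder_input_len is the per-image visual feature count
# multiplied by the maximum batch size, so halving the batch halves the flag.
def assumed_max_encoder_input_len(features_per_image: int, max_batch_size: int) -> int:
    return features_per_image * max_batch_size

# Hypothetical numbers that would be consistent with the two examples:
print(assumed_max_encoder_input_len(1025, 8))  # 8200 (backend example?)
print(assumed_max_encoder_input_len(1025, 4))  # 4100 (example page?)
```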
The Llama-3.2-11B-Vision-Instruct and Llama-3.2-11B-Vision models use different image sizes, 560x560 and 448x448 respectively.
I mistakenly set these numbers when building the visual and text engines for Llama-3.2-11B-Vision-Instruct; it was only during the runtime `run.py` test that I was warned that `encoder_max_input_length` should be 6404 (for a batch size of 2). How was this 6404 arrived at?
My (quite possibly wrong) take: 560/14 = 40, and we have 4 of these, so $40 \cdot 40 \cdot 4 = 6400$. Maybe I get to 6404 because there are four extra tokens for position? How does this take into account the batch size? Was it only telling me 6404 because I sent in a single picture?
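If the four extra tokens are actually one extra (class) token per tile rather than position tokens, the arithmetic comes out exactly. A minimal sketch of that guess (the config values and the "+1 token per tile" assumption are mine, not something I found documented):

```python
# Guessing at the breakdown of 6404 for Llama-3.2-11B-Vision-Instruct:
image_size, patch_size, max_tiles = 560, 14, 4        # values I believe this model uses
patches_per_tile = (image_size // patch_size) ** 2    # 40 * 40 = 1600
tokens_per_tile = patches_per_tile + 1                # assuming +1 class token = 1601
features_per_image = tokens_per_tile * max_tiles      # 1601 * 4 = 6404
print(features_per_image)                             # 6404
```

If that is right, and the batch-scaling guess above also holds, I would have expected 2 * 6404 = 12808 for a batch size of 2, which is part of why I wonder whether the warning only reflected the single image I passed in.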
Thanks for any clarification!!
The following commands were used w/ TensorRT-LLM v0.15.0: