Wrong output on Llama 3.2 1B, but 3B ok #2492
Comments
Llama 3.2 1B works well on my side with the latest local code base.
Could you please try again with today's update?
@nv-guomingz I'm using TensorRT-LLM version 0.14.0 and facing the same issue with a fine-tuned Llama 3.2 1B Instruct model.
@lucasavila00 Were you able to find a solution for this? @nv-guomingz Is this issue similar to #121?
@hello-11 @nv-guomingz I have tried with TensorRT-LLM version 0.15.0 as well, but I'm still facing the issue: the output repeats until max tokens. Is there any solution for this?
System Info
Both RTX 2070 and RTX A6000
Reproduction
I'm using the latest main (535c9cc), with the `make wheel` image built from main. I built the 3B model with
And it runs as expected
However, when I do the same for Llama 1B:
It just repeats the same token
This happens on an RTX 2070 and on an RTX A6000.
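The failure mode described above (the engine emitting the same token over and over until max tokens) is easy to check for programmatically. As a minimal sketch, here is a small helper, not part of TensorRT-LLM and with hypothetical names, that flags generated token-ID sequences ending in a short repeating cycle:

```python
def ends_in_repetition(token_ids, min_repeats=4, max_period=8):
    """Return True if token_ids ends with a cycle of length <= max_period
    repeated at least min_repeats times (degenerate generation)."""
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(token_ids) < needed:
            continue
        tail = token_ids[-needed:]
        cycle = tail[:period]
        # The tail must be the cycle tiled end to end.
        if all(tail[i] == cycle[i % period] for i in range(needed)):
            return True
    return False
```

For example, `ends_in_repetition([10, 20, 30] + [7] * 12)` is `True` (the single-token loop reported here), while a non-repeating sequence such as `list(range(50))` returns `False`.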
Expected behavior
Both the 3B and 1B models should produce correct output.
actual behavior
Only the 3B model works; the 1B model repeats the same token.
additional notes
No additional notes.