Wrong result when using LoRA on multiple GPUs #2589
Comments
@nv-guomingz this is a very serious bug, can you help triage it? Thanks!
OK, I think this only happens when we export the LoRA weights in a way that auto_mapping is not null and task_type is not CAUSAL_LM. My adapter_config.json is
When auto_mapping is null and task_type is CAUSAL_LM, the results are correct.
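For illustration, a minimal sketch of checking these two fields in a PEFT-style adapter_config.json. The path and the printed messages are assumptions for the example, not taken from the original config:

```python
# Sanity check of the two adapter_config.json fields discussed above.
# The path "lora_adapter/adapter_config.json" is hypothetical; point it
# at the exported LoRA directory.
import json

with open("lora_adapter/adapter_config.json") as f:
    cfg = json.load(f)

auto_mapping = cfg.get("auto_mapping")
task_type = cfg.get("task_type")

# Failing case reported here: auto_mapping is not null and task_type is not CAUSAL_LM.
if auto_mapping is not None or task_type != "CAUSAL_LM":
    print(f"Suspect config: auto_mapping={auto_mapping!r}, task_type={task_type!r}")
else:
    print("Matches the known-good case (auto_mapping null, task_type CAUSAL_LM)")
```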
Note that this only impacts trtllm-build, but not run.py. This means when we run
Actually even with
The interesting thing is, if we add
OK, I have a good repro comparing the Python and C++ runners. Let me file a separate ticket with a newer version and a better repro. Closing this one for now.
System Info
x86_64, Debian 11, 8x A100 GPUs
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
--tp_size=8
mpirun -n 8
Expected behavior
On both VMs, the output text and the generation logits should be identical
Actual behavior
Neither the output text nor the generation logits are identical, and the output text from the multi-GPU instance is complete garbage if max_output_len is set to a higher value.
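For reference, a minimal sketch of how the logits from the two runs could be compared, assuming the generation logits from the single-GPU and multi-GPU runs have each been dumped to a .npy file. The file names are hypothetical, not from the original repro:

```python
# Compare generation logits dumped from the two runs.
# "logits_tp1.npy" and "logits_tp8.npy" are assumed file names.
import numpy as np

logits_tp1 = np.load("logits_tp1.npy")  # single-GPU run
logits_tp8 = np.load("logits_tp8.npy")  # mpirun -n 8 run

print("shapes:", logits_tp1.shape, logits_tp8.shape)
print("max abs diff:", np.max(np.abs(logits_tp1 - logits_tp8)))
print("allclose:", np.allclose(logits_tp1, logits_tp8, atol=1e-3))
```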
Additional notes
If we disable LoRA, both the output texts and the logits are identical.