Wrong result when using lora on multi gpus #2589

Closed
ShuaiShao93 opened this issue Dec 18, 2024 · 6 comments
Labels
bug Something isn't working

Comments


ShuaiShao93 commented Dec 18, 2024

System Info

x86_64, debian 11, 8 A100 GPUs

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Start 2 VMs, one with 1 A100, the other one with 8 A100s
  2. git clone Llama-3.2-3B-Instruct and arbitrary fp16 LoRA weights
  3. Install trtllm 0.15.0
  4. On the 8-GPU VM, run the commands below. On the 1-GPU VM, remove --tp_size=8
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Llama-3.2-3B-Instruct --output_dir ./tllm_3b_checkpoint_8gpu_fp16 --dtype float16 --tp_size=8

trtllm-build --checkpoint_dir ./tllm_3b_checkpoint_8gpu_fp16 --output_dir ./tmp/llama/3B/trt_engines/fp16/8-gpu  --gpt_attention_plugin auto  --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto  --lora_dir llama-3.2-3b-ins-finetuned-lora-weights
  5. On the 8-GPU VM, run this command. On the 1-GPU VM, remove mpirun -n 8
mpirun -n 8 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/8-gpu --max_output_len 1 --max_input_length=100000 --run_profiling --tokenizer_dir ./Llama-3.2-3B-Instruct --input_file input.txt --lora_dir llama-3.2-3b-ins-finetuned-lora-weights --lora_task_uids 0 --output_logits_npy test.npy && python3 -c "import numpy; print(numpy.load('test_generation.npy'))"
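
For reference, the single-GPU variants of the same commands look roughly like this (a sketch; the directory names tllm_3b_checkpoint_1gpu_fp16 and 1-gpu are my own placeholders, and the only real differences from the 8-GPU commands are the dropped --tp_size=8 and mpirun -n 8):

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Llama-3.2-3B-Instruct --output_dir ./tllm_3b_checkpoint_1gpu_fp16 --dtype float16

trtllm-build --checkpoint_dir ./tllm_3b_checkpoint_1gpu_fp16 --output_dir ./tmp/llama/3B/trt_engines/fp16/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.2-3b-ins-finetuned-lora-weights

python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/1-gpu --max_output_len 1 --max_input_length=100000 --run_profiling --tokenizer_dir ./Llama-3.2-3B-Instruct --input_file input.txt --lora_dir llama-3.2-3b-ins-finetuned-lora-weights --lora_task_uids 0 --output_logits_npy test.npy && python3 -c "import numpy; print(numpy.load('test_generation.npy'))"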

Expected behavior

On both VMs, the output text and the generation logits should be identical
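
A minimal sketch of the check, assuming the test_generation.npy files from the two VMs are copied to one machine and renamed (single_gpu_generation.npy and multi_gpu_generation.npy are placeholder names):

# Sketch: compare the generation logits saved by run.py on the two VMs.
# File names below are placeholders; run.py writes test_generation.npy on each machine.
import numpy as np

single = np.load("single_gpu_generation.npy")  # copied from the 1-GPU VM
multi = np.load("multi_gpu_generation.npy")    # copied from the 8-GPU VM

print("shapes:", single.shape, multi.shape)
print("max abs diff:", np.abs(single - multi).max())
print("allclose:", np.allclose(single, multi, rtol=1e-3, atol=1e-3))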

Actual behavior

Neither the output text nor the generation logits are identical, and the output text from the multi-GPU instance is complete garbage if we set max_output_len to a higher value.

Additional notes

If we disable LoRA, both the texts and the logits are identical.

ShuaiShao93 added the bug label Dec 18, 2024
@ShuaiShao93
Author

@nv-guomingz this is a very serious bug, can you help triage it? Thanks!

@ShuaiShao93
Author

OK, I think this only happens when we export the LoRA weights in a way where auto_mapping is not null and task_type is not CAUSAL_LM. My adapter_config.json is:

{
  "alpha_pattern": {},
  "auto_mapping": {
    "base_model_class": "LlamaForCausalLM",
    "parent_library": "transformers.models.llama.modeling_llama"
  },
  "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "v_proj",
    "gate_proj",
    "q_proj",
    "up_proj",
    "k_proj",
    "o_proj"
  ],
  "task_type": null,
  "use_dora": false,
  "use_rslora": false
}

When auto_mapping is null and task_type is CAUSAL_LM, the output is correct.
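
A minimal workaround sketch, assuming it is acceptable to edit the exported adapter_config.json in place before running trtllm-build (the path below is a placeholder for the --lora_dir used above):

# Sketch: patch adapter_config.json so the LoRA dir passed to trtllm-build
# has task_type CAUSAL_LM and a null auto_mapping.
import json

path = "llama-3.2-3b-ins-finetuned-lora-weights/adapter_config.json"  # placeholder path
with open(path) as f:
    cfg = json.load(f)

cfg["task_type"] = "CAUSAL_LM"
cfg["auto_mapping"] = None  # serialized as null

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)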

@ShuaiShao93
Author

Note that this only affects trtllm-build, not run.py. This means that when we run trtllm-build, the --lora_dir must have task_type: CAUSAL_LM in its adapter_config.json.

@ShuaiShao93
Author

ShuaiShao93 commented Dec 27, 2024

Actually, even with task_type: CAUSAL_LM, it no longer outputs garbage, but the outputs are still different from those on a single GPU.

@ShuaiShao93
Author

The interesting thing is that if we add --use_py_session to TensorRT-LLM/examples/run.py, this bug doesn't happen. So I believe it's an inconsistency between the C++ and Python runners.
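
For reference, the workaround is just the repro command from above with the extra flag, everything else unchanged:

mpirun -n 8 python3 TensorRT-LLM/examples/run.py --use_py_session --engine_dir=./tmp/llama/3B/trt_engines/fp16/8-gpu --max_output_len 1 --max_input_length=100000 --run_profiling --tokenizer_dir ./Llama-3.2-3B-Instruct --input_file input.txt --lora_dir llama-3.2-3b-ins-finetuned-lora-weights --lora_task_uids 0 --output_logits_npy test.npy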

@ShuaiShao93
Author

OK, I have a good repro comparing the Python and C++ runners. Let me file a separate ticket with a newer version and a better repro. Closing this one for now.
