
Error while loading custom finetuned QLoRA model in 4-bit: size mismatch for model.mm_projector.readout.0.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]). #71

Open
ApoorvFrontera opened this issue Aug 7, 2024 · 3 comments

Comments

@ApoorvFrontera

Hi Team,

I have successfully finetuned a QLoRA adapter on a custom dataset. When I load it in full precision, it loads and works well.

However, full-precision inference takes too much time and GPU memory, so I wanted to load the model in 4-bit precision by passing the load_4bit parameter:

model_path = '/home/apoorv/development/videollama2/VideoLLaMA2/work_dirs/videollama2_vllava/finetune_videollama2_vllava_qlora'
model_name = get_model_name_from_path(model_path)
tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name, load_4bit=True)

While running this I get the following error:

RuntimeError: Error(s) in loading state_dict for Videollama2MistralForCausalLM:
	size mismatch for model.mm_projector.readout.0.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).
	size mismatch for model.mm_projector.readout.2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).

I have debugged this and understood the cause: when load_4bit=True is passed, the VideoLLaMA2 model parameters are created as 4-bit params. The LLM weights are initialized from the base model (in this case mistralai/Mistral-7B-Instruct-v0.2) in 4-bit, but the mm_projector is not; I am guessing it is initialized with random values, wrapped in the Params4bit class.

Line: https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/main/videollama2/model/builder.py#L76
model = Videollama2MistralForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
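
One direction I have been wondering about at this point in the builder (an untested sketch on my end, assuming the load_4bit kwargs boil down to a BitsAndBytesConfig): excluding the projector from quantization via llm_int8_skip_modules, so that its full-precision weights could later be loaded without a shape mismatch.

# Untested sketch: keep mm_projector in fp16 while the LLM itself is quantized to 4-bit.
# llm_int8_skip_modules also applies to 4-bit loading in recent transformers versions.
# The bnb_4bit_* values below are assumptions; match whatever the builder already uses.
import torch
from transformers import BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
    llm_int8_skip_modules=['mm_projector'],  # assumption: matches the projector's module name
)

model = Videollama2MistralForCausalLM.from_pretrained(
    model_base,
    low_cpu_mem_usage=True,
    config=lora_cfg_pretrained,
    quantization_config=quant_config,
)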

Then, to initialize the mm_projector weights, the builder loads them from the previously saved non_lora_trainables.bin, which was stored in full precision.

Line: https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/main/videollama2/model/builder.py#L101
model.load_state_dict(non_lora_trainables, strict=False)
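
Alternatively (another untested sketch on my end, not code from the repo), the fp16 projector weights might be re-quantized by hand into the already-created Linear4bit modules instead of going through load_state_dict with mismatched shapes:

# Untested sketch: wrap the saved fp16 projector weights in Params4bit and let the move to
# the GPU do the 4-bit packing, mirroring how bitsandbytes converts fp16 layers to 4-bit.
# The key prefix in non_lora_trainables and the quant_type are assumptions; adjust to match.
import torch
import bitsandbytes as bnb

for name, module in model.named_modules():
    if 'mm_projector' in name and isinstance(module, bnb.nn.Linear4bit):
        device = module.weight.device
        weight_key = f'model.{name}.weight'  # e.g. 'model.model.mm_projector.readout.0.weight'
        if weight_key in non_lora_trainables:
            fp16_w = non_lora_trainables[weight_key].to(torch.float16)
            module.weight = bnb.nn.Params4bit(
                fp16_w, requires_grad=False, quant_type='nf4'  # match the builder's quant_type
            ).to(device)  # quantization/packing happens on the move to the GPU
        bias_key = f'model.{name}.bias'
        if bias_key in non_lora_trainables and module.bias is not None:
            module.bias.data.copy_(non_lora_trainables[bias_key])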

Debugging Outputs

Model params

(Pdb) model.model.mm_projector.readout
Sequential(
  (0): Linear4bit(in_features=4096, out_features=4096, bias=True)
  (1): GELU(approximate='none')
  (2): Linear4bit(in_features=4096, out_features=4096, bias=True)
)
(Pdb) model.model.mm_projector.readout[0]
Linear4bit(in_features=4096, out_features=4096, bias=True)
(Pdb) model.model.mm_projector.readout[0].weight
Parameter containing:
Parameter(Params4bit([[193],
            [108],
            [250],
            ...,
            [231],
            [107],
            [ 92]], device='cuda:1', dtype=torch.uint8))
(Pdb) model.model.mm_projector.readout[0].weight.shape
torch.Size([8388608, 1])

Previously saved non_lora_trainables param

(Pdb) non_lora_trainables['model.model.mm_projector.readout.0.weight'].shape
torch.Size([4096, 4096])
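
For reference, the two shapes are consistent with 4-bit packing: 4096 × 4096 values at two 4-bit values per uint8 byte is exactly 8,388,608 bytes. A quick standalone check (my own snippet, assuming bitsandbytes and a GPU are available):

# Quick sanity check of the shapes: quantizing a 4096x4096 fp16 tensor to 4-bit packs
# two values per byte, which matches the [8388608, 1] uint8 storage shown above.
import torch
import bitsandbytes as bnb

w_fp16 = torch.randn(4096, 4096, dtype=torch.float16)
w_4bit = bnb.nn.Params4bit(w_fp16, requires_grad=False).cuda()  # packed on the move to GPU

print(w_4bit.shape)      # torch.Size([8388608, 1])
print(4096 * 4096 // 2)  # 8388608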

Please advise on how to resolve this.

@ApoorvFrontera
Author

Hi Team,

Any help is highly appreciated.

Thanks :)

@clownrat6
Member

You can check issue #78.

