On the issue of Continuous Fine-tuning #82
I tried, but there was an error while merging the models:

Traceback (most recent call last):
File "/Bunny/script/merge_lora_weights.py", line 26, in <module>
merge_lora(args)
File "/Bunny/script/merge_lora_weights.py", line 10, in merge_lora
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, args.model_base, model_name,
File "/Bunny/bunny/model/builder.py", line 58, in load_pretrained_model
model = BunnyQwen2ForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained,
File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3447, in from_pretrained
no_split_modules = model._get_no_split_modules(device_map)
File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1769, in _get_no_split_modules
raise ValueError(
ValueError: SiglipVisionModel does not support `device_map='auto'`. To implement support, the model class needs to implement the `_no_split_modules` attribute. How should I solve it |
What is the merging command you use?
python script/merge_lora_weights.py \
--model-path ./checkpoints-qwen1.5-1.8b/bunny-lora-qwen1.5-1.8b \
--model-base ./models/Qwen1.5-1.8B \
--model-type qwen1.5-1.8b \
--save-model-path ./models/model
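For reference, the merge step is roughly a standard PEFT adapter merge. A minimal sketch, assuming the checkpoint at `--model-path` is a regular PEFT adapter (Bunny's actual merge_lora_weights.py also restores the vision tower and projector, so this is only an approximation):

```python
# Approximate sketch of the LoRA merge using plain PEFT.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("./models/Qwen1.5-1.8B")
model = PeftModel.from_pretrained(base, "./checkpoints-qwen1.5-1.8b/bunny-lora-qwen1.5-1.8b")
merged = model.merge_and_unload()          # fold the LoRA deltas into the base weights
merged.save_pretrained("./models/model")   # matches --save-model-path above
```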
model config:

{
"_name_or_path": "./models/Qwen1.5-1.8B",
"architectures": [
"BunnyQwen2ForCausalLM"
],
"attention_dropout": 0.0,
"auto_map": {
"AutoConfig": "configuration_bunny_qwen2.BunnyQwen2Config",
"AutoModelForCausalLM": "modeling_bunny_qwen2.BunnyQwen2ForCausalLM"
},
"bos_token_id": 151643,
"eos_token_id": 151643,
"freeze_mm_mlp_adapter": false,
"hidden_act": "silu",
"hidden_size": 2048,
"image_aspect_ratio": "pad",
"initializer_range": 0.02,
"intermediate_size": 5504,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"mm_hidden_size": 1152,
"mm_projector_lr": 2e-05,
"mm_projector_type": "mlp2x_gelu",
"mm_vision_tower": "./models/siglip-so400m-patch14-384",
"model_type": "bunny-qwen2",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"num_key_value_heads": 16,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": false,
"tokenizer_model_max_length": 2048,
"tokenizer_padding_side": "right",
"torch_dtype": "float16",
"transformers_version": "4.39.1",
"tune_mm_mlp_adapter": false,
"use_cache": true,
"use_mm_proj": true,
"use_sliding_window": false,
"continuous_training":true,
"vocab_size": 151646
}

train.sh:

#!/bin/bash
MODEL_TYPE=qwen1.5-1.8b
PRETRAIN_DIR=bunny-$MODEL_TYPE-pretrain
OUTPUT_DIR=bunny-lora-ct-$MODEL_TYPE
mkdir -p ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR
deepspeed bunny/train/train.py \
--lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
--deepspeed ./script/deepspeed/zero3.json \
--model_name_or_path ./models/merged_model \
--model_type $MODEL_TYPE \
--version bunny \
--data_path ./data/Bunny.json \
--image_folder ./data/image \
--vision_tower ./models/siglip-so400m-patch14-384 \
--mm_projector_type mlp2x_gelu \
--image_aspect_ratio pad \
--group_by_modality_length False \
--bf16 True \
--output_dir ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR \
--num_train_epochs 5 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 500 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to none | tee 2>&1 ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR/log.txt
When I updated Transformers to the latest version, there was a new error:

Traceback (most recent call last):
File "./Bunny/script/merge_lora_weights.py", line 26, in <module>
merge_lora(args)
File "./Bunny/script/merge_lora_weights.py", line 10, in merge_lora
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, args.model_base, model_name,
File "./Bunny/bunny/model/builder.py", line 58, in load_pretrained_model
model = BunnyQwen2ForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained,
File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3754, in from_pretrained
) = cls._load_pretrained_model(
File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4214, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 887, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/opt/conda/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([151936, 2048]) in "weight" (which has shape torch.Size([151646, 2048])), this look incorrect.
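The shape mismatch (151936 vs 151646) suggests the fine-tuned config's `vocab_size` disagrees with the base Qwen1.5-1.8B checkpoint. As a diagnostic sketch (the exact cause here is an assumption, and it presumes the adapter directory ships the Bunny config code for `trust_remote_code`):

```python
# Diagnostic sketch: compare vocab sizes of the LoRA config and the base checkpoint.
# A mismatch (e.g. 151646 vs 151936) would explain the embedding shape error above.
from transformers import AutoConfig

lora_cfg = AutoConfig.from_pretrained(
    "./checkpoints-qwen1.5-1.8b/bunny-lora-qwen1.5-1.8b", trust_remote_code=True
)
base_cfg = AutoConfig.from_pretrained("./models/Qwen1.5-1.8B")
print(lora_cfg.vocab_size, base_cfg.vocab_size)
```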
I know why the error occurred.
Great! And I realize that when evaluating the final model (continuously trained), ...
Ok, thank you very much for your patient answer.
Is there any answer to this question?
I conducted an experiment: first I fine-tuned the base model on dataset A to obtain model A. Then I used dataset B and prompt B to continuously fine-tune model A to obtain model B. Model B is also worse than directly fine-tuning on multiple instruction sets at once. Are there any tricks you could suggest? Or is my approach not appropriate?
Hi, I have the same issue but with a different size because of the pad_token_id. How did you solve it?
No, it is related to some other implementation that introduces a new padding token. I managed to train the model with LoRA and now I want to merge the adapters back, but I get the error above.
@basteran We didn't try to expand the vocabulary, so we may not be able to help you with that.
What do you mean you didn't try to expand the vocabulary? I see these lines in your code. Aren't you adding new tokens there? Thanks for the help!
@basteran
Ok, I got it. So you add the ... Thank you very much for the help! Now I understand what's going on. I am considering switching to your Bunny repository instead of LLaVA++ 😄
@basteran When training and running, Bunny uses an existing token as the padding token. So, I just pick an existing token to serve as the padding token, without modifying the tokenizer much.
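A minimal sketch of that idea, i.e. reusing a token already in the vocabulary as the padding token instead of adding a new one (the specific token chosen here is an assumption, not Bunny's exact code):

```python
# Sketch: reuse an existing token as pad_token so vocab_size and the embedding
# matrix stay unchanged (adding a brand-new token would grow both).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./models/Qwen1.5-1.8B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # assumption: any existing special token will do
print(tokenizer.pad_token_id)                  # stays within the original vocabulary
```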
The different kinds of data, and the fraction of each, interact in a complex and comprehensive way, so it's hard to give a simple principle. Performance may depend on the knowledge area each kind of data covers, and on the conflicts and synergies between them. Whether to unfreeze the vision tower, as well as the hyper-parameters, may also matter. From my own perspective, fine-tuning on multiple instruction sets at once (e.g. Bunny-695K + your own data) may be better; a rough sketch of that setup follows.
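A minimal sketch of that single-stage setup: concatenate Bunny-695K with your own instructions into one training file and run a single fine-tuning pass over the mixture (file names here are assumptions based on the paths in this thread):

```python
# Sketch: mix Bunny-695K with custom data for a single fine-tuning run.
import json
import random

with open("./data/bunny_695k.json") as f:   # assumed local filename for Bunny-695K
    base_data = json.load(f)
with open("./data/Bunny.json") as f:        # the custom data referenced in train.sh above
    custom_data = json.load(f)

mixed = base_data + custom_data
random.shuffle(mixed)                       # interleave so every batch sees both sources

with open("./data/mixed_finetune.json", "w") as f:
    json.dump(mixed, f, ensure_ascii=False)
```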
Closing the issue for now as there's no further discussion. Feel free to reopen it if there are any other questions.
Thanks for your work. I would like to know which would be more effective: continuous fine-tuning, or fine-tuning on multiple instruction sets at once?