Tokenization mismatch #75

I tried finetuning my model after stage 1. Apparently, there are tokenization mismatches and the loss is 0.
Do you have any ideas what might be the problem?
Thanks!

sh finetune_full.sh
I believe so. Here is my
Here is my pretrained
There was a bug where the HF Llama-3 tokenizer wouldn't prepend the BOS token. Please check whether your model weights are up-to-date.
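As a quick sanity check, you can verify whether your local tokenizer prepends the BOS token. This is a minimal sketch assuming transformers is installed; the model path is illustrative, point it at your local copy:

from transformers import AutoTokenizer

# Assumption: replace the name below with your local Meta-Llama-3-8B-Instruct path.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(tokenizer.bos_token, tokenizer.bos_token_id)   # expect <|begin_of_text|>, 128000
ids = tokenizer("A chat between a curious user and an assistant.").input_ids
print(ids[:3])   # with up-to-date tokenizer files, 128000 should be the first id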
Still didn't fix it. I have deleted the cached weights and checked that they are up to date.
We noticed that Llama-3 recently changed its tokenizer configuration.
In theory, can I also use this model? https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat
I received an error while training again.
Meta-Llama-3-8B-Instruct gives the same error.
Try to edit it here rather than editing the configuration of the base model.
I changed it to this, and training is OK.
merged sh:
I edited the config as you said. The training was fine, but during inference I got
for every sample. The output tokens look like this:
@Gary2018X It seems unrelated to Bunny. Please try searching on Google.
@swhoosh What about the loss curve?
I trained for only 20 steps just to test it out first. The loss seemed fine when I trained for full epochs yesterday, when I edited the model's config instead of Bunny's as you recommended. However, those runs still had the same problem. FYI, I was able to get the expected result from
Maybe there is a huge gap between medical images/knowledge and regular images/knowledge.
Well, Phi-2 did actually work during our testing, and I was able to get Llama-3 to work before the recent config update. Can you try reproducing the finetuning result on your end to ensure that the model is behaving correctly?
After making my setup consistent with this change, my problem was resolved and I was able to infer normally.
@Gary2018X Are you able to run inference? My inference still produces the same result as before.
I have checked my Llama-3 version and I am using the latest dev branch.
It may be related to your base model.
Although I can infer normally, the results are not as good as Qwen-1.8B yet.
@swhoosh @Gary2018X We would keep using
Closing the issue for now as there is no further discussion. Feel free to reopen it if there are any other questions.
@Isaachhh Why should we keep using
Actually, the performance using
@Isaachhh So, we just need to ignore tokens such as these? But the target values will all be set to IGNORE_INDEX here; in theory, this would result in no supervised information (loss = 0). Why is the performance better?
It shouldn't be any like
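For background on the IGNORE_INDEX question above: PyTorch's cross-entropy skips positions whose label equals ignore_index (-100 by Hugging Face convention), so masked positions simply contribute nothing to the loss rather than dragging it to zero. A toy illustration (made-up values, not Bunny code):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                    # 4 positions, 10-token vocabulary
labels = torch.tensor([-100, -100, -100, 7])   # only the last position is supervised

loss_all = F.cross_entropy(logits, labels, ignore_index=-100)
loss_last = F.cross_entropy(logits[3:], labels[3:])
print(loss_all.item(), loss_last.item())       # identical: masked positions add nothing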
Meta-Llama-3-8B-Instruct. Here is my finetuning script:

#!/bin/bash
MODEL_TYPE=llama3-8b
# params
MODEL_PATH=/data/models/meta-llama/Meta-Llama-3-8B-Instruct
CONV_VERSION=llama
DATA_PATH=/data/datasets/images/bunny-v1_1/finetune/bunny_allava_1.3m.json # recipe-1
BATCH_SIZE=4
GRADIENT_ACCUMULATION_STEPS=4
LORA_R=256
LORA_ALPHA=256
MM_PROJECTOR_LR=1e-5
LR=1e-4
MODEL_MAX_LENGTH=4096
PRETRAIN_DIR=bunny-$MODEL_TYPE-pretrain
OUTPUT_DIR=bunny-lora-$MODEL_TYPE-recipe-1
mkdir -p ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR
deepspeed bunny/train/train.py \
--lora_enable True --lora_r $LORA_R --lora_alpha $LORA_ALPHA --mm_projector_lr $MM_PROJECTOR_LR \
--deepspeed ./script/deepspeed/zero3.json \
--model_name_or_path $MODEL_PATH \
--model_type $MODEL_TYPE \
--version $CONV_VERSION \
--data_path $DATA_PATH \
--image_folder /data/datasets/images/bunny-v1_1/finetune/images \
--vision_tower /data/models/google/siglip-so400m-patch14-384 \
--use_s2 True \
--pretrain_mm_mlp_adapter ./checkpoints-pretrain/$PRETRAIN_DIR/mm_projector.bin \
--mm_projector_type mlp2x_gelu \
--image_aspect_ratio pad \
--group_by_modality_length False \
--bf16 True \
--output_dir ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR \
--num_train_epochs 1 \
--per_device_train_batch_size $BATCH_SIZE \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 500 \
--save_total_limit 1 \
--learning_rate $LR \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length $MODEL_MAX_LENGTH \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
    --report_to none | tee 2>&1 ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR/log.txt
What's
I tested: training meta-llama/Meta-Llama-3-8B-Instruct is fine with the current code. You may check your code or the version of your LLM.
So, whether the two values are equal depends on
I suggest you print the input_ids and targets.
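One possible way to do that (a rough sketch, not Bunny's actual code; it assumes you already have the input_ids and targets tensors from preprocessing and the loaded tokenizer):

import torch

IGNORE_INDEX = -100
IMAGE_TOKEN_INDEX = -200  # the image placeholder visible in the dump below

def dump_example(tokenizer, input_ids: torch.Tensor, targets: torch.Tensor) -> None:
    # Print each token id, its decoded text, and its target label side by side,
    # so any masking or tokenization mismatch is easy to spot.
    for tok, tgt in zip(input_ids.view(-1).tolist(), targets.view(-1).tolist()):
        text = "<image>" if tok == IMAGE_TOKEN_INDEX else tokenizer.decode([tok])
        label = "IGNORE" if tgt == IGNORE_INDEX else tgt
        print(f"{tok:>7}  {text!r:<24}  {label}")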
OK, given a conversation example (I don't know why there is no BOS token):

["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nDo you see any letters that are white?\nAnswer the question using a single word or phrase. ASSISTANT: No<|end_of_text|>USER: Is the tire large and black? ASSISTANT: Yes<|end_of_text|>"]

input_ids (init targets):
tensor([[ 32, 6369, 1990, 264, 22999, 1217, 323, 459, 21075,
11478, 18328, 13, 578, 18328, 6835, 11190, 11, 11944,
11, 323, 48887, 11503, 311, 279, 1217, 596, 4860,
13, 14194, 25, 220, -200, 198, 5519, 499, 1518,
904, 12197, 430, 527, 4251, 5380, 16533, 279, 3488,
1701, 264, 3254, 3492, 477, 17571, 13, 36660, 3931,
2891, 25, 2360, 128001, 6584, 25, 2209, 279, 28387,
3544, 323, 3776, 30, 36660, 3931, 2891, 25, 7566,
128001]])

ignore the first token:
tensor([[ -100, 6369, 1990, 264, 22999, 1217, 323, 459, 21075,
11478, 18328, 13, 578, 18328, 6835, 11190, 11, 11944,
11, 323, 48887, 11503, 311, 279, 1217, 596, 4860,
13, 14194, 25, 220, -200, 198, 5519, 499, 1518,
904, 12197, 430, 527, 4251, 5380, 16533, 279, 3488,
1701, 264, 3254, 3492, 477, 17571, 13, 36660, 3931,
2891, 25, 2360, 128001, 6584, 25, 2209, 279, 28387,
3544, 323, 3776, 30, 36660, 3931, 2891, 25, 7566,
128001]])

round 1:
tensor([ -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, 2360, 128001, 6584, 25, 2209, 279, 28387,
3544, 323, 3776, 30, 36660, 3931, 2891, 25, 7566,
128001])

round 2:
tensor([ -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, 2360, 128001, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, 25, 7566,
128001])
That's all.
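For readers following the dump above: the per-round masking follows the LLaVA-style preprocessing, where each round's instruction part is re-tokenized to measure its length and that many target positions are overwritten with IGNORE_INDEX; if the counted length drifts from the real sequence length (for example because a BOS token was or wasn't prepended), the whole sample is masked and its loss becomes 0. A simplified sketch of that logic (illustrative only, not the exact Bunny implementation; image-token handling is omitted):

import torch

IGNORE_INDEX = -100

def mask_targets_by_round(tokenizer, rounds, targets, total_len, sep=" ASSISTANT: "):
    # rounds: list of strings like "... USER: <question> ASSISTANT: <answer>"
    cur_len = 0
    for rou in rounds:
        if rou == "":
            break
        parts = rou.split(sep)
        round_len = len(tokenizer(rou, add_special_tokens=False).input_ids)
        instruction_len = len(tokenizer(parts[0] + sep, add_special_tokens=False).input_ids)
        # Supervise only the answer tokens of this round.
        targets[cur_len:cur_len + instruction_len] = IGNORE_INDEX
        cur_len += round_len
    targets[cur_len:] = IGNORE_INDEX
    if cur_len != total_len:
        # The "tokenization mismatch" case: e.g. a BOS token was (or wasn't)
        # prepended, so the counted length drifts; everything gets masked and
        # the loss for the sample becomes 0.
        targets[:] = IGNORE_INDEX
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}")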
So the problem is clear: the tokenizer didn't prepend a BOS token. Did you clone the latest model from HF?
Maybe not; I will re-download and verify it.
@Isaachhh Yes, confirmed that this was only a problem with my Llama-3 version; just updating it fixed the issue.