
Instruction Fine-Tuning Script

⚠️Important Note⚠️

This code works only with a specific PEFT version. Before running the script, install PEFT from source at commit 13e53fc.
If you use a different PEFT version or change some of the training settings (e.g., not using DeepSpeed), we cannot guarantee that the model will train normally.
Before running, make sure you have pulled the latest version of the repository: git pull

Training Steps

Go to the project's scripts/training directory and run bash run_sft.sh to start instruction fine-tuning. By default, the script uses a single GPU. Before running, edit the script to set the relevant parameters; the parameter values in the script are for debugging reference only. The content of run_sft.sh is as follows:

```bash
######## Parameters ########
lr=1e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=path/to/hf/llama-2/or/merged/llama-2/dir/or/model_id
chinese_tokenizer_path=path/to/chinese/llama-2/tokenizer/dir
dataset_dir=path/to/sft/data/dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir
peft_model=path/to/peft/model/dir
validation_file=validation_file_name
max_seq_length=512

deepspeed_config_file=ds_zero2_no_offload.json

######## Launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --save_safetensors False \
    --validation_file ${validation_file} \
    --peft_path ${peft_model} \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
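The scheduling flags in the launch command (`--lr_scheduler_type cosine`, `--warmup_ratio 0.03`, `--max_steps ${training_steps}`) are easier to reason about as a curve. The Python sketch below reproduces that curve, assuming the script passes these flags unchanged to the Hugging Face Trainer (a DeepSpeed config that defines its own scheduler would take precedence). `warmup_steps` and `cosine_lr` are illustrative helper names, not functions from the project.

```python
import math

def warmup_steps(max_steps: int, warmup_ratio: float) -> int:
    # When only --warmup_ratio is given, the warmup length is taken as
    # ceil(max_steps * warmup_ratio) (our understanding of how Hugging Face
    # TrainingArguments derives it).
    return math.ceil(max_steps * warmup_ratio)

def cosine_lr(step: int, peak_lr: float, max_steps: int, warmup: int) -> float:
    # Linear warmup from 0 up to peak_lr ...
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    # ... then a half-cosine decay from peak_lr down to 0 at max_steps.
    progress = (step - warmup) / max(1, max_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Values from run_sft.sh: lr=1e-4, training_steps=100, --warmup_ratio 0.03
warmup = warmup_steps(100, 0.03)
print(warmup)  # 3
for step in (0, 1, warmup, 50, 100):
    print(step, f"{cosine_lr(step, 1e-4, 100, warmup):.2e}")
```

With the debugging values, warmup lasts only 3 steps. Increasing `training_steps` for a real run stretches both the warmup and the decay proportionally.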

Most parameters are self-explanatory. Explanations of selected parameters:

- --tokenizer_name_or_path: Directory containing the Chinese-LLaMA-2 tokenizer. ⚠️ In this project, Chinese-LLaMA-2 and Chinese-Alpaca-2 share the same tokenizer, so no distinction is made between them.
- --dataset_dir: Directory containing the instruction fine-tuning data, i.e., one or more files in Stanford Alpaca format with the .json extension.
- --validation_file: A single instruction fine-tuning file used as the validation set, also in Stanford Alpaca format with the .json extension.
- --flash_attn: Enables FlashAttention-2 during training. This flag is not in run_sft.sh by default; add it to the launch command to use it.
- --load_in_kbits: Valid values are 16, 8, and 4, meaning fp16 training or training with 8-bit/4-bit quantization, respectively. The default is fp16 training.
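As a rough guide to what `--load_in_kbits` changes, the sketch below estimates the memory occupied by the frozen base-model weights alone at 16, 8, and 4 bits. The 7-billion parameter count is an illustrative assumption, and the estimate deliberately ignores activations, the trainable LoRA and embedding weights, gradients, optimizer states, and quantization metadata, so actual usage will be higher.

```python
def weight_memory_gib(num_params: float, bits: int) -> float:
    # Bytes needed to store num_params weights at `bits` bits each, in GiB.
    # Ignores activations, trainable weights, gradients, optimizer states
    # and quantization metadata (scales, zero-points).
    return num_params * bits / 8 / 1024**3

# Hypothetical 7B-parameter base model, one line per --load_in_kbits value.
for bits in (16, 8, 4):
    print(f"--load_in_kbits {bits}: ~{weight_memory_gib(7e9, bits):.1f} GiB of weights")
```

Each halving of the bit width halves the weight footprint (about 13.0, 6.5 and 3.3 GiB here), which is why 8-bit and 4-bit modes help on smaller GPUs.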

The other training hyperparameters listed (especially the learning rate and the parameters that determine the total batch size) are for reference only. Adjust them to your data and hardware in actual use.
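The total batch size mentioned here is the number of samples behind each optimizer update. Under data-parallel training (which DeepSpeed ZeRO-2 is), it is the product of the per-GPU batch, the accumulation steps, and the number of GPUs. A minimal Python sketch, with `total_batch_size` as a hypothetical helper name:

```python
def total_batch_size(per_device_train_batch_size: int,
                     gradient_accumulation_steps: int,
                     nproc_per_node: int,
                     nnodes: int = 1) -> int:
    # Every GPU processes its own micro-batch; gradients are accumulated over
    # gradient_accumulation_steps micro-batches before each optimizer step.
    world_size = nproc_per_node * nnodes
    return per_device_train_batch_size * gradient_accumulation_steps * world_size

# Values from run_sft.sh: batch 1 per GPU, no accumulation, 1 GPU on 1 node.
print(total_batch_size(1, 1, 1))  # 1
# Hypothetical setup: 8 GPUs, batch 1 per GPU, 8 accumulation steps.
print(total_batch_size(1, 8, 8))  # 64
```

Multiplying the result by `training_steps` gives the number of training samples consumed, which helps when choosing a step count for a given dataset size.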

The Stanford Alpaca format is as follows:

```json
[
  {"instruction" : ...,
   "input" : ...,
   "output" : ...},
  ...
]
```
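Malformed data files are a common cause of failures during preprocessing. The Python sketch below checks that every .json file in a dataset directory is a JSON array of objects with string `instruction`, `input`, and `output` fields. It is an illustrative pre-flight check, not the project's own loader, which may be stricter or more lenient.

```python
import json
import tempfile
from pathlib import Path
from typing import List

REQUIRED_KEYS = ("instruction", "input", "output")

def validate_alpaca_file(path: Path) -> List[str]:
    """Return human-readable problems found in one Alpaca-format JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        return [f"{path.name}: top level must be a JSON array"]
    problems = []
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            problems.append(f"{path.name}[{i}]: record must be a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append(f"{path.name}[{i}]: missing key '{key}'")
            elif not isinstance(record[key], str):
                problems.append(f"{path.name}[{i}]: '{key}' must be a string")
    return problems

def validate_dataset_dir(dataset_dir: str) -> List[str]:
    """Check every *.json file directly inside the --dataset_dir directory."""
    files = sorted(Path(dataset_dir).glob("*.json"))
    if not files:
        return [f"{dataset_dir}: no .json files found"]
    problems = []
    for path in files:
        problems.extend(validate_alpaca_file(path))
    return problems

if __name__ == "__main__":
    # Demo on a throwaway directory containing one made-up record.
    with tempfile.TemporaryDirectory() as data_dir:
        sample = [{"instruction": "Translate into English.", "input": "你好", "output": "Hello"}]
        Path(data_dir, "train.json").write_text(
            json.dumps(sample, ensure_ascii=False), encoding="utf-8")
        print(validate_dataset_dir(data_dir) or "all files look valid")
```

The same `validate_alpaca_file` function can be pointed at the file given to --validation_file.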

Supported Training Modes

The script supports the training modes listed below. Modes not in the table are unsupported; if you need one, you will have to modify and debug the script yourself.

| Training mode | `--model_name_or_path` | `--peft_path` | LoRA parameters |
|---|---|---|---|
| Instruction fine-tuning on top of the Chinese-LLaMA-2 LoRA | Original LLaMA-2 model in HF format | Chinese-LLaMA-2 LoRA | Not required |
| Instruction fine-tuning on top of the Chinese-Alpaca-2 LoRA | Original LLaMA-2 model in HF format | Chinese-Alpaca-2 LoRA | Not required |
| Training new instruction fine-tuning LoRA weights on top of Chinese-LLaMA-2 | Full Chinese-LLaMA-2 model in HF format (merged with Chinese-LLaMA-2-LoRA) | Do not provide; delete `--peft_path` from the script | Set `--lora_rank`, `--lora_alpha`, `--lora_dropout`, `--trainable`, and `--modules_to_save` |
| Training new instruction fine-tuning LoRA weights on top of Chinese-Alpaca-2 | Full Chinese-Alpaca-2 model in HF format (merged with Chinese-Alpaca-2-LoRA) | Do not provide; delete `--peft_path` from the script | Set `--lora_rank`, `--lora_alpha`, `--lora_dropout`, `--trainable`, and `--modules_to_save` |
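The table reduces to one rule: pass `--peft_path` when continuing from an existing LoRA; otherwise drop it and supply the LoRA hyperparameters. The hypothetical Python helper `lora_cli_args` below sketches that rule for scripting launches. Its defaults copy the values in run_sft.sh, and it is not part of the project.

```python
from typing import List, Optional

def lora_cli_args(peft_path: Optional[str] = None,
                  lora_rank: int = 64,
                  lora_alpha: int = 128,
                  lora_dropout: float = 0.05,
                  trainable: str = "q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj",
                  modules_to_save: Optional[str] = "embed_tokens,lm_head") -> List[str]:
    """Hypothetical helper: pick the LoRA-related flags for run_clm_sft_with_peft.py."""
    if peft_path is not None:
        # Continue from an existing Chinese-LLaMA-2/Alpaca-2 LoRA:
        # per the table, the LoRA parameters need not be specified.
        return ["--peft_path", peft_path]
    # Train a new LoRA on a merged full model: no --peft_path, explicit LoRA settings.
    args = [
        "--lora_rank", str(lora_rank),
        "--lora_alpha", str(lora_alpha),
        "--lora_dropout", str(lora_dropout),
        "--trainable", trainable,
    ]
    if modules_to_save:  # pass None to skip embed_tokens/lm_head and save VRAM
        args += ["--modules_to_save", modules_to_save]
    return args

print(lora_cli_args("path/to/chinese-alpaca-2-lora"))
print(lora_cli_args())
```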

VRAM Usage

If GPU memory is limited, you can delete the line --modules_to_save ${modules_to_save} \ from the script. This skips training embed_tokens and lm_head (which contain a large number of parameters) and trains only the LoRA parameters.
- If you are fine-tuning on top of an existing LoRA, you also need to edit the adapter_config.json file under peft_path and set "modules_to_save": null.
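The adapter_config.json edit can be done by hand or with a small script. The Python sketch below sets `"modules_to_save": null` in `<peft_path>/adapter_config.json` and keeps a backup copy named adapter_config.json.bak. The function name and the backup convention are illustrative choices, not part of the project.

```python
import json
import tempfile
from pathlib import Path

def disable_modules_to_save(peft_path: str) -> dict:
    """Set "modules_to_save": null in <peft_path>/adapter_config.json.

    A copy of the original file is kept as adapter_config.json.bak.
    """
    config_file = Path(peft_path) / "adapter_config.json"
    original_text = config_file.read_text(encoding="utf-8")
    config_file.with_name("adapter_config.json.bak").write_text(original_text, encoding="utf-8")
    config = json.loads(original_text)
    config["modules_to_save"] = None  # json.dumps writes Python None as null
    config_file.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config

if __name__ == "__main__":
    # Demo on a throwaway directory with a minimal, made-up adapter config.
    with tempfile.TemporaryDirectory() as peft_dir:
        (Path(peft_dir) / "adapter_config.json").write_text(
            json.dumps({"r": 64, "modules_to_save": ["embed_tokens", "lm_head"]}),
            encoding="utf-8",
        )
        disable_modules_to_save(peft_dir)
        print((Path(peft_dir) / "adapter_config.json").read_text(encoding="utf-8"))
```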
Reducing max_seq_length also lowers memory usage during training; for example, you can set max_seq_length to 256.
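To see why dropping embed_tokens and lm_head saves so much, the sketch below compares their parameter count with that of the LoRA adapters in run_sft.sh (rank 64 on the seven projection modules). It assumes a 7B-style configuration (hidden size 4096, intermediate size 11008, 32 layers) and a vocabulary of 55,296 tokens; these numbers are stated assumptions, not values read from a checkpoint.

```python
from typing import List, Tuple

def embedding_params(vocab_size: int, hidden_size: int) -> int:
    # embed_tokens and lm_head are separate vocab_size x hidden_size matrices
    # (LLaMA-2 does not tie them), so training both touches 2 * V * H weights.
    return 2 * vocab_size * hidden_size

def lora_params(rank: int, shapes: List[Tuple[int, int]]) -> int:
    # LoRA on a d_in -> d_out linear layer adds A (rank x d_in) and B (d_out x rank).
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)

# Assumed 7B-style configuration (not read from any checkpoint).
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 4096, 11008, 32, 55296
# q_proj, k_proj, v_proj, o_proj; then gate_proj, up_proj; then down_proj.
LAYER_SHAPES = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, INTERMEDIATE)] * 2 + [(INTERMEDIATE, HIDDEN)]

lora_total = LAYERS * lora_params(64, LAYER_SHAPES)  # lora_rank=64 as in run_sft.sh
embed_total = embedding_params(VOCAB, HIDDEN)
print(f"LoRA parameters:        {lora_total:,}")   # 159,907,840
print(f"embed_tokens + lm_head: {embed_total:,}")  # 452,984,832
```

Under these assumptions, the two full matrices hold nearly three times as many trainable parameters as all LoRA adapters combined, and every trainable parameter also needs gradient and optimizer-state memory.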

Multi-Node and Multi-GPU Training

Please refer to the following launch method:

```bash
torchrun \
  --nnodes ${num_nodes}
```
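The launch command above is incomplete on this page. For orientation, a multi-node torchrun launch typically also sets the per-node GPU count, the node's rank, and the master node's address and port, and is run once on every node with a different rank. The Python sketch below assembles such a command using standard torchrun options. The helper name, the address 192.168.0.1, and port 29500 are placeholders, and the exact flags the project uses may differ.

```python
import shlex
from typing import List

def torchrun_cmd(num_nodes: int, gpus_per_node: int, node_rank: int,
                 master_addr: str, master_port: int,
                 script_args: List[str]) -> List[str]:
    """Hypothetical helper: build the torchrun command line for one node."""
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    return [
        "torchrun",
        "--nnodes", str(num_nodes),
        "--nproc_per_node", str(gpus_per_node),
        "--node_rank", str(node_rank),      # 0 on the master node, 1..N-1 elsewhere
        "--master_addr", master_addr,       # address of node 0, reachable by all nodes
        "--master_port", str(master_port),  # same free port on every node
        "run_clm_sft_with_peft.py",
        *script_args,                       # the same training flags as in run_sft.sh
    ]

# Example: the second of two 8-GPU nodes (address and port are placeholders).
cmd = torchrun_cmd(2, 8, 1, "192.168.0.1", 29500,
                   ["--deepspeed", "ds_zero2_no_offload.json"])
print(shlex.join(cmd))
```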

