
Fine-tuning

Command

python src/finetune.py
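
A minimal run uses the default settings (LoRA fine-tuning on the alpaca_zh dataset, see the arguments below). A fuller invocation might look like the following sketch, which combines the arguments documented on this page with standard HuggingFace Seq2SeqTrainingArguments (such as --output_dir and --learning_rate) that the script is assumed to accept; the output path and hyperparameter values are illustrative.

python src/finetune.py \
    --dataset alpaca_zh \
    --finetuning_type lora \
    --max_source_length 512 \
    --max_target_length 512 \
    --output_dir output/chatglm-lora \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --fp16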

Arguments

class utils.config.ModelArguments <source>

  • model_name_or_path (str, optional): Path to pretrained model or model identifier from huggingface.co/models. Default: CHATGLM_REPO_NAME
  • config_name (str, optional): Pretrained config name or path if not the same as model_name. Default: None
  • tokenizer_name (str, optional): Pretrained tokenizer name or path if not the same as model_name. Default: None
  • cache_dir (str, optional): Where to store the pretrained models downloaded from huggingface.co. Default: None
  • use_fast_tokenizer (bool, optional): Whether to use a fast tokenizer (backed by the tokenizers library) or not. Default: True
  • model_revision (str, optional): The specific model version to use (can be a branch name, tag name or commit id). Default: CHATGLM_LASTEST_HASH
  • use_auth_token (bool, optional): Whether to use the token generated when running huggingface-cli login or not. Default: False
  • resize_position_embeddings (bool, optional): Whether to resize the position embeddings if max_source_length exceeds the model's position embedding size or not. Default: False
  • quantization_bit (int, optional): The number of bits to quantize the model. Default: None
  • checkpoint_dir (str, optional): Path to the directory containing the model checkpoints as well as the configurations. Default: None
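
For example, the model arguments can be combined to fine-tune a quantized model and to load weights from previously saved checkpoints. The sketch below assumes this usage; the model identifier and checkpoint path are illustrative.

python src/finetune.py \
    --model_name_or_path THUDM/chatglm-6b \
    --quantization_bit 4 \
    --checkpoint_dir path_to_checkpoint \
    --finetuning_type lora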

class utils.config.DataTrainingArguments <source>

  • dataset (str, optional): The name of the provided dataset(s) to use. Use commas to separate multiple datasets. Default: alpaca_zh
  • dataset_dir (str, optional): The name of the folder containing datasets. Default: data
  • split (str, optional): Which dataset split to use for training and evaluation. Default: train
  • overwrite_cache (bool, optional): Overwrite the cached training and evaluation sets. Default: False
  • preprocessing_num_workers (int, optional): The number of processes to use for the preprocessing. Default: None
  • max_source_length (int, optional): The maximum total input sequence length after tokenization. Default: 512
  • max_target_length (int, optional): The maximum total output sequence length after tokenization. Default: 512
  • pad_to_max_length (bool, optional): Whether to pad all samples to the model's maximum sentence length or not. Default: False
  • max_train_samples (int, optional): For debugging purposes, truncate the number of training examples for each dataset. Default: None
  • max_eval_samples (int, optional): For debugging purposes, truncate the number of evaluation examples for each dataset. Default: None
  • num_beams (int, optional): Number of beams to use for evaluation. This argument will be passed to model.generate. Default: None
  • ignore_pad_token_for_loss (bool, optional): Whether to ignore the tokens corresponding to padded labels in the loss computation or not. Default: True
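
For example, the data arguments select which datasets are read from dataset_dir, cap the tokenized sequence lengths, and truncate the data for quick debugging runs. In the sketch below, another_dataset is an illustrative name that would have to be defined in the dataset folder.

python src/finetune.py \
    --dataset alpaca_zh,another_dataset \
    --dataset_dir data \
    --max_source_length 1024 \
    --max_target_length 512 \
    --max_train_samples 1000 \
    --preprocessing_num_workers 8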

class utils.config.FinetuningArguments <source>

  • finetuning_type (str, optional): Which fine-tuning method to use for training. Default: lora
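
For example, the fine-tuning method is selected with finetuning_type. Values other than lora (such as freeze or p_tuning) may be available depending on the repository version; the sketch below assumes p_tuning is supported.

python src/finetune.py \
    --finetuning_type p_tuning \
    --quantization_bit 4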
