Usage
class glmtuner.hparams.ModelArguments

- `model_name_or_path` (str, optional): Path to the pretrained model or model identifier from huggingface.co/models. Default: `THUDM/chatglm-6b`
- `config_name` (str, optional): Pretrained config name or path if not the same as `model_name`. Default: `None`
- `tokenizer_name` (str, optional): Pretrained tokenizer name or path if not the same as `model_name`. Default: `None`
- `cache_dir` (str, optional): Where to store the pretrained models downloaded from huggingface.co. Default: `None`
- `use_fast_tokenizer` (bool, optional): Whether to use one of the fast tokenizers (backed by the tokenizers library) or not. Default: `True`
- `model_revision` (str, optional): The specific model version to use (can be a branch name, tag name or commit id). Default: `main`
- `use_auth_token` (bool, optional): Whether to use the token generated when running `huggingface-cli login`. Default: `False`
- `quantization_bit` (int, optional): The number of bits to quantize the model to. Default: `None`
- `quantization_type` (str, optional): Quantization data type to use in int4 training. Default: `nf4`
- `double_quantization` (bool, optional): Whether to use double quantization in int4 training or not. Default: `True`
- `checkpoint_dir` (str, optional): Path to the directory containing the model checkpoints as well as the configurations. Default: `None`
- `reward_model` (str, optional): Path to the directory containing the checkpoints of the reward model. Default: `None`
- `resume_lora_training` (bool, optional): Whether to resume training from the last LoRA weights or create new weights after merging them. Default: `True`
- `plot_loss` (bool, optional): Whether to plot the training loss after fine-tuning or not. Default: `False`
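A minimal construction sketch (not taken from the repository itself): it assumes `glmtuner.hparams.ModelArguments` is a dataclass whose fields match the names documented above and can be set as keyword arguments; unspecified fields keep their documented defaults.

```python
from glmtuner.hparams import ModelArguments  # import path assumed from the class name above

# Override only what we need; everything else keeps the documented defaults
# (use_fast_tokenizer=True, model_revision="main", double_quantization=True, ...).
model_args = ModelArguments(
    model_name_or_path="THUDM/chatglm-6b",
    quantization_bit=4,   # enable int4 quantization (quantization_type stays "nf4")
    plot_loss=True,       # save a training-loss plot after fine-tuning
)
print(model_args.model_name_or_path)
```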
class glmtuner.hparams.DataArguments

- `dataset` (str, optional): The name of the provided dataset(s) to use. Use commas to separate multiple datasets. Default: `alpaca_zh`
- `dataset_dir` (str, optional): The name of the folder containing the datasets. Default: `data`
- `split` (str, optional): Which dataset split to use for training and evaluation. Default: `train`
- `overwrite_cache` (bool, optional): Overwrite the cached training and evaluation sets. Default: `False`
- `preprocessing_num_workers` (int, optional): The number of processes to use for the preprocessing. Default: `None`
- `max_source_length` (int, optional): The maximum total input sequence length after tokenization. Default: `512`
- `max_target_length` (int, optional): The maximum total output sequence length after tokenization. Default: `512`
- `max_samples` (int, optional): For debugging purposes, truncate the number of examples for each dataset. Default: `None`
- `eval_num_beams` (int, optional): Number of beams to use for evaluation. This argument will be passed to `model.generate`. Default: `None`
- `ignore_pad_token_for_loss` (bool, optional): Whether to ignore the tokens corresponding to padded labels in the loss computation or not. Default: `True`
- `source_prefix` (str, optional): A prefix to add before every source text (useful for T5 models). Default: `None`
- `dev_ratio` (float, optional): Proportion of the dataset to include in the development set, should be between 0.0 and 1.0. Default: `0`
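As a rough illustration (a sketch, assuming the same dataclass-style keyword construction as in the `ModelArguments` example above), several of the fields in this list combined:

```python
from glmtuner.hparams import DataArguments  # import path assumed from the class name above

# Hold out 10% of the data for evaluation and allow longer inputs/outputs.
# "alpaca_zh" is the documented default dataset name; any other name must
# correspond to a dataset available under dataset_dir in your checkout.
data_args = DataArguments(
    dataset="alpaca_zh",
    dataset_dir="data",
    max_source_length=1024,
    max_target_length=1024,
    dev_ratio=0.1,
)
```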
class glmtuner.hparams.FinetuningArguments

- `finetuning_type` (str, optional): Which fine-tuning method to use for training. Default: `lora`
- `num_layer_trainable` (int, optional): Number of trainable layers for Freeze fine-tuning. Default: `3`
- `name_module_trainable` (str, optional): Name of the trainable modules for Freeze fine-tuning. Default: `mlp`
- `pre_seq_len` (int, optional): Number of prefix tokens to use for P-tuning v2. Default: `64`
- `prefix_projection` (bool, optional): Whether to add a projection layer for the prefix in P-tuning v2 or not. Default: `False`
- `lora_rank` (int, optional): The intrinsic dimension for LoRA fine-tuning. Default: `8`
- `lora_alpha` (float, optional): The scale factor for LoRA fine-tuning (similar to the learning rate). Default: `32.0`
- `lora_dropout` (float, optional): Dropout rate for the LoRA fine-tuning. Default: `0.1`
- `lora_target` (str, optional): The name(s) of the target modules to apply LoRA to. Use commas to separate multiple modules. Default: `query_key_value`
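For orientation, a hedged sketch of a LoRA configuration built from the fields above (again assuming keyword-argument construction; the repository's training scripts normally fill these in from the command line):

```python
from glmtuner.hparams import FinetuningArguments  # import path assumed from the class name above

# With the documented defaults, the effective LoRA scaling is lora_alpha / lora_rank = 32 / 8 = 4.
# Doubling the rank below doubles the trainable parameters and halves that scaling.
finetuning_args = FinetuningArguments(
    finetuning_type="lora",
    lora_rank=16,
    lora_alpha=32.0,
    lora_dropout=0.05,
    lora_target="query_key_value",  # ChatGLM's fused attention projection, as in the default above
)
```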
class transformers.Seq2SeqTrainingArguments

We only list some important arguments here; for a full list, please refer to the HuggingFace docs.

- `output_dir` (str): The output directory where the model predictions and checkpoints will be written.
- `overwrite_output_dir` (bool, optional): If `True`, overwrite the content of the output directory. Use this to continue training if `output_dir` points to a checkpoint directory. Default: `False`
- `do_train` (bool, optional): Whether to run training or not. Default: `False`
- `do_eval` (bool, optional): Whether to run evaluation or not. Default: `False`
- `do_predict` (bool, optional): Whether to run predictions or not. Default: `False`
- `per_device_train_batch_size` (int, optional): The batch size per GPU/TPU core/CPU for training. Default: `8`
- `per_device_eval_batch_size` (int, optional): The batch size per GPU/TPU core/CPU for evaluation or prediction. Default: `8`
- `gradient_accumulation_steps` (int, optional): Number of update steps to accumulate the gradients for before performing a backward/update pass. Default: `1`
- `learning_rate` (float, optional): The initial learning rate for the AdamW optimizer. Default: `5e-5`
- `weight_decay` (float, optional): The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer. Default: `0.0`
- `max_grad_norm` (float, optional): Maximum gradient norm (for gradient clipping). Default: `1.0`
- `num_train_epochs` (float, optional): Total number of training epochs to perform (if not an integer, the decimal part determines the fraction of the last epoch to perform before stopping training). Default: `3.0`
- `logging_steps` (int, optional): Number of update steps between two logs. Default: `500`
- `save_steps` (int, optional): Number of update steps between two checkpoint saves. Default: `500`
- `no_cuda` (bool, optional): Whether to avoid using CUDA even when it is available. Default: `False`
- `fp16` (bool, optional): Whether to use fp16 16-bit (mixed) precision training instead of 32-bit training. Default: `False`
- `predict_with_generate` (bool, optional): Whether to use `generate` to calculate generative metrics (ROUGE, BLEU). Default: `False`
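In practice the four argument groups above are usually parsed together from the command line. The snippet below is a sketch, assuming the glmtuner hyperparameter classes are dataclasses compatible with `transformers.HfArgumentParser` (the actual entry-point scripts in the repository may wrap this differently); `train_demo.py` is a placeholder script name.

```python
from transformers import HfArgumentParser, Seq2SeqTrainingArguments
from glmtuner.hparams import ModelArguments, DataArguments, FinetuningArguments  # import path assumed

# Parse all four argument groups in one pass, e.g.:
#   python train_demo.py --do_train --dataset alpaca_zh --output_dir output \
#       --per_device_train_batch_size 4 --gradient_accumulation_steps 4 \
#       --learning_rate 5e-5 --num_train_epochs 3.0 --fp16
# (train_demo.py is a placeholder name, not a script shipped with the repository.)
parser = HfArgumentParser((ModelArguments, DataArguments, Seq2SeqTrainingArguments, FinetuningArguments))
model_args, data_args, training_args, finetuning_args = parser.parse_args_into_dataclasses()

print(f"Fine-tuning {model_args.model_name_or_path} with {finetuning_args.finetuning_type} "
      f"on {data_args.dataset}; checkpoints go to {training_args.output_dir}")
```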