
Arguments

class glmtuner.hparams.ModelArguments <source>

  • model_name_or_path (str, optional): Path to pretrained model or model identifier from huggingface.co/models. Default: THUDM/chatglm-6b
  • config_name (str, optional): Pretrained config name or path if not the same as model_name. Default: None
  • tokenizer_name (str, optional): Pretrained tokenizer name or path if not the same as model_name. Default: None
  • cache_dir (str, optional): Where to store the pretrained models downloaded from huggingface.co. Default: None
  • use_fast_tokenizer (bool, optional): Whether to use one of the fast tokenizers (backed by the tokenizers library) or not. Default: True
  • model_revision (str, optional): The specific model version to use (can be a branch name, tag name or commit id). Default: main
  • use_auth_token (bool, optional): Whether to use the token generated when running huggingface-cli login. Default: False
  • quantization_bit (int, optional): The number of bits to quantize the model. Default: None
  • quantization_type (str, optional): Quantization data type to use in int4 training. Default: nf4
  • double_quantization (bool, optional): Whether to use double quantization in int4 training or not. Default: True
  • checkpoint_dir (str, optional): Path to the directory containing the model checkpoints as well as the configurations. Default: None
  • reward_model (str, optional): Path to the directory containing the checkpoints of the reward model. Default: None
  • resume_lora_training (bool, optional): Whether to resume training from the last LoRA weights or create new weights after merging them. Default: True
  • plot_loss (bool, optional): Whether to plot the training loss after fine-tuning or not. Default: False
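These fields are typically supplied as `--flag value` pairs to the project's training scripts, but they can also be constructed directly as a dataclass. A minimal sketch, assuming `ModelArguments` is importable from `glmtuner.hparams` as documented above (values are illustrative, e.g. a 4-bit quantized setup, not recommendations):

```python
from glmtuner.hparams import ModelArguments

# Illustrative example: load the default ChatGLM-6B weights and quantize them
# to 4-bit NF4 with double quantization enabled.
model_args = ModelArguments(
    model_name_or_path="THUDM/chatglm-6b",
    quantization_bit=4,
    quantization_type="nf4",
    double_quantization=True,
)
```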

class glmtuner.hparams.DataArguments <source>

  • dataset (str, optional): The name of provided dataset(s) to use. Use commas to separate multiple datasets. Default: alpaca_zh
  • dataset_dir (str, optional): The name of the folder containing datasets. Default: data
  • split (str, optional): Which dataset split to use for training and evaluation. Default: train
  • overwrite_cache (bool, optional): Overwrite the cached training and evaluation sets. Default: False
  • preprocessing_num_workers (int, optional): The number of processes to use for the preprocessing. Default: None
  • max_source_length (int, optional): The maximum total input sequence length after tokenization. Default: 512
  • max_target_length (int, optional): The maximum total output sequence length after tokenization. Default: 512
  • max_samples (int, optional): For debugging purposes, truncate the number of examples for each dataset. Default: None
  • eval_num_beams (int, optional): Number of beams to use for evaluation. This argument will be passed to model.generate. Default: None
  • ignore_pad_token_for_loss (bool, optional): Whether to ignore the tokens corresponding to padded labels in the loss computation or not. Default: True
  • source_prefix (str, optional): A prefix to add before every source text (useful for T5 models). Default: None
  • dev_ratio (float, optional): Proportion of the dataset to include in the development set, should be between 0.0 and 1.0. Default: 0
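As a sketch of how the data-side fields fit together (again assuming direct construction of the dataclass; the dataset name and split ratio are only examples):

```python
from glmtuner.hparams import DataArguments

# Illustrative example: train on the provided alpaca_zh dataset stored under
# the local `data` folder, cap input/output lengths at 512 tokens, and hold
# out 10% of the examples as a development set.
data_args = DataArguments(
    dataset="alpaca_zh",
    dataset_dir="data",
    max_source_length=512,
    max_target_length=512,
    dev_ratio=0.1,
)
```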

class glmtuner.hparams.FinetuningArguments <source>

  • finetuning_type (str, optional): Which fine-tuning method to use for training. Default: lora
  • num_layer_trainable (int, optional): Number of trainable layers for Freeze fine-tuning. Default: 3
  • name_module_trainable (str, optional): Name of trainable modules for Freeze fine-tuning. Default: mlp
  • pre_seq_len (int, optional): Number of prefix tokens to use for P-tuning v2. Default: 64
  • prefix_projection (bool, optional): Whether to add a projection layer for the prefix in P-tuning v2 or not. Default: False
  • lora_rank (int, optional): The intrinsic dimension for LoRA fine-tuning. Default: 8
  • lora_alpha (float, optional): The scale factor for LoRA fine-tuning (similar to the learning rate). Default: 32.0
  • lora_dropout (float, optional): Dropout rate for the LoRA fine-tuning. Default: 0.1
  • lora_target (str, optional): The name(s) of target modules to apply LoRA. Use commas to separate multiple modules. Default: query_key_value
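A minimal sketch of a LoRA configuration built from the fields above (assuming `FinetuningArguments` is importable from `glmtuner.hparams`; the rank, alpha, and dropout values simply restate the defaults and are not tuned recommendations):

```python
from glmtuner.hparams import FinetuningArguments

# Illustrative LoRA setup: rank-8 adapters with scaling factor 32 applied to
# the query_key_value projection of each attention block.
finetuning_args = FinetuningArguments(
    finetuning_type="lora",
    lora_rank=8,
    lora_alpha=32.0,
    lora_dropout=0.1,
    lora_target="query_key_value",
)
```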

class transformers.Seq2SeqTrainingArguments <source>

We only list some important arguments here; for a full list, please refer to the HuggingFace Docs.

  • output_dir (str): The output directory where the model predictions and checkpoints will be written.
  • overwrite_output_dir (bool, optional): If True, overwrite the content of the output directory. Use this to continue training if output_dir points to a checkpoint directory. Default: False
  • do_train (bool, optional): Whether to run training or not. Default: False
  • do_eval (bool, optional): Whether to run evaluation or not. Default: False
  • do_predict (bool, optional): Whether to run predictions or not. Default: False
  • per_device_train_batch_size (int, optional): The batch size per GPU/TPU core/CPU for training. Default: 8
  • per_device_eval_batch_size (int, optional): The batch size per GPU/TPU core/CPU for evaluation or prediction. Default: 8
  • gradient_accumulation_steps (int, optional): Number of update steps to accumulate gradients for before performing a backward/update pass. Default: 1
  • learning_rate (float, optional): The initial learning rate for AdamW optimizer. Default: 5e-5
  • weight_decay (float, optional): The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer. Default: 0.0
  • max_grad_norm (float, optional): Maximum gradient norm (for gradient clipping). Default: 1.0
  • num_train_epochs (float, optional): Total number of training epochs to perform (if not an integer, the decimal part is treated as the fraction of the final epoch to perform before stopping training). Default: 3.0
  • logging_steps (int, optional): Number of update steps between two logs. Default: 500
  • save_steps (int, optional): Number of update steps between two checkpoint saves. Default: 500
  • no_cuda (bool, optional): Whether to avoid using CUDA even when it is available. Default: False
  • fp16 (bool, optional): Whether to use fp16 16-bit (mixed) precision training instead of 32-bit training. Default: False
  • predict_with_generate (bool, optional): Whether to use generate to calculate generative metrics (ROUGE, BLEU). Default: False
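Since `Seq2SeqTrainingArguments` comes from the `transformers` library, it can be constructed directly in Python as well as passed as command-line flags. A short illustrative example using only the arguments listed above (values are examples, not recommendations):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative example: a short mixed-precision training run that logs every
# 10 steps and saves a checkpoint every 500 steps.
training_args = Seq2SeqTrainingArguments(
    output_dir="output/chatglm-sft",   # hypothetical output path
    do_train=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=5e-5,
    num_train_epochs=3.0,
    logging_steps=10,
    save_steps=500,
    fp16=True,
)
```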