Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".
Authors (* Equal Contribution): Yihua Zhang*, Pingzhi Li*, Junyuan Hong*, Jiaxiang Li*, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, and Tianlong Chen
This repo contains the source code and reproduction guide for ZO-LLM. This research endeavor is designed to help researchers better understand the capabilities, limitations, and principles of BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during Large Language Model (LLM) fine-tuning. Our study reveals previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
This project is organized around the following scopes:
- Five LLM families: RoBERTa, OPT, LLaMA, Vicuna, and Mistral.
- Three task complexities: binary classification, question-answering, and commonsense reasoning.
- Four fine-tuning schemes: full fine-tuning, LoRA, prefix tuning, and prompt tuning.
- Six BP-free optimization methods: ZO-SGD, ZO-SGD-Sign, ZO-SGD-MMT, ZO-SGD-Cons, ZO-Adam, and forward gradient.
- Three novel enhancements to ZO optimization: block-wise descent, hybrid training, and gradient sparsity.
This project is structured around hyperparameter sweeps over tasks, models, tuning schemes, and optimization methods. All optimization methods are implemented in zo-bench/trainer.py. Task configurations are defined in zo-bench/tasks.py and zo-bench/templates.py. The main entry point is zo-bench/run.py.
.
├── zo-bench
│ ├── modeling_mistral
│ │ ├── __init__.py
│ │ ├── configuration_mistral.py
│ │ ├── modeling_mistral.py
│ ├── modeling_llama.py
│ ├── modeling_opt.py
│ ├── modeling_roberta.py
│ ├── prefix_tuning.py
│ ├── prompt_tuning.py
│ ├── run.py
│ ├── tasks.py
│ ├── templates.py
│ ├── test_fake_text_memory.py
│ ├── trainer.py
│ ├── utils.py
│ ├── sweeps
│ │ ├── Copa_llama-7b
│ │ │ ├── adam
│ │ │ │ ├── adam_copa_ft.yml
│ │ │ │ ├── adam_copa_lora.yml
│ │ │ │ ├── adam_copa_prefix.yml
│ │ │ │ ├── adam_copa_prompt.yml
│ │ │ ├── forward_grad
│ │ │ │ ├── forward_grad_copa_ft.yml
│ │ │ │ ├── forward_grad_copa_lora.yml
│ │ │ │ ├── forward_grad_copa_prefix.yml
│ │ │ │ ├── forward_grad_copa_prompt.yml
│ │ │ ├── sgd
│ │ │ │ ├── sgd_copa_ft.yml
│ │ │ │ ├── sgd_copa_lora.yml
│ │ │ │ ├── sgd_copa_prefix.yml
│ │ │ │ ├── sgd_copa_prompt.yml
│ │ │ ├── sign_sgd
│ │ │ │ ├── sign_sgd_copa_ft.yml
│ │ │ │ ├── sign_sgd_copa_lora.yml
│ │ │ │ ├── sign_sgd_copa_prefix.yml
│ │ │ │ ├── sign_sgd_copa_prompt.yml
│ │ │ ├── zo_adam
│ │ │ │ ├── zo_adam_copa_ft.yml
│ │ │ │ ├── zo_adam_copa_lora.yml
│ │ │ │ ├── zo_adam_copa_prefix.yml
│ │ │ │ ├── zo_adam_copa_prompt.yml
│ │ │ ├── zo_sgd
│ │ │ │ ├── zo_sgd_copa_ft.yml
│ │ │ │ ├── zo_sgd_copa_lora.yml
│ │ │ │ ├── zo_sgd_copa_prefix.yml
│ │ │ │ ├── zo_sgd_copa_prompt.yml
│ │ │ ├── zo_sgd_conserv
│ │ │ │ ├── zo_sgd_conserv_copa_ft.yml
│ │ │ │ ├── zo_sgd_conserv_copa_lora.yml
│ │ │ │ ├── zo_sgd_conserv_copa_prefix.yml
│ │ │ │ ├── zo_sgd_conserv_copa_prompt.yml
│ │ │ ├── zo_sgd_momen
│ │ │ │ ├── zo_sgd_momen_copa_ft.yml
│ │ │ │ ├── zo_sgd_momen_copa_lora.yml
│ │ │ │ ├── zo_sgd_momen_copa_prefix.yml
│ │ │ │ ├── zo_sgd_momen_copa_prompt.yml
│ │ ├── Copa_llama-13b
│ │ │ ├── ...
│ │ ├── Copa_mistral
│ │ │ ├── ...
│ │ ├── Copa_opt-13b
│ │ │ ├── ...
│ │ ├── Copa_vicuna
│ │ │ ├── ...
│ │ ├── SST2_opt-1.3b
│ │ │ ├── ...
│ │ ├── WinoGrande_llama-7b
│ │ │ ├── ...
│ │ ├── WinoGrande_llama-13b
│ │ │ ├── ...
│ │ ├── WinoGrande_mistral
│ │ │ ├── ...
│ │ ├── WinoGrande_opt-13b
│ │ │ ├── ...
│ │ ├── WinoGrande_vicuna
│ │ │ ├── ...
├── environment.yml
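All ZO optimizers above share a single randomized two-point (SPSA-style) gradient estimator, implemented in zo-bench/trainer.py. For orientation, here is a minimal, self-contained sketch of that estimator; the function and argument names are illustrative and are not the repo's exact API:

import torch

def zo_sgd_step(model, loss_fn, batch, eps=1e-3, lr=1e-7, seed=0):
    """Two-point (SPSA-style) ZO-SGD step -- illustrative sketch only."""
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding regenerates the same random direction z on every call,
        # so z never has to be stored; this is what keeps memory costs low.
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1.0)                       # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                       # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                       # restore theta

        projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

        torch.manual_seed(seed)             # reuse the same z as the descent direction
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(-lr * projected_grad * z)

The ZO variants (sign, momentum, conservative, Adam) differ mainly in how this projected gradient is turned into an update, not in the forward-only evaluation pattern.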
All you need is:
conda create -n zollm python=3.10
conda activate zollm
pip install -r requirements.txt
We provide detailed hyperparameter settings in sweep configurations, where the sweep for tuning MODEL on TASK under SCHEME with OPTIMIZER is located at zo-bench/sweeps/TASK_MODEL/OPTIMIZER/OPTIMIZER_TASK_SCHEME.yml.
An example of using a sweep for full fine-tuning of LLaMA-7B with ZO-SGD on the COPA task:
~> wandb sweep zo-bench/sweeps/Copa_llama-7b/zo_sgd/zo_sgd_copa_ft.yml
wandb: Creating sweep from: zo-bench/sweeps/Copa_llama-7b/zo_sgd/zo_sgd_copa_ft.yml
wandb: Created sweep with ID: <ID>
wandb: View sweep at: https://wandb.ai/<unique ID>
wandb: Run sweep agent with: wandb agent <unique ID>
~> wandb agent <unique ID>
For the extended study, please check the following:
- Block-wise ZO: add the argument --module_wise_perturbation=True to the command line (a minimal sketch of the idea follows this list). Note that currently only OPT-family models are supported. For example:
python run.py --model_name=facebook/opt-1.3b --task_name=SST2 --output_dir=result/SST2-ft-$TAG --num_train_epochs=5 \
--per_device_train_batch_size=16 --load_best_model_at_end --evaluation_strategy=steps --save_strategy=steps \
--save_total_limit=1 --eval_steps=1000 --max_steps=20000 --logging_steps=10 --num_eval=1000 --num_train=1000 \
--num_dev=500 --train_as_classification --perturbation_mode=two_side --trainer=zo_sgd --train_set_seed=0 \
--lr_scheduler_type=constant --save_steps=1000 --load_float16 --learning_rate=1e-8 --zo_eps=0.001 --momentum=0.9 \
--weight_decay=0 --module_wise_perturbation=True
- Gradient Pruning: the corresponding arguments are sparse_gradient_group, gradient_sparsity, and sparse_gradient_resample_steps; their options are documented in the corresponding comments in run.py. An example sweep configuration is zo-bench/sweeps/SST2_opt-1.3b/zo_sgd_sparse_grad/zo_sgd_sparse_grad_cls_ft.yml. A minimal illustration of the sparse-perturbation idea also follows this list.
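As referenced above, here is a rough sketch of the block-wise (module-wise) perturbation idea: the ZO estimate is formed and applied one module at a time instead of over all parameters jointly. Names and structure below are illustrative only and do not reproduce the exact logic behind --module_wise_perturbation:

import torch

def module_wise_zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-7):
    """Illustrative block-wise ZO-SGD: perturb and update one module at a time."""
    for module in model.children():                 # treat each top-level module as a block
        params = [p for p in module.parameters() if p.requires_grad]
        if not params:
            continue
        zs = [torch.randn_like(p) for p in params]  # random direction for this block only

        with torch.no_grad():
            for p, z in zip(params, zs):
                p.data.add_(eps * z)
            loss_plus = loss_fn(model, batch)
            for p, z in zip(params, zs):
                p.data.add_(-2.0 * eps * z)
            loss_minus = loss_fn(model, batch)
            for p, z in zip(params, zs):
                p.data.add_(eps * z)                # restore this block

            projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
            for p, z in zip(params, zs):
                p.data.add_(-lr * projected_grad * z)   # update only this block

Each perturbation then lives in a much lower-dimensional space, which is the usual intuition for why block-wise descent can reduce the variance of the ZO estimate.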
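And a minimal, hypothetical illustration of the gradient-sparsity idea: only a random fraction of coordinates in the perturbation direction z is kept, so the zeroed coordinates are neither perturbed nor updated. How the masks are grouped and refreshed is governed by sparse_gradient_group and sparse_gradient_resample_steps in the actual code (see the comments in run.py); the snippet below sketches the masking idea only, not that implementation:

import torch

def sparsify_direction(z: torch.Tensor, gradient_sparsity: float = 0.9) -> torch.Tensor:
    """Zero out a `gradient_sparsity` fraction of the ZO perturbation direction (sketch)."""
    mask = (torch.rand_like(z) > gradient_sparsity).to(z.dtype)
    return z * mask

In the two-point estimator sketched earlier, z would simply be replaced by sparsify_direction(z) in both the perturbation and the update.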
@misc{zhang2024revisiting,
title={Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark},
author={Yihua Zhang and Pingzhi Li and Junyuan Hong and Jiaxiang Li and Yimeng Zhang and Wenqing Zheng and Pin-Yu Chen and Jason D. Lee and Wotao Yin and Mingyi Hong and Zhangyang Wang and Sijia Liu and Tianlong Chen},
year={2024},
eprint={2402.11592},
archivePrefix={arXiv},
primaryClass={cs.LG}
}