Adapter, first introduced for the LLaMA model as LLaMA-Adapter, is a form of prefix-tuning that prepends a learnable adaption prompt to the inputs of the attention blocks in an LLM. In total, there are only ~1.2M parameters to update during finetuning of LLaMA, which significantly reduces the memory footprint and speeds up training.
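To make the mechanism concrete, here is a minimal, simplified PyTorch sketch of the idea (not the Lit-Parrot implementation, which fuses the adaption prompt into the existing attention computation): a short learnable prompt is attended to alongside the frozen weights, and a zero-initialized gate keeps the adapter's contribution at zero when training starts.

```python
import torch
import torch.nn as nn

class AdaptedAttention(nn.Module):
    """Simplified sketch of a LLaMA-Adapter-style attention block."""

    def __init__(self, embed_dim: int, num_heads: int, prompt_length: int = 10):
        super().__init__()
        # frozen pretrained attention (its weights are not updated during finetuning)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # learnable adaption prompt, used as extra keys/values
        self.adaption_prompt = nn.Parameter(torch.randn(1, prompt_length, embed_dim) * 0.02)
        # zero-initialized gate: the adapter contributes nothing at the start of training
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ordinary self-attention of the pretrained model
        out, _ = self.attn(x, x, x, need_weights=False)
        # cross-attention from the input tokens to the learnable prompt, scaled by the gate
        prompt = self.adaption_prompt.expand(x.size(0), -1, -1)
        prompt_out, _ = self.attn(x, prompt, prompt, need_weights=False)
        return out + self.gate * prompt_out
```

Only `adaption_prompt` and `gate` receive gradients in this sketch; everything else stays frozen, which is why the trainable parameter count is so small.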
We are able to demonstrate instruction-finetuning of StableLM 3B with Lit-Parrot on the Alpaca dataset on a single RTX 3060 GPU. If you use 8 GPUs, finetuning can be completed in under 1 hour.
If you are new to Adapter and are interested in learning more about how it works before proceeding with the finetuning guide below, you might find our article Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters helpful.
The steps here only need to be done once:
- Follow the instructions in the README to install the dependencies.
- Download and convert the weights following our guide.
- If you want to utilize more than one GPU, install DeepSpeed:

  ```bash
  pip install deepspeed
  ```

- Download the data and generate the Alpaca instruction tuning dataset:

  ```bash
  python scripts/prepare_alpaca.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
  ```
Once the setup is complete, run the finetuning script:

```bash
python finetune_adapter.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```
The finetuning requires at least one GPU with ~12 GB of memory. You can speed up training by setting the `devices` variable in the script to utilize more GPUs if available. Depending on the available GPU memory, you can also tune the `micro_batch_size` parameter to utilize the GPU efficiently.
For example, the following settings will let you finetune the model in under 1 hour using DeepSpeed Zero-2:
```python
devices = 4
micro_batch_size = 4
```
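Lowering `micro_batch_size` reduces peak memory per forward/backward pass, while gradient accumulation keeps the effective batch size unchanged. A purely illustrative sketch of that relationship (the variable names and values below are assumptions; the real defaults live in `finetune_adapter.py` and may differ):

```python
# Hypothetical values for illustration only; check finetune_adapter.py for
# the actual defaults and variable names.
devices = 4
micro_batch_size = 4
gradient_accumulation_iters = 4  # assumed number of accumulation steps

# Each optimizer step effectively sees this many samples across all devices.
effective_batch_size = micro_batch_size * gradient_accumulation_iters * devices
print(effective_batch_size)  # 64
```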
This script will save checkpoints periodically to the `out_dir` directory. If you are finetuning different models or on your own dataset, you can specify an output directory with your preferred name:
```bash
python finetune_adapter.py --out_dir out/adapter/my-model-finetuned
```
You can test the finetuned model with your own instructions by running:
```bash
python generate_adapter.py \
    --prompt "Recommend a movie to watch on the weekend." \
    --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```
Output:
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
If your GPU supports `bfloat16`, the script will automatically use it.
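If you want to check ahead of time whether your card supports `bfloat16`, a quick local probe like the following works (this is just a convenience check, not how the script selects its precision):

```python
import torch

# Ampere (e.g. RTX 30xx) and newer GPUs report True; older cards fall back to
# float16 or float32 precision.
print(torch.cuda.is_available() and torch.cuda.is_bf16_supported())
```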
With only a few modifications, you can prepare and train on your own instruction dataset.
- Create a json file in which each row holds one instruction-response pair. A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be the empty string if the instruction doesn't require a context. Below is an example json file (see the sketch after this list for one way to generate such a file programmatically):

  ```json
  [
      {
          "instruction": "Arrange the given numbers in ascending order.",
          "input": "2, 4, 0, 8, 3",
          "output": "0, 2, 3, 4, 8"
      },
      ...
  ]
  ```
- Make a copy of `scripts/prepare_alpaca.py` and name it what you want:

  ```bash
  cp scripts/prepare_alpaca.py scripts/prepare_mydata.py
  ```
- Modify `scripts/prepare_mydata.py` to read the json data file.
- Run the script to generate the preprocessed, tokenized train-val split:

  ```bash
  python scripts/prepare_mydata.py --destination_path data/mydata/
  ```
- Run `finetune_adapter.py` by passing in the location of your data (and optionally other parameters):

  ```bash
  python finetune_adapter.py \
      --data_dir data/mydata/ \
      --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b \
      --out_dir data/mydata-finetuned
  ```
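As referenced in the first step above, here is one minimal way you could assemble such a json file. The records and the file path are purely illustrative; only the three keys are required by the format described above:

```python
import json

# Hypothetical examples; each record needs "instruction", "input", and "output".
records = [
    {
        "instruction": "Arrange the given numbers in ascending order.",
        "input": "2, 4, 0, 8, 3",
        "output": "0, 2, 3, 4, 8",
    },
    {
        "instruction": "Name the capital of France.",
        "input": "",  # 'input' may be an empty string when no context is needed
        "output": "Paris",
    },
]

# "data/mydata.json" is an assumed location; point prepare_mydata.py at wherever you save it.
with open("data/mydata.json", "w") as f:
    json.dump(records, f, indent=4)
```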
If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line `torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see Lightning-AI/lit-llama#101).