The official implementation of Low-Rank Few-Shot Adaptation of Vision-Language Models.
Authors: Maxime Zanella, Ismail Ben Ayed.
We present CLIP-LoRA, an easy-to-use few-shot method for Vision-Language Models with fixed hyperparameters for every task and every number of shots. This repository also aims to facilitate the use of Low-Rank adapters (LoRA) in Vision-Language Models such as CLIP.
Figure 1: Low-Rank Adaptation (LoRA) is easy to use and does not create any additional inference latency.
This repository covers:
- How to run the experiments.
- A quick guide on how LoRA is implemented in this repository (the PlainMultiheadAttentionLoRA class).
- How to cite our work if you find it useful.
- How to contact us for any inquiries.
Our code requires an environment with PyTorch installed. If you don't have one, consider creating a Python environment with:
conda create -y --name CLIP-LoRA python=3.10.0
conda activate CLIP-LoRA
Then install PyTorch, for instance with:
pip3 install torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2
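You can optionally verify the installation with a quick check that PyTorch imports correctly and that a GPU is visible:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"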
Please follow DATASETS.md to install the datasets.
Execute CLIP-LoRA on the ImageNet dataset with a random seed of 1 by entering the following command:
python main.py --root_path /path/to/your/data --dataset imagenet --seed 1
You can also execute CLIP-LoRA on the 10 other datasets:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1
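To run several datasets in a row, you can loop over them in your shell (the dataset names below are placeholders; use the identifiers described in DATASETS.md):
for dataset in dataset_1 dataset_2 dataset_3; do
    python main.py --root_path /path/to/your/data --dataset $dataset --seed 1
done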
You can optionally provide a save_path to save the LoRA modules, which can then be reloaded easily with the --eval_only argument. The code will automatically check that the saved LoRA modules were trained with the corresponding rank, alpha, encoder, params, and position to ensure compatibility. The folder will be structured as follows:
/your/save/path
└── backbone
    └── dataset
        └── Xshots
            ├── seedY
Here is the command line to evaluate a saved LoRA:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1 --save_path /your/save/path --eval_only
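Note that the LoRA modules must first have been trained and saved, for instance by running the same command with --save_path but without --eval_only:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1 --save_path /your/save/path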
The PlainMultiheadAttentionLoRA class in loralib/layers.py extends the standard PyTorch multi-head attention mechanism by incorporating Low-Rank Adaptation (LoRA). This class constructs explicit linear modules for the query (q), key (k), value (v), and output (o) projections of the attention mechanism, providing a structured and adaptable foundation for your experiments. PlainMultiheadAttentionLoRA takes an existing nn.MultiheadAttention module, replicates its configuration, and integrates LoRA linear modules.
- Parameter Initialization: The initialization process copies the weights and biases from a pre-existing multi-head attention module. Each LoRA module (q, k, v, o) is adapted based on the entries of the enable_lora list.
- LoRA Integration: Standard linear layers are replaced with LinearLoRA layers, which introduce low-rank matrices parameterized by the rank of adaptation (r) and the scaling factor (lora_alpha); see the sketch after this list.
- Forward Pass: The forward_module method manages the attention computation, incorporating optional dropout on the LoRA modules.
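To make the low-rank update concrete, here is a minimal, self-contained sketch of a LoRA-style linear layer. It illustrates the idea behind LinearLoRA (a frozen base projection plus a trainable low-rank correction scaled by lora_alpha / r), but it is a simplified sketch, not the exact implementation in loralib/layers.py:

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Illustrative LoRA linear layer: y = W x + (lora_alpha / r) * B A x."""
    def __init__(self, base_linear: nn.Linear, r: int = 2, lora_alpha: int = 1, lora_dropout: float = 0.0):
        super().__init__()
        self.base = base_linear
        # Freeze the pre-trained projection; only the low-rank factors are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base_linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, r))
        self.scaling = lora_alpha / r
        self.dropout = nn.Dropout(lora_dropout)

    def forward(self, x):
        # At inference time, B @ A can be merged into the base weight,
        # which is why LoRA adds no extra latency (cf. Figure 1).
        lora_update = self.dropout(x) @ self.lora_A.t() @ self.lora_B.t()
        return self.base(x) + self.scaling * lora_update

# Example: adapt a 512-dimensional projection with rank 2
layer = LoRALinearSketch(nn.Linear(512, 512), r=2, lora_alpha=1)
out = layer(torch.randn(4, 512))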
The following snippet demonstrates how to initialize the PlainMultiheadAttentionLoRA
with an existing multi-head attention module.
import torch.nn as nn
from loralib.layers import PlainMultiheadAttentionLoRA

# Initialize with an existing MultiheadAttention module
existing_mha = nn.MultiheadAttention(embed_dim=512, num_heads=8)
# Wrap it with LoRA on the query, key, value and output projections
lora_mha = PlainMultiheadAttentionLoRA(existing_mha, enable_lora=['q', 'k', 'v', 'o'], r=4, lora_alpha=2)
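After wrapping, only the LoRA parameters are meant to be trained, while the original attention weights stay frozen. As a generic illustration in plain PyTorch (assuming the LoRA parameters contain "lora" in their name; check the actual parameter names of the module you use, since the repository may handle this differently), you could do:

# Freeze everything except the LoRA parameters
for name, param in lora_mha.named_parameters():
    param.requires_grad = 'lora' in name.lower()

# Report how many parameters are actually trained
trainable = sum(p.numel() for p in lora_mha.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_mha.parameters())
print(f"Trainable parameters: {trainable} / {total}")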
Figure 2: Detailed few-shot learning results on the 10 fine-grained datasets and ImageNet with the ViT-B/16 visual backbone. Average performance for the ViT-B/16, ViT-B/32 and ViT-L/14 on the same 11 datasets is reported in the last three plots.
If you find this project useful, please cite it as follows:
@inproceedings{zanella2024low,
title={Low-Rank Few-Shot Adaptation of Vision-Language Models},
author={Zanella, Maxime and Ben Ayed, Ismail},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={1593--1603},
year={2024}
}
For any inquiries, feel free to create an issue or contact us at [email protected].
We express our gratitude to the authors of CoOp and Tip-Adapter for their open-source contributions.