Implementation for the paper "ClipCap: CLIP Prefix for Image Captioning".

Code references:
- ClipCap: CLIP Prefix for Image Captioning (paper)
- Original ClipCap GitHub: CLIP_prefix_caption
Clone the repository, create the environment, and install dependencies:
conda env create -f environment.yml
conda activate clip_prefix_caption
pip install -e "git+https://github.com/replicate/[email protected]#egg=cog&subdirectory=python/"
pip install transformers --upgrade
Download train_captions to data/coco/annotations.
Download the training and validation images and unzip them (we use the Karpathy et al. split).
Extract CLIP features (output is data/coco/oscar_split_ViT-B_32_train.pkl):
python parse_coco.py --clip_model_type ViT-B/32
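For reference, this step roughly amounts to encoding each training image with CLIP and pickling the embeddings together with their captions. The sketch below is a simplified illustration of that idea; the sample data and the exact pickle layout are placeholders, not the actual parse_coco.py:

```python
# Simplified sketch of CLIP feature extraction (illustrative; see parse_coco.py for the real script).
import pickle
import clip  # from https://github.com/openai/CLIP
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

samples = [("data/coco/train2014/example.jpg", "a caption")]  # placeholder (image, caption) pairs
all_embeddings, all_captions = [], []
for image_path, caption in samples:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        embedding = model.encode_image(image).cpu()  # one CLIP ViT-B/32 feature per image
    all_embeddings.append(embedding)
    all_captions.append({"caption": caption, "clip_embedding": len(all_embeddings) - 1})

with open("data/coco/oscar_split_ViT-B_32_train.pkl", "wb") as f:
    pickle.dump({"clip_embedding": torch.cat(all_embeddings, dim=0), "captions": all_captions}, f)
```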
Train with fine-tuning of GPT-2:
python train.py --data ./data/coco/oscar_split_ViT-B_32_train.pkl --out_dir ./coco_train/
If you want to train the model with OPT, see the section "Switch your language model from GPT-2 to OPT" below.
Train only transformer mapping network:
python train.py --only_prefix --data ./data/coco/oscar_split_ViT-B_32_train.pkl --out_dir ./coco_train/ --mapping_type transformer --num_layers 8 --prefix_length 40 --prefix_length_clip 40
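In both training modes the core idea is the same: a mapping network projects the CLIP image embedding into prefix_length token embeddings, which are prepended to the caption embeddings fed into GPT-2. The sketch below illustrates that idea with a small MLP mapper; the class name and dimensions are illustrative, not the repo's exact code:

```python
# Minimal sketch of the ClipCap prefix idea (illustrative; not the repo's model code).
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class MLPMapper(nn.Module):
    """Maps a CLIP embedding to `prefix_length` GPT-2 token embeddings."""
    def __init__(self, clip_dim=512, prefix_length=10, gpt_dim=768):
        super().__init__()
        self.prefix_length = prefix_length
        self.gpt_dim = gpt_dim
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_length // 2),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_length // 2, gpt_dim * prefix_length),
        )

    def forward(self, clip_embedding):
        out = self.mlp(clip_embedding)
        return out.view(-1, self.prefix_length, self.gpt_dim)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
mapper = MLPMapper()
clip_embedding = torch.randn(1, 512)           # placeholder CLIP feature
prefix = mapper(clip_embedding)                # (1, prefix_length, 768)
caption_ids = torch.tensor([[50256]])          # placeholder caption tokens
caption_embeds = gpt2.transformer.wte(caption_ids)
inputs_embeds = torch.cat((prefix, caption_embeds), dim=1)
outputs = gpt2(inputs_embeds=inputs_embeds)    # logits over the prefixed sequence
```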
Switch your language model from GPT-2 to OPT
We have enabled training your ClipCap model with OPT. We also look forward to making this code work well with the BLIP model.
Training code is available in train_OPT.py, and inference code will be updated in predict_OPT.py, which basically runs the Predictor from predict.py. Please note that you have to manually make sure your desired language model is 'facebook/opt-125m' (the variable named OPT_MODEL) in both predict.py and train.py.
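As a rough sketch of what the swap involves, the snippet below loads OPT through Hugging Face transformers and prepends a mapped prefix to the caption embeddings, just as with GPT-2. It is illustrative only; check train_OPT.py and predict_OPT.py for the actual implementation.

```python
# Minimal sketch of the GPT-2 -> OPT swap (illustrative; not the repo's exact code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

OPT_MODEL = "facebook/opt-125m"  # the variable the README asks you to set in both scripts

tokenizer = AutoTokenizer.from_pretrained(OPT_MODEL)
language_model = AutoModelForCausalLM.from_pretrained(OPT_MODEL)

# The mapped CLIP prefix is prepended to the caption embeddings exactly as with GPT-2;
# only the hidden size (768 for opt-125m) and the embedding lookup differ.
caption_ids = tokenizer("a photo of", return_tensors="pt").input_ids
caption_embeds = language_model.get_input_embeddings()(caption_ids)
prefix = torch.randn(1, 10, caption_embeds.shape[-1])  # placeholder mapped prefix
outputs = language_model(inputs_embeds=torch.cat((prefix, caption_embeds), dim=1))
```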
python train_OPT.py --data ./data/coco/oscar_split_ViT-B_32_train.pkl --out_dir /data/daisy/clipcap_output/coco_train/ --only_prefix --device
python predict_nice.py
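For context, inference essentially decodes a caption autoregressively from the mapped prefix embeddings. A minimal greedy-decoding sketch (illustrative, not the repo's Predictor) looks like this:

```python
# Minimal sketch of greedy caption generation from a prefix (illustrative;
# the Predictor in predict.py implements the actual decoding).
import torch

@torch.no_grad()
def generate_greedy(language_model, tokenizer, prefix_embeds, max_tokens=30):
    """Autoregressively extend the prefix embeddings one token at a time."""
    generated = prefix_embeds                      # (1, prefix_length, hidden_dim)
    token_ids = []
    for _ in range(max_tokens):
        logits = language_model(inputs_embeds=generated).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        if next_token.item() == tokenizer.eos_token_id:
            break
        token_ids.append(next_token.item())
        next_embed = language_model.get_input_embeddings()(next_token)
        generated = torch.cat((generated, next_embed), dim=1)
    return tokenizer.decode(token_ids)
```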
Approximate training resources:
- OPT-1.3b: 2 GPUs, 16 GB per GPU, 1h13m per epoch
- OPT-2.7b: 3 GPUs, 18 GB per GPU, 11h per epoch
Latest update: 2023-04-04
If you use this code for your research, please cite:
@article{mokady2021clipcap,
title={ClipCap: CLIP Prefix for Image Captioning},
author={Mokady, Ron and Hertz, Amir and Bermano, Amit H},
journal={arXiv preprint arXiv:2111.09734},
year={2021}
}
This repository is heavily based on the CLIP and Hugging Face repositories. For training we used data from the COCO dataset and Conceptual Captions.
For any inquiry please contact us at our email addresses: [email protected] or [email protected].