CLIP (ViT-B/32), based on the models provided by openai-CLIP and optimised for Graphcore's IPU.
| Framework | Domain | Model | Datasets | Tasks | Training | Inference | Reference |
|---|---|---|---|---|---|---|---|
| PyTorch | Vision | CLIP | Conceptual Captions (cc3m), ImageNet LSVRC 2012, CIFAR-100 | Image recognition | ✅ | ❌ | Learning Transferable Visual Models From Natural Language Supervision |
- Install and enable the Poplar SDK (see Poplar SDK setup)
- Install the system and Python requirements (see Environment setup)
- Download the ImageNet LSVRC 2012 dataset (see Dataset setup)
To check if your Poplar SDK has already been enabled, run:
echo $POPLAR_SDK_ENABLED
If no path is printed, follow these steps:
- Navigate to your Poplar SDK root directory
- Enable the Poplar SDK with:
cd poplar-<OS version>-<SDK version>-<hash>
. enable.sh
- Additionally, enable PopART with:
cd popart-<OS version>-<SDK version>-<hash>
. enable.sh
More detailed instructions on setting up your Poplar environment are available in the Poplar quick start guide.
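The same check on POPLAR_SDK_ENABLED can be made from Python, for example at the top of a launch script. The following is a minimal sketch, not part of this example's code; the helper name poplar_sdk_path is hypothetical.

```python
import os


def poplar_sdk_path(env=None):
    """Return the path stored in POPLAR_SDK_ENABLED, or None if unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    value = env.get("POPLAR_SDK_ENABLED", "").strip()
    return value or None


if __name__ == "__main__":
    path = poplar_sdk_path()
    if path is None:
        print("Poplar SDK is not enabled; source enable.sh first.")
    else:
        print(f"Poplar SDK enabled at: {path}")
```

Failing early with a clear message is usually easier to diagnose than an import error raised later by PopTorch.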
To prepare your environment, follow these steps:
- Create and activate a Python3 virtual environment:
python3 -m venv <venv name>
source <venv path>/bin/activate
- Navigate to the Poplar SDK root directory
- Install the PopTorch (PyTorch) wheel:
cd <poplar sdk root dir>
pip3 install poptorch...x86_64.whl
- Navigate to this example's root directory
- Install the Python requirements:
pip3 install -r requirements.txt
More detailed instructions on setting up your PyTorch environment are available in the PyTorch quick start guide.
Download the Conceptual Captions dataset in three steps with the scripts provided:
- Download Train_GCC-training.tsv from the Conceptual Captions source
- Use the provided script to download the main dataset:
mkdir data
mv Train_GCC-training.tsv data/
mkdir -p data/cc3m/images
python3 datasets/download.py --url_file data/Train_GCC-training.tsv --save_path data/cc3m
- Download the word segmentation vocabulary from the official CLIP repository and move it into the datasets directory:
mv bpe_simple_vocab_16e6.txt.gz datasets/
Disk space required: 84GB
.
├── images
└── img_cap.csv
1 directory, 1 file
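Before starting the full image download, it can help to inspect a few entries of Train_GCC-training.tsv. The sketch below assumes the file has two tab-separated columns, caption then image URL, with no header row; that layout is an assumption here, and the function name read_caption_url_pairs is hypothetical. It is independent of datasets/download.py.

```python
import csv
import io


def read_caption_url_pairs(tsv_file, limit=None):
    """Yield (caption, url) tuples from a tab-separated file object.

    `limit` caps the number of lines read (malformed lines count too).
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if limit is not None and index >= limit:
            break
        if len(row) != 2:
            continue  # skip malformed lines rather than failing
        caption, url = row
        yield caption.strip(), url.strip()


if __name__ == "__main__":
    # In-memory sample standing in for the real file.
    sample = io.StringIO(
        "a dog running on a beach\thttp://example.com/dog.jpg\n"
        "broken line without a url\n"
        "a red bicycle\thttp://example.com/bike.jpg\n"
    )
    for caption, url in read_caption_url_pairs(sample):
        print(f"{url} -> {caption}")
```

To read the real file, pass an object opened with `open("data/Train_GCC-training.tsv", newline="", encoding="utf-8")` and a small `limit`.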
Download the ImageNet LSVRC 2012 dataset from the source or via Kaggle.
Disk space required: 144GB
.
├── bounding_boxes
├── imagenet_2012_bounding_boxes.csv
├── train
└── validation
3 directories, 1 file
Then pre-process the dataset using the script provided:
python3 datasets/preprocess.py
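Before pre-processing, it can be worth confirming that the download matches the directory layout listed above. The following sketch checks only the four top-level entries from that listing; the function name and the idea of passing the dataset root as an argument are illustrative assumptions, not part of the provided scripts.

```python
import tempfile
from pathlib import Path

# Top-level entries taken from the directory listing above.
EXPECTED_DIRS = ("bounding_boxes", "train", "validation")
EXPECTED_FILES = ("imagenet_2012_bounding_boxes.csv",)


def missing_imagenet_entries(root):
    """Return the names of expected top-level entries absent from `root`."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    # Demonstrate on a temporary directory holding only part of the layout.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "train").mkdir()
        (Path(tmp) / "validation").mkdir()
        print("Missing entries:", missing_imagenet_entries(tmp))
```

An empty list means every expected top-level entry is present; the contents of each directory are not inspected.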
To run a tested and optimised configuration and to reproduce the performance shown on our performance results page, use the examples_utils module (installed automatically as part of the environment setup) to run one or more benchmarks. The benchmarks are provided in the benchmarks.yml file in this example's root directory.
For example:
python3 -m examples_utils benchmark --spec <path to benchmarks.yml file>
Or to run a specific benchmark in the benchmarks.yml file provided:
python3 -m examples_utils benchmark --spec <path to benchmarks.yml file> --benchmark <name of benchmark>
For more information on using the examples-utils benchmarking module, please refer to the README.
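When several benchmarks need to be run from a script, the two command forms above can be assembled programmatically. This is a minimal sketch that only builds the argument list using the --spec and --benchmark flags shown above; the helper name benchmark_command and the benchmark name in the demo are hypothetical.

```python
import shlex


def benchmark_command(spec_path, benchmark_name=None):
    """Build the examples_utils benchmark command as an argument list."""
    cmd = ["python3", "-m", "examples_utils", "benchmark", "--spec", str(spec_path)]
    if benchmark_name is not None:
        # Restrict the run to a single named benchmark from the spec file.
        cmd += ["--benchmark", benchmark_name]
    return cmd


if __name__ == "__main__":
    # "my_benchmark" is a placeholder; use a name defined in benchmarks.yml.
    print(shlex.join(benchmark_command("benchmarks.yml")))
    print(shlex.join(benchmark_command("benchmarks.yml", "my_benchmark")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the benchmark.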
After training CLIP on the cc3m dataset, you can run zero-shot classification on the validation set of ImageNet1k or on the CIFAR100 dataset to verify the performance of the trained model. To use a checkpoint saved from the IPU, set is_ipu_ckpt to True; to use the official checkpoint, set it to False. Zero-shot evaluation is performed on the ImageNet1k validation set by default. To evaluate on CIFAR100 instead, set zeroshot_dataset to cifar100.
# Do zeroshot evaluation on ImageNet
python zero_shot.py \
--config CLIP_ViT-B-32_cc3m \
--is_ipu_ckpt True \
--zeroshot_dataset imagenet \
--ckpt_file output/ckpt/CLIP_epoch_K.pt
# Do zeroshot evaluation on CIFAR100
python zero_shot.py \
--config CLIP_ViT-B-32_cc3m \
--is_ipu_ckpt True \
--zeroshot_dataset cifar100 \
--ckpt_file output/ckpt/CLIP_epoch_K.pt
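For readers unfamiliar with zero-shot classification, the general idea is to embed one text prompt per class, embed the image, and pick the class whose text embedding has the highest cosine similarity with the image embedding. The sketch below illustrates that idea in plain Python with toy vectors. It is not the code in zero_shot.py, the function names are hypothetical, and the logit_scale value of 100.0 is an illustrative assumption.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_predict(image_embedding, class_text_embeddings, logit_scale=100.0):
    """Return (best_class_index, class_probabilities) for one image."""
    image = l2_normalize(image_embedding)
    logits = []
    for text_embedding in class_text_embeddings:
        text = l2_normalize(text_embedding)
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real model outputs.
    class_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_embedding = [0.9, 0.1, 0.0]
    best, probs = zero_shot_predict(image_embedding, class_embeddings)
    print("Predicted class index:", best)
```

Top-1 accuracy over a validation set is then the fraction of images whose predicted index matches the ground-truth label.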
This application is licensed under the MIT license. Please see the LICENSE file in this directory for full details of the license conditions.
The following files are created by Graphcore and are licensed under the MIT License (* means additional license information is stated following this list):
- log.py
- args.py
- train.py
- README.md
- configs.yml
- benchmarks.yml
- preprocess.py
- checkpoint.py
- ipu_options.py
- optimization.py
- requirements.txt
- tests/import_helper.py
- tests/cpu_ipu_test.py
- datasets/preprocess.py
- datasets/text_templates.pt
The following files include code from this repo, which uses the MIT license:
- model.py
- datasets/simple_tokenizer.py
The following files include code from this repo:
- zero_shot.py
- datasets/dataset.py
- datasets/download.py
External packages:
- wandb, pytest, pyyaml, transformers are licensed under the MIT License
- torchvision is licensed under the BSD 3-Clause License