CLIP (ViT-B/32), based on the models provided by OpenAI CLIP and optimised for Graphcore's IPU.

Framework: PyTorch
Domain: Vision
Model: CLIP
Datasets: Conceptual Captions (cc3m), ImageNet LSVRC 2012, CIFAR-100
Tasks: Image recognition
Training: Min. 8 IPUs (POD16) required
Reference: Learning Transferable Visual Models From Natural Language Supervision

Instructions summary

  1. Install and enable the Poplar SDK (see Poplar SDK setup)

  2. Install the system and Python requirements (see Environment setup)

  3. Download the Conceptual Captions (cc3m) dataset and, optionally, the ImageNet LSVRC 2012 dataset (see Dataset setup)

Poplar SDK setup

To check if your Poplar SDK has already been enabled, run:

echo $POPLAR_SDK_ENABLED

If no path is provided, then follow these steps:

  1. Navigate to your Poplar SDK root directory

  2. Enable the Poplar SDK with:

cd poplar-<OS version>-<SDK version>-<hash>
. enable.sh

  3. Additionally, enable PopART with:

cd popart-<OS version>-<SDK version>-<hash>
. enable.sh

More detailed instructions on setting up your Poplar environment are available in the Poplar quick start guide.
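
The same check can also be done from Python before launching a run. The following is a minimal sketch that assumes enabling the SDK exports the `POPLAR_SDK_ENABLED` environment variable; the helper name `poplar_sdk_path` is hypothetical and is not part of the Poplar SDK.

```python
import os


def poplar_sdk_path(env=None):
    """Return the enabled Poplar SDK path, or None if none is reported.

    Assumption: enabling the SDK exports POPLAR_SDK_ENABLED.
    """
    env = os.environ if env is None else env
    value = env.get("POPLAR_SDK_ENABLED", "").strip()
    return value or None


if __name__ == "__main__":
    path = poplar_sdk_path()
    if path is None:
        print("Poplar SDK does not appear to be enabled.")
    else:
        print(f"Poplar SDK enabled at: {path}")
```

Passing a plain dictionary instead of the real environment makes the helper easy to exercise in isolation.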

Environment setup

To prepare your environment, follow these steps:

  1. Create and activate a Python3 virtual environment:

python3 -m venv <venv name>
source <venv path>/bin/activate

  2. Navigate to the Poplar SDK root directory

  3. Install the PopTorch (PyTorch) wheel:

cd <poplar sdk root dir>
pip3 install poptorch...x86_64.whl

  4. Navigate to this example's root directory

  5. Install the Python requirements:

pip3 install -r requirements.txt

More detailed instructions on setting up your PyTorch environment are available in the PyTorch quick start guide.
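
Once the steps above are complete, a quick sanity check is to confirm that the key packages can be found by the interpreter inside the virtual environment. This is a sketch only: `missing_packages` is a hypothetical helper, and the package list (`poptorch`, `torch`) is an assumption about what this example needs at minimum.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that the interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed minimum set: the PopTorch wheel and its PyTorch dependency.
    missing = missing_packages(["poptorch", "torch"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("PopTorch and PyTorch were found.")
```

`find_spec` only locates the packages without importing them, so the check stays fast and does not need an IPU to be attached.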

Dataset setup

Conceptual Captions (cc3m)

Download the Conceptual Captions dataset in three steps with the scripts provided:

  1. Download Train_GCC-training.tsv from the Conceptual Captions source

  2. Use the provided script to download the main dataset:

mkdir data
mv Train_GCC-training.tsv data/
mkdir -p data/cc3m/images
python3 datasets/ --url_file data/Train_GCC-training.tsv --save_path data/cc3m

  3. Download the word segmentation vocabulary from the official CLIP repository and move it into the datasets directory:

mv bpe_simple_vocab_16e6.txt.gz datasets/

Disk space required: 84GB

├── images
└── img_cap.csv

1 directory, 1 file
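
Before starting a long training run, it can help to confirm that the download produced the layout shown above. The sketch below assumes the tree is rooted at the `--save_path` directory (`data/cc3m` in the commands above); `check_cc3m_layout` is a hypothetical helper and not one of the provided scripts.

```python
from pathlib import Path


def check_cc3m_layout(root):
    """Return a list of problems found under the cc3m directory (empty if none)."""
    root = Path(root)
    problems = []
    if not (root / "images").is_dir():
        problems.append("missing directory: images")
    if not (root / "img_cap.csv").is_file():
        problems.append("missing file: img_cap.csv")
    return problems


if __name__ == "__main__":
    # Assumed location, taken from the --save_path used in the download step.
    issues = check_cc3m_layout("data/cc3m")
    print("cc3m layout looks complete" if not issues else "\n".join(issues))
```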

ImageNet LSVRC 2012 (Optional)

Download the ImageNet LSVRC 2012 dataset from the source or via Kaggle.

Disk space required: 144GB

├── bounding_boxes
├── imagenet_2012_bounding_boxes.csv
├── train
└── validation

3 directories, 1 file

Then pre-process the dataset using the scripts provided:

python3 datasets/

Running and benchmarking

To run a tested and optimised configuration and to reproduce the performance shown on our performance results page, use the examples_utils module (installed automatically as part of the environment setup) to run one or more benchmarks. The benchmarks are provided in the benchmarks.yml file in this example's root directory.

For example:

python3 -m examples_utils benchmark --spec <path to benchmarks.yml file>

Or to run a specific benchmark in the benchmarks.yml file provided:

python3 -m examples_utils benchmark --spec <path to benchmarks.yml file> --benchmark <name of benchmark>

For more information on using the examples-utils benchmarking module, please refer to the README.
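
If the benchmarks need to be launched from a Python script (for example, from a scheduler or a test harness), the two commands above can be assembled programmatically. This is a minimal sketch: `benchmark_command` and `run_benchmark` are hypothetical helpers, and only the `--spec` and `--benchmark` options shown above are used.

```python
import subprocess
import sys


def benchmark_command(spec_path, benchmark=None):
    """Build the examples_utils benchmark command as an argument list."""
    cmd = [sys.executable, "-m", "examples_utils", "benchmark", "--spec", str(spec_path)]
    if benchmark is not None:
        # Restrict the run to a single named benchmark from the spec file.
        cmd += ["--benchmark", benchmark]
    return cmd


def run_benchmark(spec_path, benchmark=None):
    """Run the benchmark command and return its exit code."""
    return subprocess.run(benchmark_command(spec_path, benchmark), check=False).returncode
```

Using `sys.executable` rather than a bare `python3` keeps the run inside the currently active virtual environment.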

Other features

Zero-shot evaluation

After training CLIP on the cc3m dataset, you can run zero-shot classification on the validation sets of ImageNet1k and CIFAR100 to verify the performance of the trained model. To use a checkpoint saved from the IPU, set is_ipu_ckpt to True; to use the official checkpoint, set it to False. Zero-shot evaluation is performed on the ImageNet1k validation set by default. To evaluate on CIFAR100 instead, set zeroshot_dataset to cifar100.

# Do zeroshot evaluation on ImageNet
python \
    --config CLIP_ViT-B-32_cc3m \
    --is_ipu_ckpt True \
    --zeroshot_dataset imagenet \
    --ckpt_file output/ckpt/

# Do zeroshot evaluation on CIFAR100
python \
    --config CLIP_ViT-B-32_cc3m \
    --is_ipu_ckpt True \
    --zeroshot_dataset cifar100 \
    --ckpt_file output/ckpt/
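
For readers unfamiliar with the technique, zero-shot classification with CLIP compares an image embedding against one text embedding per class and picks the class with the highest cosine similarity. The following is a toy sketch of that scoring step in plain Python, with made-up two-dimensional embeddings. It is not the evaluation script from this repository, and all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def zero_shot_predict(image_embedding, class_text_embeddings):
    """Return the index of the class whose text embedding is most similar."""
    image = l2_normalize(image_embedding)
    scores = [
        sum(a * b for a, b in zip(image, l2_normalize(text)))
        for text in class_text_embeddings
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(image_embeddings, labels, class_text_embeddings):
    """Fraction of images whose predicted class matches the label."""
    correct = sum(
        zero_shot_predict(img, class_text_embeddings) == label
        for img, label in zip(image_embeddings, labels)
    )
    return correct / len(labels)


if __name__ == "__main__":
    # Toy data: two classes, each represented by one text embedding.
    class_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    images = [[0.9, 0.1], [0.2, 2.0]]
    print(top1_accuracy(images, [0, 1], class_embeddings))
```

In a real evaluation the embeddings would come from the trained image and text encoders, and the class text embeddings would be computed once and reused for every image.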


License

This application is licensed under the MIT license. Please see the LICENSE file in this directory for full details of the license conditions.

The following files are created by Graphcore and are licensed under the MIT License (* means additional license information is stated following this list):

  • configs.yml
  • benchmarks.yml
  • requirements.txt
  • tests/
  • tests/
  • datasets/
  • datasets/

The following file includes code from this repo, which uses the MIT license:

  • datasets/

The following files include code from this repo:

  • datasets/
  • datasets/

External packages:

  • wandb, pytest, pyyaml, transformers are licensed under the MIT License
  • torchvision is licensed under the BSD 3-Clause License