VISSL provides reference implementations of a large number of self-supervised learning approaches, along with a suite of benchmark tasks for quickly evaluating the representation quality of models trained with these approaches under a standard evaluation setup. This document lists the available self-supervised models and benchmarks them on a standard task: evaluating a linear classifier on ImageNet-1K. All models can be downloaded from the provided links.
VISSL is 100% compatible with TorchVision ResNet models. It's easy to use torchvision models in VISSL and to use VISSL models in torchvision.
All the ResNe(X)t models in VISSL can be converted to Torchvision weights. This simply involves removing the `_feature_blocks.` prefix from all the weight names. VISSL provides a convenience script for this:
```bash
python extra_scripts/convert_vissl_to_torchvision.py \
    --model_url_or_file <input_model>.pth \
    --output_dir /path/to/output/dir/ \
    --output_name <my_converted_model>.torch
```
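If you prefer to do the conversion in your own code, the renaming boils down to stripping the `_feature_blocks.` prefix from each parameter name. A minimal sketch (the helper name and toy state dict are ours, not part of VISSL):

```python
def vissl_to_torchvision(state_dict):
    """Strip VISSL's '_feature_blocks.' prefix so parameter names
    match torchvision's ResNet state dict keys."""
    prefix = "_feature_blocks."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Toy example with placeholder values standing in for real tensors
vissl_sd = {
    "_feature_blocks.conv1.weight": "w0",
    "_feature_blocks.layer1.0.conv1.weight": "w1",
}
print(vissl_to_torchvision(vissl_sd))
# {'conv1.weight': 'w0', 'layer1.0.conv1.weight': 'w1'}
```

The resulting dict can then be loaded into a torchvision ResNet with `model.load_state_dict(...)`.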
All the ResNe(X)t models in Torchvision can be directly loaded in VISSL. This simply involves setting the `REMOVE_PREFIX` and `APPEND_PREFIX` options in the config file, following the instructions here. Also, see the example below for how torchvision models are loaded.
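Conceptually, loading in this direction is the inverse rename: VISSL's trunk expects torchvision parameter names to carry the `_feature_blocks.` prefix, which is what the `APPEND_PREFIX` option applies at load time. A standalone sketch of that renaming (our helper, not VISSL's API):

```python
def torchvision_to_vissl(state_dict, prefix="_feature_blocks."):
    """Prepend the prefix VISSL's trunk expects; this mirrors what
    MODEL.WEIGHTS_INIT.APPEND_PREFIX does when loading a checkpoint."""
    return {prefix + k: v for k, v in state_dict.items()}

# Toy torchvision-style state dict with placeholder values
tv_sd = {"conv1.weight": "w0", "fc.bias": "b0"}
print(torchvision_to_vissl(tv_sd))
# {'_feature_blocks.conv1.weight': 'w0', '_feature_blocks.fc.bias': 'b0'}
```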
You can benchmark all of these models using VISSL's benchmark suite. See the docs for how to run the various benchmarks.
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Supervised | RN50 - Torchvision | ImageNet | 76.1 | model |
Supervised | RN101 - Torchvision | ImageNet | 77.21 | model |
Supervised | RN50 - Caffe2 | ImageNet | 75.88 | model |
Supervised | RN50 - Caffe2 | Places205 | 58.49 | model |
Supervised | Alexnet BVLC - Caffe2 | ImageNet | 49.54 | model |
Supervised | RN50 - VISSL - 105 epochs | ImageNet | 75.45 | model |
Supervised | ViT/B16 - 90 epochs (*) | ImageNet-22K | 83.38 | model |
Supervised | RegNetY-64Gf - BGR input | ImageNet | 80.55 | model |
Supervised | RegNetY-128Gf - BGR input | ImageNet | 80.57 | model |
(*) This specific ViT/B16 checkpoint requires the following options to be added on the command line to be loaded by VISSL: `config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model.` `config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=classy_state_dict`
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Semi-supervised | RN50 | YFCC100M - ImageNet | 79.2 | model |
Semi-weakly supervised | RN50 | Public Instagram Images - ImageNet | 81.06 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Jigsaw | RN50 - 100 permutations | ImageNet-1K | 48.57 | model |
Jigsaw | RN50 - 2K permutations | ImageNet-1K | 46.73 | model |
Jigsaw | RN50 - 10K permutations | ImageNet-1K | 48.11 | model |
Jigsaw | RN50 - 2K permutations | ImageNet-22K | 44.84 | model |
Jigsaw | RN50 - Goyal'19 | ImageNet-1K | 46.58 | model |
Jigsaw | RN50 - Goyal'19 | ImageNet-22K | 53.09 | model |
Jigsaw | RN50 - Goyal'19 | YFCC100M | 51.37 | model |
Jigsaw | AlexNet - Goyal'19 | ImageNet-1K | 34.82 | model |
Jigsaw | AlexNet - Goyal'19 | ImageNet-22K | 37.5 | model |
Jigsaw | AlexNet - Goyal'19 | YFCC100M | 37.01 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Colorization | RN50 - Goyal'19 | ImageNet-1K | 40.11 | model |
Colorization | RN50 - Goyal'19 | ImageNet-22K | 49.24 | model |
Colorization | RN50 - Goyal'19 | YFCC100M | 47.46 | model |
Colorization | AlexNet - Goyal'19 | ImageNet-1K | 30.39 | model |
Colorization | AlexNet - Goyal'19 | ImageNet-22K | 36.83 | model |
Colorization | AlexNet - Goyal'19 | YFCC100M | 34.19 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
RotNet | AlexNet official | ImageNet-1K | 39.51 | model |
RotNet | RN50 - 105 epochs | ImageNet-1K | 48.2 | model |
RotNet | RN50 - 105 epochs | ImageNet-22K | 54.89 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
DeepCluster | AlexNet official | ImageNet-1K | 37.88 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
ClusterFit | RN50 - 105 epochs - 16K clusters from RotNet | ImageNet-1K | 53.63 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
NPID | RN50 official oldies | ImageNet-1K | 54.99 | model |
NPID | RN50 - 4k negatives - 200 epochs - VISSL | ImageNet-1K | 52.73 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
NPID++ | RN50 - 32k negatives - 800 epochs - cosine LR | ImageNet-1K | 56.68 | model |
NPID++ | RN50-w2 - 32k negatives - 800 epochs - cosine LR | ImageNet-1K | 62.73 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
PIRL | RN50 - 200 epochs | ImageNet-1K | 62.55 | model |
PIRL | RN50 - 800 epochs | ImageNet-1K | 64.29 | model |
NOTE: Please see projects/PIRL/README.md for more PIRL models provided by authors.
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
SimCLR | RN50 - 100 epochs | ImageNet-1K | 64.4 | model |
SimCLR | RN50 - 200 epochs | ImageNet-1K | 66.61 | model |
SimCLR | RN50 - 400 epochs | ImageNet-1K | 67.71 | model |
SimCLR | RN50 - 800 epochs | ImageNet-1K | 69.68 | model |
SimCLR | RN50 - 1000 epochs | ImageNet-1K | 68.8 | model |
SimCLR | RN50-w2 - 100 epochs | ImageNet-1K | 69.82 | model |
SimCLR | RN50-w2 - 1000 epochs | ImageNet-1K | 73.84 | model |
SimCLR | RN50-w4 - 1000 epochs | ImageNet-1K | 71.61 | model |
SimCLR | RN101 - 100 epochs | ImageNet-1K | 62.76 | model |
SimCLR | RN101 - 1000 epochs | ImageNet-1K | 71.56 | model |
The following models are converted from the TensorFlow format of the official repository to a VISSL-compatible format.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
SimCLRv2 | RN152-w3-sk SimCLRv2 repository | ImageNet-1K | 80.0 | model |
The following models are converted from the TensorFlow format of the official repository to a VISSL-compatible format.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
BYOL | RN200-w2 BYOL repository (*) | ImageNet-1K | 78.34 | model |
(*) This specific checkpoint requires the following command line options to be correctly loaded by VISSL: `config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.` `config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=''`
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
DeepClusterV2 | RN50 - 400 epochs - 2x224 | ImageNet-1K | 70.01 | model |
DeepClusterV2 | RN50 - 400 epochs - 2x160+4x96 | ImageNet-1K | 74.32 | model |
DeepClusterV2 | RN50 - 800 epochs - 2x224+6x96 | ImageNet-1K | 75.18 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Linear evaluation results vary somewhat from run to run, both when repeating the same evaluation and when pre-training a SwAV model multiple times. The numbers reported below are from a single run.
Method | Model | PreTrain dataset | ImageNet top-1 linear acc. | URL |
---|---|---|---|---|
SwAV | RN50 - 100 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 71.99 | model |
SwAV | RN50 - 200 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 73.85 | model |
SwAV | RN50 - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 74.81 | model |
SwAV | RN50 - 800 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 74.92 | model |
SwAV | RN50 - 200 epochs - 2x224+6x96 - 256 batch-size | ImageNet-1K | 73.07 | model |
SwAV | RN50 - 400 epochs - 2x224+6x96 - 256 batch-size | ImageNet-1K | 74.3 | model |
SwAV | RN50 - 400 epochs - 2x224 - 4096 batch-size | ImageNet-1K | 69.53 | model |
SwAV | RN50-w2 - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 77.01 | model |
SwAV | RN50-w4 - 400 epochs - 2x224+6x96 - 2560 batch-size | ImageNet-1K | 77.03 | model |
SwAV | RN50-w5 - 300 epochs - 2x224+6x96 - 2560 batch-size (*) | ImageNet-1K | 78.5 | model |
SwAV | RegNetY-16Gf - 800 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 76.15 | model |
SwAV | RegNetY-128Gf - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 78.36 | model |
NOTE: Please see projects/SwAV/README.md for more SwAV models provided by authors.
(*) This specific RN50-w5 checkpoint requires the following options to be added to be loaded by VISSL: `config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.` `config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=''` `config.MODEL.WEIGHTS_INIT.REMOVE_PREFIX=module.`
Method | Model | PreTrain dataset | ImageNet top-1 linear acc. | ImageNet top-1 fine-tuned acc. | URL |
---|---|---|---|---|---|
SEER | RegNetY-32Gf | IG-1B public images, non EU | 74.03 (res5) | 83.4 | model |
SEER | RegNetY-64Gf | IG-1B public images, non EU | 75.25 (res5avg) | 84.0 | model |
SEER | RegNetY-128Gf | IG-1B public images, non EU | 75.96 (res5avg) | 84.5 | model |
SEER | RegNetY-256Gf | IG-1B public images, non EU | 77.51 (res5avg) | 85.2 | model |
SEER | RegNet10B | IG-1B public images, non EU | 79.8 (res4) | 85.8 | model |
NOTE: Please see projects/SEER/README.md for more SEER models provided by authors.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
MoCo-v2 | RN50 - 200 epochs - 256 batch-size | ImageNet-1K | 66.4 | model |
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
MoCo-v3 | ViT-B/16 - 300 epochs | ImageNet-1K | 75.79 | model |
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Barlow Twins | RN50 - 300 epochs - 2048 batch-size | ImageNet-1K | 70.75 | model |
Barlow Twins | RN50 - 1000 epochs - 2048 batch-size | ImageNet-1K | 71.80 | model |
The ViT-small model is obtained with this config.
Method | Model | PreTrain dataset | ImageNet k-NN acc. | URL |
---|---|---|---|---|
DINO | ViT-S/16 - 300 epochs - 1024 batch-size | ImageNet-1K | 73.4 | model |
DINO | XCiT-S/16 - 300 epochs - 1024 batch-size | ImageNet-1K | 74.8 | model |