VISSL provides reference implementations of a large number of self-supervised learning approaches, along with a suite of benchmark tasks for quickly evaluating the representation quality of models trained with these approaches under a standard evaluation setup. This document lists the available self-supervised models and benchmarks them on a standard task: evaluating a linear classifier on ImageNet-1K. All models can be downloaded from the provided links.
VISSL is 100% compatible with TorchVision ResNet models. It's easy to use torchvision models in VISSL and to use VISSL models in torchvision.
All the ResNe(X)t models in VISSL can be converted to Torchvision weights. This simply involves removing the `_feature_blocks.` prefix from all the weight names. VISSL provides a convenience script for this:
```bash
python extra_scripts/convert_vissl_to_torchvision.py \
    --model_url_or_file <input_model>.pth \
    --output_dir /path/to/output/dir/ \
    --output_name <my_converted_model>.torch
```
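If you prefer to do the conversion in your own code, the renaming boils down to stripping the `_feature_blocks.` prefix from each parameter name. A minimal sketch (the helper name and toy state dict are ours, not part of VISSL):

```python
def vissl_to_torchvision(state_dict):
    """Strip VISSL's '_feature_blocks.' prefix so parameter names
    match torchvision's ResNet state dict keys."""
    prefix = "_feature_blocks."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Toy example with placeholder values standing in for real tensors
vissl_sd = {
    "_feature_blocks.conv1.weight": "w0",
    "_feature_blocks.layer1.0.conv1.weight": "w1",
}
print(vissl_to_torchvision(vissl_sd))
# {'conv1.weight': 'w0', 'layer1.0.conv1.weight': 'w1'}
```

The resulting dict can then be loaded into a torchvision ResNet with `model.load_state_dict(...)`.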
All the ResNe(X)t models in Torchvision can be directly loaded in VISSL. This simply involves setting the `REMOVE_PREFIX` and `APPEND_PREFIX` options in the config file, following the instructions here. Also, see the example below for how torchvision models are loaded.
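Conceptually, loading in this direction is the inverse rename: VISSL's trunk expects torchvision parameter names to carry the `_feature_blocks.` prefix, which is what the `APPEND_PREFIX` option applies at load time. A standalone sketch of that renaming (our helper, not VISSL's API):

```python
def torchvision_to_vissl(state_dict, prefix="_feature_blocks."):
    """Prepend the prefix VISSL's trunk expects; this mirrors what
    MODEL.WEIGHTS_INIT.APPEND_PREFIX does when loading a checkpoint."""
    return {prefix + k: v for k, v in state_dict.items()}

# Toy torchvision-style state dict with placeholder values
tv_sd = {"conv1.weight": "w0", "fc.bias": "b0"}
print(torchvision_to_vissl(tv_sd))
# {'_feature_blocks.conv1.weight': 'w0', '_feature_blocks.fc.bias': 'b0'}
```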
You can benchmark all of these models using VISSL's benchmark suite. See the docs for how to run the various benchmarks.
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Supervised | RN50 - Torchvision | ImageNet | 76.1 | model |
Supervised | RN101 - Torchvision | ImageNet | 77.21 | model |
Supervised | RN50 - Caffe2 | ImageNet | 75.88 | model |
Supervised | RN50 - Caffe2 | Places205 | 58.49 | model |
Supervised | Alexnet BVLC - Caffe2 | ImageNet | 49.54 | model |
Supervised | RN50 - VISSL - 105 epochs | ImageNet | 75.45 | model |
Supervised | ViT/B16 - 90 epochs (*) | ImageNet-22K | 83.38 | model |
Supervised | RegNetY-64Gf - BGR input | ImageNet | 80.55 | model |
Supervised | RegNetY-128Gf - BGR input | ImageNet | 80.57 | model |
(*) This specific ViT/B16 checkpoint requires the following options to be added on the command line to be loaded by VISSL: `config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model.` `config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=classy_state_dict`
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Semi-supervised | RN50 | YFCC100M - ImageNet | 79.2 | model |
Semi-weakly supervised | RN50 | Public Instagram Images - ImageNet | 81.06 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Jigsaw | RN50 - 100 permutations | ImageNet-1K | 48.57 | model |
Jigsaw | RN50 - 2K permutations | ImageNet-1K | 46.73 | model |
Jigsaw | RN50 - 10K permutations | ImageNet-1K | 48.11 | model |
Jigsaw | RN50 - 2K permutations | ImageNet-22K | 44.84 | model |
Jigsaw | RN50 - Goyal'19 | ImageNet-1K | 46.58 | model |
Jigsaw | RN50 - Goyal'19 | ImageNet-22K | 53.09 | model |
Jigsaw | RN50 - Goyal'19 | YFCC100M | 51.37 | model |
Jigsaw | AlexNet - Goyal'19 | ImageNet-1K | 34.82 | model |
Jigsaw | AlexNet - Goyal'19 | ImageNet-22K | 37.5 | model |
Jigsaw | AlexNet - Goyal'19 | YFCC100M | 37.01 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Colorization | RN50 - Goyal'19 | ImageNet-1K | 40.11 | model |
Colorization | RN50 - Goyal'19 | ImageNet-22K | 49.24 | model |
Colorization | RN50 - Goyal'19 | YFCC100M | 47.46 | model |
Colorization | AlexNet - Goyal'19 | ImageNet-1K | 30.39 | model |
Colorization | AlexNet - Goyal'19 | ImageNet-22K | 36.83 | model |
Colorization | AlexNet - Goyal'19 | YFCC100M | 34.19 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
RotNet | AlexNet official | ImageNet-1K | 39.51 | model |
RotNet | RN50 - 105 epochs | ImageNet-1K | 48.2 | model |
RotNet | RN50 - 105 epochs | ImageNet-22K | 54.89 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
DeepCluster | AlexNet official | ImageNet-1K | 37.88 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
ClusterFit | RN50 - 105 epochs - 16K clusters from RotNet | ImageNet-1K | 53.63 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
NPID | RN50 official oldies | ImageNet-1K | 54.99 | model |
NPID | RN50 - 4k negatives - 200 epochs - VISSL | ImageNet-1K | 52.73 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
NPID++ | RN50 - 32k negatives - 800 epochs - cosine LR | ImageNet-1K | 56.68 | model |
NPID++ | RN50-w2 - 32k negatives - 800 epochs - cosine LR | ImageNet-1K | 62.73 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
PIRL | RN50 - 200 epochs | ImageNet-1K | 62.55 | model |
PIRL | RN50 - 800 epochs | ImageNet-1K | 64.29 | model |
NOTE: Please see projects/PIRL/README.md for more PIRL models provided by authors.
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
SimCLR | RN50 - 100 epochs | ImageNet-1K | 64.4 | model |
SimCLR | RN50 - 200 epochs | ImageNet-1K | 66.61 | model |
SimCLR | RN50 - 400 epochs | ImageNet-1K | 67.71 | model |
SimCLR | RN50 - 800 epochs | ImageNet-1K | 69.68 | model |
SimCLR | RN50 - 1000 epochs | ImageNet-1K | 68.8 | model |
SimCLR | RN50-w2 - 100 epochs | ImageNet-1K | 69.82 | model |
SimCLR | RN50-w2 - 1000 epochs | ImageNet-1K | 73.84 | model |
SimCLR | RN50-w4 - 1000 epochs | ImageNet-1K | 71.61 | model |
SimCLR | RN101 - 100 epochs | ImageNet-1K | 62.76 | model |
SimCLR | RN101 - 1000 epochs | ImageNet-1K | 71.56 | model |
The following models are converted from the TensorFlow format of the official repository to a VISSL-compatible format.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
SimCLRv2 | RN152-w3-sk SimCLRv2 repository | ImageNet-1K | 80.0 | model |
The following models are converted from the TensorFlow format of the official repository to a VISSL-compatible format.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
BYOL | RN200-w2 BYOL repository (*) | ImageNet-1K | 78.34 | model |
(*) This specific checkpoint requires the following command line options to be correctly loaded by VISSL: `config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.` `config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=''`
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
DeepClusterV2 | RN50 - 400 epochs - 2x224 | ImageNet-1K | 70.01 | model |
DeepClusterV2 | RN50 - 400 epochs - 2x160+4x96 | ImageNet-1K | 74.32 | model |
DeepClusterV2 | RN50 - 800 epochs - 2x224+6x96 | ImageNet-1K | 75.18 | model |
To reproduce the numbers below, the experiment configuration is provided in json format for each model here.
Linear evaluation results vary somewhat from run to run, both when repeating the same evaluation and when pre-training a SwAV model multiple times. The numbers reported below are from a single run.
Method | Model | PreTrain dataset | ImageNet top-1 linear acc. | URL |
---|---|---|---|---|
SwAV | RN50 - 100 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 71.99 | model |
SwAV | RN50 - 200 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 73.85 | model |
SwAV | RN50 - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 74.81 | model |
SwAV | RN50 - 800 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 74.92 | model |
SwAV | RN50 - 200 epochs - 2x224+6x96 - 256 batch-size | ImageNet-1K | 73.07 | model |
SwAV | RN50 - 400 epochs - 2x224+6x96 - 256 batch-size | ImageNet-1K | 74.3 | model |
SwAV | RN50 - 400 epochs - 2x224 - 4096 batch-size | ImageNet-1K | 69.53 | model |
SwAV | RN50-w2 - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 77.01 | model |
SwAV | RN50-w4 - 400 epochs - 2x224+6x96 - 2560 batch-size | ImageNet-1K | 77.03 | model |
SwAV | RN50-w5 - 300 epochs - 2x224+6x96 - 2560 batch-size (*) | ImageNet-1K | 78.5 | model |
SwAV | RegNetY-16Gf - 800 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 76.15 | model |
SwAV | RegNetY-128Gf - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 78.36 | model |
NOTE: Please see projects/SwAV/README.md for more SwAV models provided by authors.
(*) This specific RN50-w5 checkpoint requires the following options to be added to be loaded by VISSL: `config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.` `config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=''` `config.MODEL.WEIGHTS_INIT.REMOVE_PREFIX=module.`
Method | Model | PreTrain dataset | ImageNet top-1 linear acc. | ImageNet top-1 fine-tuned acc. | URL |
---|---|---|---|---|---|
SEER | RegNetY-32Gf | IG-1B public images, non EU | 74.03 (res5) | 83.4 | model |
SEER | RegNetY-64Gf | IG-1B public images, non EU | 75.25 (res5avg) | 84.0 | model |
SEER | RegNetY-128Gf | IG-1B public images, non EU | 75.96 (res5avg) | 84.5 | model |
SEER | RegNetY-256Gf | IG-1B public images, non EU | 77.51 (res5avg) | 85.2 | model |
SEER | RegNet10B | IG-1B public images, non EU | 79.8 (res4) | 85.8 | model |
NOTE: Please see projects/SEER/README.md for more SEER models provided by authors.
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
MoCo-v2 | RN50 - 200 epochs - 256 batch-size | ImageNet-1K | 66.4 | model |
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
MoCo-v3 | ViT-B/16 - 300 epochs | ImageNet-1K | 75.79 | model |
Method | Model | PreTrain dataset | ImageNet top-1 acc. | URL |
---|---|---|---|---|
Barlow Twins | RN50 - 300 epochs - 2048 batch-size | ImageNet-1K | 70.75 | model |
Barlow Twins | RN50 - 1000 epochs - 2048 batch-size | ImageNet-1K | 71.80 | model |
The ViT-small model is obtained with this config.
Method | Model | PreTrain dataset | ImageNet k-NN acc. | URL |
---|---|---|---|---|
DINO | ViT-S/16 - 300 epochs - 1024 batch-size | ImageNet-1K | 73.4 | model |
DINO | XCiT-S/16 - 300 epochs - 1024 batch-size | ImageNet-1K | 74.8 | model |