This is a quick overview of training a SimCLR self-supervised model on 1 GPU with VISSL.
For installation, please refer to INSTALL.md.
We assume the downloaded data looks like this:
imagenet_full_size
|_ train
| |_ <n0......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-N-name>.JPEG
| |_ ...
| |_ <n1......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-M-name>.JPEG
| | |_...
| | |_...
|_ val
| |_ <n0......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-N-name>.JPEG
| |_ ...
| |_ <n1......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-M-name>.JPEG
| | |_...
| | |_...
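Before launching training, it can help to confirm that a local copy matches the layout above. The following is a minimal sketch using only the Python standard library; the helper names `summarize_split` and `check_layout` are hypothetical and are not part of VISSL, and the check assumes images carry the `.JPEG` extension shown in the tree.

```python
import os


def summarize_split(split_dir, extension=".JPEG"):
    """Return {class_folder_name: image_count} for one split directory."""
    counts = {}
    for entry in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[entry] = sum(
            1 for name in os.listdir(class_dir) if name.endswith(extension)
        )
    return counts


def check_layout(root, splits=("train", "val")):
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []
    for split in splits:
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            problems.append(f"missing split folder: {split_dir}")
            continue
        counts = summarize_split(split_dir)
        if not counts:
            problems.append(f"no class folders in: {split_dir}")
        for class_name, n_images in counts.items():
            if n_images == 0:
                problems.append(
                    f"no .JPEG images in: {os.path.join(split_dir, class_name)}"
                )
    return problems
```

Calling `check_layout("/path/to/imagenet_full_size")` returns the list of problems, so an empty result suggests the folder is ready to be passed as the train data path.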
We provide a config to train a ResNet50 model on the SimCLR pretext task.
Set DATA.TRAIN.DATA_PATHS to the path of your ImageNet train dataset folder.
cd $HOME/vissl
python3 tools/run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=test/integration_test/quick_simclr_imagefolder \
config.CHECKPOINT.DIR="./checkpoints" \
config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true
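When the same launch is scripted, for example to vary the data path or checkpoint directory, the key=value overrides can be assembled from a dictionary. The sketch below is an illustration only: `format_value` and `build_command` are hypothetical helpers, not VISSL functions, and the keys and values are copied from the command above. Lists are rendered without inner quotes because that is what the shell passes to the script after it strips the double quotes.

```python
import shlex


def format_value(value):
    """Render a Python value the way it appears in a key=value override."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(format_value(v) for v in value) + "]"
    return str(value)


def build_command(script, config_name, overrides):
    """Return the argv list for one training launch."""
    argv = ["python3", script, f"config={config_name}"]
    argv += [f"{key}={format_value(val)}" for key, val in overrides.items()]
    return argv


overrides = {
    "hydra.verbose": True,
    "config.DATA.TRAIN.DATASET_NAMES": ["imagenet1k_folder"],
    "config.DATA.TRAIN.DATA_SOURCES": ["disk_folder"],
    "config.DATA.TRAIN.DATA_PATHS": ["/path/to/my/imagenet/folder/train"],
    "config.CHECKPOINT.DIR": "./checkpoints",
    "config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD": True,
}
argv = build_command(
    "tools/run_distributed_engines.py",
    "test/integration_test/quick_simclr_imagefolder",
    overrides,
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(argv))
```

To actually start training, the list can be handed to `subprocess.run(argv, check=True)` from inside the VISSL checkout; passing a list bypasses the shell, so no extra quoting is needed.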
You need to set up the dataset and obtain the built-in training tool. Follow these steps:
If you installed the pre-built VISSL packages, set up the ImageNet1K dataset following our data documentation and tutorial. NOTE: the dataset must be registered with VISSL.
In your Python interpreter:
>>> json_data = {
        "imagenet1k_folder": {
            "train": ["<img_path>", "<lbl_path>"],
            "val": ["<img_path>", "<lbl_path>"]
        }
    }
>>> from vissl.utils.io import save_file
>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json", append_to_json=False)
>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
>>> print(VisslDatasetCatalog.list())
['imagenet1k_folder']
>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
{'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}
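The catalog written above is a plain JSON file that maps each dataset name to a pair of paths per split. To make that shape concrete, here is a small standard-library sketch that builds, validates, and round-trips such a structure. The helpers `make_catalog_entry`, `validate_catalog`, `write_catalog`, and `read_catalog` are hypothetical and only illustrate the file format; they are not a replacement for `save_file` or `VisslDatasetCatalog`, and the two-paths-per-split rule is an assumption taken from the example above.

```python
import json
import os


def make_catalog_entry(train_paths, val_paths):
    """Build one dataset entry: each split holds [image_path, label_path]."""
    return {"train": list(train_paths), "val": list(val_paths)}


def validate_catalog(catalog):
    """Return a list of problems; an empty list means the shape matches the example."""
    problems = []
    for name, entry in catalog.items():
        for split in ("train", "val"):
            paths = entry.get(split)
            if not isinstance(paths, list) or len(paths) != 2:
                problems.append(f"{name}.{split} should be [img_path, lbl_path]")
    return problems


def write_catalog(catalog, path):
    """Write the catalog as JSON, creating the parent folder if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        json.dump(catalog, f, indent=2)


def read_catalog(path):
    """Load a catalog JSON file back into a dictionary."""
    with open(path) as f:
        return json.load(f)
```

Running `validate_catalog` before saving catches a split that is missing or has the wrong number of paths, which would otherwise only surface when training starts.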
We will use the pre-built VISSL training tool run_distributed_engines.py together with the config file. Run:
cd /tmp/ && mkdir -p /tmp/configs/config
wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
cd /tmp/
python3 run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=quick_1gpu_resnet50_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true
Explore all the parameters and settings that VISSL supports in the VISSL defaults.yaml file.