Model ZOO for Semi-Supervised Learning on ImageNet-S

Finetuning with ViT

Method	Arch	Pretraining epochs	Pretraining mode	val	test	Pretrained	Finetuned
MAE	ViT-B/16	1600	SSL	38.3	37.0	model	model
MAE	ViT-B/16	1600	SSL+Sup	61.0	60.2	model	model
SERE	ViT-S/16	100	SSL	41.0	40.2	model	model
SERE	ViT-S/16	100	SSL+Sup	58.9	57.8	model	model

Masked Autoencoders Are Scalable Vision Learners (MAE)

Command for SSL+Sup

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model vit_base_patch16 \
--finetune mae_finetuned_vit_base.pth \
--epochs 100 \
--nb_classes 920 \
--blr 1e-4 --layer_decay 0.40 \
--weight_decay 0.05 --drop_path 0.1  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

Command for SSL

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model vit_base_patch16 \
--finetune mae_pretrain_vit_base.pth \
--epochs 100 \
--nb_classes 920 \
--blr 5e-4 --layer_decay 0.60 \
--weight_decay 0.05 --drop_path 0.1  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval
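
The commands above pass a base learning rate (--blr) rather than an absolute one. The sketch below estimates the absolute learning rate, assuming main_segfinetune.py follows the MAE finetuning convention (absolute lr = blr * effective batch size / 256, with effective batch size = batch_size * accum_iter * number of GPUs). This document does not state that convention, so treat it as an assumption; the function name effective_lr is hypothetical.

```python
def effective_lr(blr, batch_size, accum_iter, world_size, base=256):
    """Scale a base learning rate by the effective batch size.

    ASSUMPTION: mirrors the MAE-style rule lr = blr * eff_batch / 256.
    The actual script may compute this differently.
    """
    eff_batch_size = batch_size * accum_iter * world_size
    return blr * eff_batch_size / base


# Settings taken from the MAE commands above (8 GPUs, batch size 32 per GPU).
lr_ssl_sup = effective_lr(blr=1e-4, batch_size=32, accum_iter=1, world_size=8)
lr_ssl = effective_lr(blr=5e-4, batch_size=32, accum_iter=1, world_size=8)

# Hypothetical 4-GPU run: doubling accum_iter keeps the effective batch at 256.
lr_ssl_4gpu = effective_lr(blr=5e-4, batch_size=32, accum_iter=2, world_size=4)
```

Under this assumption, 8 GPUs with a per-GPU batch size of 32 give an effective batch size of 256, so the absolute learning rate equals --blr. When running on fewer GPUs, increasing --accum_iter is one way to keep the effective batch size unchanged.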

SERE: Exploring Feature Self-relation for Self-supervised Transformer

Command for SSL+Sup

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model vit_small_patch16 \
--finetune sere_finetuned_vit_small_ep100.pth \
--epochs 100 \
--nb_classes 920 \
--blr 5e-4 --layer_decay 0.50 \
--weight_decay 0.05 --drop_path 0.1  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

Command for SSL

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model vit_small_patch16 \
--finetune sere_pretrained_vit_small_ep100.pth \
--epochs 100 \
--nb_classes 920 \
--blr 5e-4 --layer_decay 0.50 \
--weight_decay 0.05 --drop_path 0.1  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval
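
The ViT commands above differ mainly in --blr and --layer_decay. As a rough illustration of what a layer decay value implies, the sketch below computes per-layer learning-rate multipliers, assuming the MAE/BEiT-style layer-wise decay (patch embedding gets the smallest multiplier, the head gets 1.0). Whether main_segfinetune.py assigns layers exactly this way is an assumption, and layer_lr_scales is a hypothetical helper name.

```python
def layer_lr_scales(depth, layer_decay):
    """Per-layer learning-rate multipliers for layer-wise lr decay.

    ASSUMPTION: follows the MAE/BEiT scheme, where index 0 is the
    embedding layer, indices 1..depth are the transformer blocks, and
    the last index is the head.
    """
    num_layers = depth + 1
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]


# ViT-B/16 has 12 blocks; 0.40 is the value used in the MAE SSL+Sup command.
vit_base_scales = layer_lr_scales(depth=12, layer_decay=0.40)

# ViT-S/16 also has 12 blocks; 0.50 is the value used in the SERE commands.
vit_small_scales = layer_lr_scales(depth=12, layer_decay=0.50)

for idx, scale in enumerate(vit_base_scales):
    print(f"layer {idx:2d}: lr multiplier {scale:.2e}")
```

A smaller --layer_decay keeps the early layers closer to their pretrained weights, while the last entry always receives the full learning rate.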

Finetuning with ResNet

Method	Arch	Pretraining epochs	Pretraining mode	val	test	Pretrained	Finetuned
PASS	ResNet-50 D32	100	SSL	21.0	20.3	model	model
PASS	ResNet-50 D16	100	SSL	21.6	20.8	model	model

D16 means the output stride is 16, obtained by using dilation=2 in the last stage; D32 keeps the standard output stride of 32. This result is better than the one reported in the paper, thanks to the new training scripts.
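
The relation between D32 and D16 can be checked with a few lines of arithmetic. The sketch below assumes the standard ResNet-50 layout (a stem with total stride 4 followed by four stages with strides 1, 2, 2, 2) and a 224x224 input chosen purely for illustration; output_stride is a hypothetical helper, not a function from this repository.

```python
def output_stride(stage_strides, stem_stride=4):
    """Total downsampling factor of a ResNet-style backbone."""
    total = stem_stride
    for stride in stage_strides:
        total *= stride
    return total


# Standard ResNet-50: every later stage halves the resolution.
d32 = output_stride([1, 2, 2, 2])

# D16 variant: the last stage keeps stride 1 and uses dilation=2 instead,
# so the receptive field is preserved while resolution is not reduced.
d16 = output_stride([1, 2, 2, 1])

input_size = 224  # illustrative input resolution (assumption)
print(f"D32: stride {d32}, feature map {input_size // d32}x{input_size // d32}")
print(f"D16: stride {d16}, feature map {input_size // d16}x{input_size // d16}")
```

For the illustrative 224x224 input, D16 yields a 14x14 feature map instead of 7x7, which gives the segmentation head finer spatial detail to work with.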

Large-scale Unsupervised Semantic Segmentation (PASS)

Command for SSL (ResNet-50 D32)

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model resnet50 \
--finetune pass919_pretrained.pth.tar \
--epochs 100 \
--nb_classes 920 \
--blr 5e-4 --layer_decay 0.4 \
--weight_decay 0.0005 \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

Command for SSL (ResNet-50 D16)

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model resnet50_d16 \
--finetune pass919_pretrained.pth.tar \
--epochs 100 \
--nb_classes 920 \
--blr 5e-4 --layer_decay 0.45 \
--weight_decay 0.0005 \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

Finetuning with RF-ConvNeXt

Arch	Pretraining epochs	RF-Next mode	val	test	Pretrained	Searched	Finetuned
ConvNeXt-T	300	-	48.7	48.8	model	-	model
RF-ConvNeXt-T	300	rfsingle	50.7	50.5	model	model	model
RF-ConvNeXt-T	300	rfmultiple	50.8	50.5	model	model	model
RF-ConvNeXt-T	300	rfmerge	51.3	51.1	model	model	model

Command for ConvNeXt-T

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model convnext_tiny \
--patch_size 4 \
--finetune convnext_tiny_1k_224_ema.pth \
--epochs 100 \
--nb_classes 920 \
--blr 2.5e-4 --layer_decay 0.6 \
--weight_decay 0.05 --drop_path 0.2  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

Before training RF-ConvNeXt, first search for the dilation rates in rfsearch mode.

For rfmultiple and rfsingle, set --pretrained_rfnext to the weights trained in rfsearch.

For rfmerge, we initialize the model with the weights from rfmultiple and finetune only seg_norm, seg_head, and the rfconvs whose dilation rates have changed. The other parts of the network are frozen. Set --pretrained_rfnext to the weights trained in rfmultiple.

Note that this freezing step in rfmerge may not be required for other tasks.
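
The selective finetuning in rfmerge can be pictured as a filter over parameter names. The sketch below is only an illustration of that idea: seg_norm and seg_head come from the description above, while the other parameter names, the changed_rfconvs list, and the helper is_trainable are hypothetical and do not reflect the repository's actual module names or freezing code.

```python
def is_trainable(param_name, changed_rfconvs):
    """Return True if a parameter should be finetuned in rfmerge.

    Only the segmentation norm/head and the rfconvs whose dilation
    rates changed stay trainable; everything else is frozen.
    """
    prefixes = ("seg_norm", "seg_head") + tuple(changed_rfconvs)
    return any(
        param_name == prefix or param_name.startswith(prefix + ".")
        for prefix in prefixes
    )


# Hypothetical parameter names, for illustration only.
param_names = [
    "stem.0.weight",
    "stages.2.0.dwconv.weight",
    "stages.2.1.dwconv.weight",
    "seg_norm.weight",
    "seg_head.weight",
]
# Hypothetical: pretend only this rfconv had its dilation rate changed.
changed_rfconvs = ["stages.2.1.dwconv"]

trainable = [n for n in param_names if is_trainable(n, changed_rfconvs)]
frozen = [n for n in param_names if not is_trainable(n, changed_rfconvs)]
print("trainable:", trainable)
print("frozen:", frozen)

# With a PyTorch model, the same filter could be applied as:
#   for name, param in model.named_parameters():
#       param.requires_grad = is_trainable(name, changed_rfconvs)
```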

Command for RF-ConvNeXt-T (rfsearch)

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model rfconvnext_tiny_rfsearch \
--patch_size 4 \
--finetune convnext_tiny_1k_224_ema.pth \
--epochs 100 \
--nb_classes 920 \
--blr 2.5e-4 --layer_decay 0.6 0.9 --layer_multiplier 1.0 10.0 \
--weight_decay 0.05 --drop_path 0.2  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

Command for RF-ConvNeXt-T (rfsingle)

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model rfconvnext_tiny_rfsingle \
--patch_size 4 \
--finetune convnext_tiny_1k_224_ema.pth \
--pretrained_rfnext ${OUTPATH_OF_RFSEARCH}/checkpoint-99.pth \
--epochs 100 \
--nb_classes 920 \
--blr 2.5e-4 --layer_decay 0.6 0.9 --layer_multiplier 1.0 10.0 \
--weight_decay 0.05 --drop_path 0.2  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

python inference.py --model rfconvnext_tiny_rfsingle \
--patch_size 4 \
--nb_classes 920 \
--output_dir ${OUTPATH}/predictions \
--data_path ${IMAGENETS_DIR} \
--pretrained_rfnext ${OUTPATH_OF_RFSEARCH}/checkpoint-99.pth \
--finetune ${OUTPATH}/checkpoint-99.pth \
--mode validation

Command for RF-ConvNeXt-T (rfmultiple)

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model rfconvnext_tiny_rfmultiple \
--patch_size 4 \
--finetune convnext_tiny_1k_224_ema.pth \
--pretrained_rfnext ${OUTPATH_OF_RFSEARCH}/checkpoint-99.pth \
--epochs 100 \
--nb_classes 920 \
--blr 2.5e-4 --layer_decay 0.55 0.9 --layer_multiplier 1.0 10.0 \
--weight_decay 0.05 --drop_path 0.1  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

python inference.py --model rfconvnext_tiny_rfmultiple \
--patch_size 4 \
--nb_classes 920 \
--output_dir ${OUTPATH}/predictions \
--data_path ${IMAGENETS_DIR} \
--pretrained_rfnext ${OUTPATH_OF_RFSEARCH}/checkpoint-99.pth \
--finetune ${OUTPATH}/checkpoint-99.pth \
--mode validation

Command for RF-ConvNeXt-T (rfmerge)

python -m torch.distributed.launch --nproc_per_node=8 main_segfinetune.py \
--accum_iter 1 \
--batch_size 32 \
--model rfconvnext_tiny_rfmerge \
--patch_size 4 \
--pretrained_rfnext ${OUTPATH_OF_RFMULTIPLE}/checkpoint-99.pth \
--epochs 100 \
--nb_classes 920 \
--blr 2.5e-4 --layer_decay 0.55 1.0 --layer_multiplier 1.0 10.0 \
--weight_decay 0.05 --drop_path 0.2  \
--data_path ${IMAGENETS_DIR} \
--output_dir ${OUTPATH} \
--dist_eval

python inference.py --model rfconvnext_tiny_rfmerge \
--patch_size 4 \
--nb_classes 920 \
--output_dir ${OUTPATH}/predictions \
--data_path ${IMAGENETS_DIR} \
--pretrained_rfnext ${OUTPATH_OF_RFMULTIPLE}/checkpoint-99.pth \
--finetune ${OUTPATH}/checkpoint-99.pth \
--mode validation
