Mobile-Former: PyTorch Implementation

This is a PyTorch implementation of the paper Mobile-Former: Bridging MobileNet and Transformer:

@Article{MobileFormer2021,
  author  = {Chen, Yinpeng and Dai, Xiyang and Chen, Dongdong and Liu, Mengchen and Dong, Xiaoyi and Yuan, Lu and Liu, Zicheng},
  journal = {arXiv:2108.05895},
  title   = {Mobile-Former: Bridging MobileNet and Transformer},
  year    = {2021},
}

ImageNet-1K pretrained models:

| Model | Input size | Params | FLOPs | Top-1 (%) | Pretrained |
| --- | --- | --- | --- | --- | --- |
| mobile-former-508m | 224 | 14.0M | 508M | 79.3 | download |
| mobile-former-294m | 224 | 11.4M | 294M | 77.9 | download |
| mobile-former-214m | 224 | 9.4M  | 214M | 76.7 | download |
| mobile-former-151m | 224 | 7.6M  | 151M | 75.2 | download |
| mobile-former-96m  | 224 | 4.6M  | 96M  | 72.8 | download |
| mobile-former-52m  | 224 | 3.5M  | 52M  | 68.7 | download |
| mobile-former-26m  | 224 | 3.2M  | 26M  | 64.0 | download |
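
The checkpoints above can be loaded for inference with standard PyTorch. The sketch below is a minimal example under two assumptions: the repository exposes a builder per variant (the constructor name mobile_former_508m is hypothetical; match it to whatever train.py's --model flag resolves to), and the downloaded file is either a plain state_dict or nests one under a "state_dict" key.

import torch

# Hypothetical import -- adjust the module and constructor names to this repo's actual builders.
from mobile_former import mobile_former_508m

model = mobile_former_508m(num_classes=1000)

# Load the downloaded checkpoint; unwrap it if the weights are nested.
ckpt = torch.load("mobile-former-508m.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
model.load_state_dict(state_dict)
model.eval()

# 224x224 input resolution, as listed in the table above.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])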

Mobile-Former ImageNet Training

To train mobile-former-508m, run the following on a single node with 8 GPUs:

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-508m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.001 \
    --weight-decay 0.20 \
    --drop 0.3 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200

mobile-former-294m

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-294m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.001 \
    --weight-decay 0.20 \
    --drop 0.3 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200

mobile-former-214m

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-214m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0009 \
    --weight-decay 0.15 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200

mobile-former-151m

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-151m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0009 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200

mobile-former-96m

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-96m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.0 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200

mobile-former-52m

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-52m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200

mobile-former-26m

python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-26m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.08 \
    --drop 0.1 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
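
To sanity-check a variant against the Params and FLOPs columns in the table above, the parameter count can be read directly from the model, and FLOPs can be estimated with a profiler such as fvcore. The constructor name below is hypothetical, and FLOP counters differ in whether they count multiply-adds, so small deviations from the table are expected.

import torch

# Hypothetical import -- replace with the builder that train.py's --model flag maps to.
from mobile_former import mobile_former_294m

model = mobile_former_294m(num_classes=1000)
model.eval()

# Parameter count; the table lists ~11.4M for mobile-former-294m.
n_params = sum(p.numel() for p in model.parameters())
print(f"params: {n_params / 1e6:.1f}M")

# Optional FLOPs estimate (fvcore counts one FLOP per multiply-add).
try:
    from fvcore.nn import FlopCountAnalysis
    flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total()
    print(f"FLOPs: {flops / 1e6:.0f}M")
except ImportError:
    print("install fvcore for a FLOPs estimate")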