This repo is a PyTorch implementation of applying MogaNet to object detection and instance segmentation with Mask R-CNN and RetinaNet on COCO. The code is based on MMDetection. For more details, see Efficient Multi-order Gated Aggregation Network (ICLR 2024).
Please note that we simply follow the hyper-parameters of PVT and ConvNeXt, which may not be the optimal ones for MogaNet. Feel free to tune the hyper-parameters to get better performance.
Install MMDetection from source code, or follow the steps below. This experiment uses MMDetection>=2.19.0, and we reproduced the results with MMDetection v2.26.0 and PyTorch==1.10.
pip install openmim
mim install mmcv-full
pip install mmdet
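The version requirement above (MMDetection>=2.19.0) can be checked before launching a run. The following is a minimal sketch, not part of MMDetection: the helper names `parse_version`, `meets_minimum`, and `check_package` are hypothetical, and the comparison only looks at the leading dotted numbers, ignoring suffixes such as `+cu113` or `rc1`.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.26.0' or '1.10.0+cu113' into a tuple of ints such as (2, 26, 0)."""
    # Keep only the leading dotted-number part; local and pre-release tags are dropped.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (numeric comparison)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '2.19' equals '2.19.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> str:
    """Report whether an installed distribution satisfies the minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
    return f"{name} {installed}: {status}"


if __name__ == "__main__":
    # 2.19.0 is the minimum MMDetection version stated above.
    print(check_package("mmdet", "2.19.0"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.9.0" would sort after "2.19.0".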
Apex (optional) for PyTorch<=1.6.0:
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user
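Since Apex is optional, it can help to confirm whether it is visible to the Python environment before choosing between the fp16 and fp32 settings. This is a small sketch using only the standard library; the helper name `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("apex"):
        print("Apex found: the fp16 settings in the configs can be used.")
    else:
        print("Apex not found: install it as above or disable it in the config.")
```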
By default, we run experiments with fp32 or fp16 (Apex). To disable Apex, change the runner type to `EpochBasedRunner` and comment out the following code block in the configuration files:
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
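As an illustration of the change described above, a config with Apex disabled might look like the fragment below. This is a sketch under assumptions: `max_epochs=12` assumes a 1x schedule (12 epochs in MMDetection's convention), and the plain `optimizer_config` with `grad_clip=None` stands in for whatever the base config defines, so check the actual config before copying.

```python
# Config fragment (sketch): run without Apex.
# Assumption: a 1x schedule, i.e. 12 epochs; adjust for MS 3x.
runner = dict(type="EpochBasedRunner", max_epochs=12)

# Assumption: fall back to a plain optimizer hook without gradient clipping.
optimizer_config = dict(grad_clip=None)

# The Apex-specific block is commented out:
# fp16 = None
# optimizer_config = dict(
#     type="DistOptimizerHook",
#     update_interval=1,
#     grad_clip=None,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=True,
# )
```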
Note: Since the MogaNet backbone code for detection, segmentation, and pose estimation lives in the same file, it also works for MMSegmentation and MMPose through `@BACKBONES.register_module()`. Install MMSegmentation or MMPose as well if you need them.
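To illustrate how one backbone file can serve several toolboxes, here is a simplified stand-in for a registry. The `Registry` class below is a toy written for this example, not the mmcv implementation, and `ToyBackbone` is a hypothetical placeholder for MogaNet. It only shows how a `register_module()` decorator lets a config refer to a class by name.

```python
class Registry:
    """Toy name-to-class registry (a simplified stand-in, not mmcv's Registry)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        """Return a decorator that records the class under its own name."""
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        kwargs = dict(cfg)  # copy so the caller's config is left untouched
        cls = self._modules[kwargs.pop("type")]
        return cls(**kwargs)


# One registry per toolbox (names chosen for illustration only).
DET_BACKBONES = Registry("detection backbones")
SEG_BACKBONES = Registry("segmentation backbones")


# Stacking decorators registers the same class in both registries.
@DET_BACKBONES.register_module()
@SEG_BACKBONES.register_module()
class ToyBackbone:
    def __init__(self, arch="tiny"):
        self.arch = arch


if __name__ == "__main__":
    backbone = DET_BACKBONES.build(dict(type="ToyBackbone", arch="small"))
    print(type(backbone).__name__, backbone.arch)
```

Because the class is registered once per toolbox, each toolbox's config can name it in its `type` field without any change to the backbone code itself.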
Download COCO2017 and prepare COCO experiments according to the guidelines in MMDetection.
Notes: All the models can also be downloaded from Baidu Cloud (z8mf) at MogaNet/COCO_Detection. We perform object detection experiments based on RetinaNet with the 1x training setting, and detection and instance segmentation experiments based on Mask R-CNN and Cascade Mask R-CNN with the 1x or MS 3x (multiple scales) training settings. The params (M) and FLOPs (G) are measured by get_flops with an input shape of 1280 × 800:
python get_flops.py /path/to/config --shape 1280 800
Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | Config | Download |
---|---|---|---|---|---|---|---|---|
RetinaNet | MogaNet-XT | ImageNet-1K | 12.1M | 167.2G | 1x | 39.7 | config | log / model |
RetinaNet | MogaNet-T | ImageNet-1K | 14.4M | 173.4G | 1x | 41.4 | config | log / model |
RetinaNet | MogaNet-S | ImageNet-1K | 35.1M | 253.0G | 1x | 45.8 | config | log / model |
RetinaNet | MogaNet-B | ImageNet-1K | 53.5M | 354.5G | 1x | 47.7 | config | log / model |
RetinaNet | MogaNet-L | ImageNet-1K | 92.4M | 476.8G | 1x | 48.7 | config | log / model |
Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | mask mAP | Config | Download |
---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | MogaNet-XT | ImageNet-1K | 22.8M | 185.4G | 1x | 40.7 | 37.6 | config | log / model |
Mask R-CNN | MogaNet-T | ImageNet-1K | 25.0M | 191.7G | 1x | 42.6 | 39.1 | config | log / model |
Mask R-CNN | MogaNet-S | ImageNet-1K | 45.0M | 271.6G | 1x | 46.6 | 42.2 | config | log / model |
Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | 1x | 49.0 | 43.8 | config | log / model |
Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | 1x | 49.4 | 44.2 | config | log / model |
Mask R-CNN | MogaNet-T | ImageNet-1K | 25.0M | 191.7G | MS 3x | 45.3 | 40.7 | config | log / model |
Mask R-CNN | MogaNet-S | ImageNet-1K | 45.0M | 271.6G | MS 3x | 48.5 | 43.1 | config | log / model |
Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | MS 3x | 50.3 | 44.4 | config | log / model |
Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | MS 3x | 50.6 | 44.6 | config | log / model |
Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | mask mAP | Config | Download |
---|---|---|---|---|---|---|---|---|---|
Cascade Mask R-CNN | MogaNet-S | ImageNet-1K | 77.9M | 405.4G | MS 3x | 51.4 | 44.9 | config | log / model |
Cascade Mask R-CNN | MogaNet-S | ImageNet-1K | 82.8M | 750.2G | GIOU+MS 3x | 51.7 | 45.1 | config | log / model |
Cascade Mask R-CNN | MogaNet-B | ImageNet-1K | 101.2M | 851.6G | GIOU+MS 3x | 52.6 | 46.0 | config | log / model |
Cascade Mask R-CNN | MogaNet-L | ImageNet-1K | 139.9M | 973.8G | GIOU+MS 3x | 53.3 | 46.1 | config | - |
We provide demos that follow MMDetection. Use inference_demo or run the following script:
cd demo
python image_demo.py demo.png ../configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py ../../work_dirs/checkpoints/mask_rcnn_moganet_small_fpn_1x_coco.pth --out-file pred.png
By default, we train the model on a single node with 8 GPUs (a total batch size of 16). Start training with a config as follows:
PORT=29001 bash dist_train.sh /path/to/config 8
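When fewer than 8 GPUs are available, the total batch size changes, and a common practice (the linear scaling rule described in the MMDetection documentation) is to scale the learning rate proportionally. The sketch below works through that arithmetic. The helper names are hypothetical, the base learning rate in the usage example is a placeholder rather than a value from these configs, and whether linear scaling suits the optimizer used here is an assumption to verify.

```python
def samples_per_gpu(total_batch_size: int, num_gpus: int) -> int:
    """Per-GPU batch size implied by a total batch size, e.g. 16 / 8 = 2."""
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch_size // num_gpus


def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to batch size."""
    return base_lr * new_batch_size / base_batch_size


if __name__ == "__main__":
    per_gpu = samples_per_gpu(16, 8)  # default setup above: 8 GPUs, batch size 16
    new_batch = per_gpu * 4           # same per-GPU load on 4 GPUs
    # 0.0002 is a placeholder base learning rate, not taken from the configs.
    print(per_gpu, new_batch, scale_lr(0.0002, 16, new_batch))
```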
To evaluate the trained model on a single node with 8 GPUs, run:
bash dist_test.sh /path/to/config /path/to/checkpoint 8 --out results.pkl --eval bbox # or `bbox segm`
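The `--out results.pkl` option above saves the raw predictions. A minimal sketch for inspecting that file follows, assuming the MMDetection 2.x layout of one entry per test image, where each entry is a per-class list of box arrays for detection-only models or a `(bbox_results, segm_results)` tuple for Mask R-CNN style models. The function name `summarize_results` is hypothetical, and the usage example runs on dummy data rather than real output.

```python
import pickle


def summarize_results(path: str) -> dict:
    """Load a pickled result list and report its size and whether masks exist."""
    with open(path, "rb") as f:
        results = pickle.load(f)  # only unpickle files you produced yourself
    # Assumption: instance segmentation entries are (bbox, segm) tuples.
    has_masks = len(results) > 0 and isinstance(results[0], tuple)
    return {"num_images": len(results), "has_masks": has_masks}


if __name__ == "__main__":
    import os
    import tempfile

    # Dummy stand-in for real output: 3 images, each a (bbox, segm) tuple.
    dummy = [([], []) for _ in range(3)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.pkl")
        with open(path, "wb") as f:
            pickle.dump(dummy, f)
        print(summarize_results(path))
```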
If you find this repository helpful, please consider citing:
@inproceedings{iclr2024MogaNet,
title={Efficient Multi-order Gated Aggregation Network},
author={Siyuan Li and Zedong Wang and Zicheng Liu and Cheng Tan and Haitao Lin and Di Wu and Zhiyuan Chen and Jiangbin Zheng and Stan Z. Li},
booktitle={International Conference on Learning Representations},
year={2024}
}
Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.