This is an official MXNet implementation of Sequence Level Semantics Aggregation for Video Object Detection (ICCV 2019, oral). SELSA aggregates full-sequence-level information across a video while keeping a simple and clean pipeline. It achieves 82.69 mAP with ResNet-101 on the ImageNet VID validation set.
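For intuition, here is a minimal NumPy sketch of the aggregation idea: each proposal feature is refined by a softmax-weighted sum of semantically similar proposal features drawn from the whole sequence. The function name and shapes are illustrative only; the actual model applies learned projections before the similarity and runs inside the Faster R-CNN head.

```python
# Illustrative sketch of SELSA-style aggregation (not the repo's MXNet operator).
import numpy as np

def selsa_aggregate(feats, seq_feats):
    # feats: (N, C) proposal features from the current frame
    # seq_feats: (M, C) proposal features sampled across the whole sequence
    a = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    b = seq_feats / np.linalg.norm(seq_feats, axis=1, keepdims=True)
    sim = np.dot(a, b.T)                              # cosine similarity, (N, M)
    w = np.exp(sim - sim.max(axis=1, keepdims=True))  # softmax over the sequence dim
    w /= w.sum(axis=1, keepdims=True)
    return np.dot(w, seq_feats)                       # (N, C) aggregated features

np.random.seed(0)
cur = np.random.randn(300, 1024)        # 300 proposals in the current frame
seq = np.random.randn(900, 1024)        # proposals drawn from other frames
print(selsa_aggregate(cur, seq).shape)  # (300, 1024)
```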
If you use the code or models in your research, please cite with:
```
@inproceedings{wu2019selsa,
  title={Sequence Level Semantics Aggregation for Video Object Detection},
  author={Wu, Haiping and Chen, Yuntao and Wang, Naiyan and Zhang, Zhaoxiang},
  booktitle={ICCV},
  year={2019}
}
```
| model | training data | testing data | mAP(%) | mAP(%) (slow) | mAP(%) (medium) | mAP(%) (fast) |
|---|---|---|---|---|---|---|
| Single-frame baseline (Faster R-CNN, ResNet-101) | ImageNet DET train + VID train | ImageNet VID validation | 73.6 | 82.1 | 71.0 | 52.5 |
| SELSA (Faster R-CNN, ResNet-101) | ImageNet DET train + VID train | ImageNet VID validation | 80.3 | 86.9 | 78.9 | 61.4 |
| SELSA (Faster R-CNN, ResNet-101, Data Aug) | ImageNet DET train + VID train | ImageNet VID validation | 82.7 | 88.0 | 81.4 | 67.1 |
Please note that this repo is based on Python 2.
- Clone the repository:
```
git clone https://github.com/happywu/Sequence-Level-Semantics-Aggregation
```
- Install MXNet following https://mxnet.incubator.apache.org/get_started. We tested our code on MXNet v1.3.0.
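For example, a GPU build of that version can be installed via pip; the package name below assumes CUDA 9.0 and is only one option:

```
pip install mxnet-cu90==1.3.0
```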
- Install the required packages via:
```
pip install -r requirements.txt
sh init.sh
```
- Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory structure looks like this:
```
./data/ILSVRC2015/
./data/ILSVRC2015/Annotations/DET
./data/ILSVRC2015/Annotations/VID
./data/ILSVRC2015/Data/DET
./data/ILSVRC2015/Data/VID
./data/ILSVRC2015/ImageSets
```
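Before training, you can sanity-check the layout with a short helper script like this (hypothetical, not part of the repo):

```python
# Hypothetical helper: verify the expected ILSVRC2015 folders exist.
import os

expected = [
    './data/ILSVRC2015/Annotations/DET',
    './data/ILSVRC2015/Annotations/VID',
    './data/ILSVRC2015/Data/DET',
    './data/ILSVRC2015/Data/VID',
    './data/ILSVRC2015/ImageSets',
]
for path in expected:
    print('%-40s %s' % (path, 'ok' if os.path.isdir(path) else 'MISSING'))
```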
- Please download the ImageNet pre-trained ResNet-v1-101 model and our pretrained SELSA ResNet-101 model manually, and put them under the folder `./model`. Make sure the layout looks like this:
```
./model/pretrained_model/resnet_v1_101-0000.params
./model/pretrained_model/selsa_rcnn_vid-0000.params
```
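As a quick check (not part of the repo), the downloaded weights should deserialize cleanly with MXNet:

```python
# Quick check that the downloaded parameter files load with MXNet.
import mxnet as mx

for name in ('resnet_v1_101', 'selsa_rcnn_vid'):
    params = mx.nd.load('./model/pretrained_model/%s-0000.params' % name)
    print('%s: %d arrays loaded' % (name, len(params)))
```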
- To test the provided pretrained model, run:
```
python experiments/selsa/test.py --cfg experiments/selsa/cfgs/resnet_v1_101_rcnn_selsa_aug.yaml --test-pretrained ./model/pretrained_model/selsa_rcnn_vid
```
You should get the results reported in the table above.
- To train, use the following command:
```
python experiments/selsa/train_end2end.py --cfg experiments/selsa/cfgs/resnet_v1_101_rcnn_selsa_aug.yaml
```
A cache folder will be created automatically to save the model and the log under `output/selsa_rcnn/imagenet_vid/`.
- To test your trained model, run:
```
python experiments/selsa/test.py --cfg experiments/selsa/cfgs/resnet_v1_101_rcnn_selsa_aug.yaml
```
A PyTorch implementation of SELSA is available in MMTracking.
This repo is modified from Flow-Guided-Feature-Aggregation.