Synthetic RGB-D Fusion (SF) Mask R-CNN for unseen object instance segmentation
S. Back, J. Kim, R. Kang, S. Choi and K. Lee. Segmenting unseen industrial components in a heavy clutter using rgb-d fusion and synthetic data. 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020. [Paper] [Video]
Unseen object instance segmentation performance on the WISDOM dataset
Method | Input | Use Synthetic Data | Backbone | mask AP | box AP | Reference |
---|---|---|---|---|---|---|
SD Mask R-CNN | Depth | Yes (WISDOM) | ResNet-35-FPN | 51.6 | - | Danielczuk et al. |
Mask R-CNN | RGB | No | ResNet-35-FPN | 38.4 | - | Danielczuk et al. |
Mask R-CNN | RGB | No | ResNet-50-FPN | 40.1 | 36.7 | Ito et al. |
D-SOLO | RGB | No | ResNet-50-FPN | 42.0 | 39.1 | Ito et al. |
PPIS | RGB | No | ResNet-50-FPN | 52.3 | 48.1 | Ito et al. |
Mask R-CNN | RGB | Yes (Ours) | ResNet-50-FPN | 59.0 | 61.4 | Ours |
Mask R-CNN | Depth | Yes (Ours) | ResNet-50-FPN | 59.6 | 60.4 | Ours |
SF Mask R-CNN (early fusion) | RGB-Depth | Yes (Ours) | ResNet-50-FPN | 55.5 | 57.2 | Ours |
SF Mask R-CNN (late fusion) | RGB-Depth | Yes (Ours) | ResNet-50-FPN | 58.7 | 59.0 | Ours |
SF Mask R-CNN (confidence fusion) | RGB-Depth | Yes (Ours) | ResNet-50-FPN | 60.5 | 61.0 | Ours |
SF Mask R-CNN is an upgraded version of the RGB-D fusion Mask R-CNN with a confidence map estimator [1]. The main differences from [1] are:
- SF Mask R-CNN generates a self-attention map from RGB and inpainted depth ([1] used a validity mask and raw depth).
- This self-attention map serves as a confidence map, so the RGB and depth feature maps are fused with spatial self-attention at four different scales.
- It was fine-tuned on WISDOM-REAL-Train (100 images) and evaluated on the public unseen object instance segmentation dataset, WISDOM ([1] used only a custom industrial dataset).
- SF Mask R-CNN has been released (2020/02/18)
- Train dataset has been released (2022/05/16)
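To illustrate the confidence-fusion idea at a single scale, here is a minimal NumPy sketch. The function name, the 1x1-conv parameterization (`w`, `b`), and the convex-combination form are assumptions made for illustration; in the released model the confidence estimator is learned jointly and the fusion runs at four FPN scales.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def confidence_fusion(rgb_feat, depth_feat, w, b):
    """Fuse RGB and depth feature maps with a spatial confidence map.

    rgb_feat, depth_feat: (C, H, W) feature maps at one scale.
    w, b: parameters of a 1x1 conv mapping the concatenated (2C, H, W)
          features to a single-channel confidence logit per pixel.
    """
    x = np.concatenate([rgb_feat, depth_feat], axis=0)   # (2C, H, W)
    logit = np.tensordot(w, x, axes=([0], [0])) + b      # (H, W)
    conf = sigmoid(logit)                                # spatial attention in [0, 1]
    # Pixels with high confidence lean on RGB features, low on depth.
    return conf * rgb_feat + (1.0 - conf) * depth_feat   # (C, H, W)
```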
- Set up the Anaconda environment
$ conda create -n sfmaskrcnn python=3.7
$ conda activate sfmaskrcnn
$ pip install torch torchvision
$ pip install imgviz tqdm tensorboardX pandas opencv-python imutils pyfastnoisesimd scikit-image pycocotools
$ pip install pyrealsense2 # for demo
- Download the provided SF Mask R-CNN weights pre-trained on our custom dataset.
- Download the WISDOM-Real dataset [Link]
- Set the paths to the dataset and pretrained weights (you can put these into your bash profile)
$ export WISDOM_PATH={/path/to/the/wisdom-real/high-res/dataset}
$ export WEIGHT_PATH={/path/to/the/pretrained/weights}
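Before training or evaluation, it can help to verify those variables point at real directories. A small sketch (the helper `check_env_paths` is hypothetical, not part of this repository):

```python
import os

def check_env_paths(names=("WISDOM_PATH", "WEIGHT_PATH")):
    """Return a status per variable: 'ok', 'unset', or 'missing'."""
    status = {}
    for name in names:
        path = os.environ.get(name)
        if path is None:
            status[name] = "unset"
        elif not os.path.isdir(path):
            status[name] = "missing"
        else:
            status[name] = "ok"
    return status
```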
- Download the synthetic train dataset at GDrive
- Unzip the downloaded dataset and modify the `dataset_path` of the config file accordingly.
To train an SF Mask R-CNN (confidence fusion, RGB and noisy depth as input) on the synthetic dataset:
$ python train.py --gpu 0 --cfg rgb_noisydepth_confidencefusion
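The `rgb_noisydepth_*` configs train on synthetic depth that is corrupted to mimic a real sensor. A minimal NumPy sketch of such an augmentation (Gaussian noise plus random dropout holes) is shown below; this is an illustrative stand-in, not the repository's actual noise model, which relies on pyfastnoisesimd-based noise:

```python
import numpy as np

def corrupt_depth(depth, noise_std=0.005, hole_prob=0.02, rng=None):
    """Simulate sensor noise on a clean synthetic depth map (meters).

    Adds per-pixel Gaussian noise and zeroes out random pixels to mimic
    missing-depth holes, as a stand-in for a real depth sensor's noise.
    """
    rng = np.random.default_rng(rng)
    noisy = depth + rng.normal(0.0, noise_std, size=depth.shape)
    holes = rng.random(depth.shape) < hole_prob
    noisy[holes] = 0.0                      # 0 marks invalid depth
    return np.clip(noisy, 0.0, None)        # depth cannot be negative
```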
To fine-tune the SF Mask R-CNN on the WISDOM dataset:
$ python train.py --gpu 0 --cfg rgb_noisydepth_confidencefusion_FT --resume
To evaluate an SF Mask R-CNN (confidence fusion, RGB and noisy depth as input) on the WISDOM dataset:
$ python eval.py --gpu 0 --cfg rgb_noisydepth_confidencefusion \
--eval_data wisdom \
--dataset_path $WISDOM_PATH \
--weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar
To visualize the inference results of SF Mask R-CNN on the WISDOM dataset:
$ python inference.py --gpu 0 --cfg rgb_noisydepth_confidencefusion \
--eval_data wisdom --vis_depth \
--dataset_path $WISDOM_PATH \
--weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar
To visualize the inference results on our custom synthetic dataset:
$ python inference.py --gpu 0 --cfg rgb_noisydepth_confidencefusion \
--eval_data synthetic --vis_depth \
--dataset_path examples \
--weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar
To run the real-time demo with a RealSense D435:
# SF Mask R-CNN (confidence fusion)
$ python demo.py --cfg rgb_noisydepth_confidencefusion \
--weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar
# SF Mask R-CNN (early fusion)
$ python demo.py --cfg rgb_noisydepth_earlyfusion \
--weight_path $WEIGHT_PATH/SFMaskRCNN_EarlyFusion.tar
# SF Mask R-CNN (late fusion)
$ python demo.py --cfg rgb_noisydepth_latefusion \
--weight_path $WEIGHT_PATH/SFMaskRCNN_LateFusion.tar
If you use our work in a research project, please cite our work:
[1] @inproceedings{back2020segmenting,
title={Segmenting unseen industrial components in a heavy clutter using rgb-d fusion and synthetic data},
author={Back, Seunghyeok and Kim, Jongwon and Kang, Raeyoung and Choi, Seungjun and Lee, Kyoobin},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
pages={828--832},
year={2020},
organization={IEEE}
}