Panoptic segmentation is a scene understanding problem that combines the predictions of instance and semantic segmentation into a single unified output. This project implements a location-based panoptic segmentation model that modifies the state-of-the-art EfficientPS architecture by using SOLOv2 as the instance segmentation head instead of Mask R-CNN.
- Linux
- Python 3.7
- PyTorch 1.7
- CUDA 10.2
- GCC 7 or 8
Install the following frameworks
- EfficientNet-Pytorch for the backbone
- detectron2 for the instance head
- In-Place Activated BatchNorm
- COCO 2018 Panoptic Segmentation Task API (Beta version) to compute panoptic quality metric
- Install Albumentations
pip install -U albumentations
- Install PyTorch Lightning
pip install pytorch-lightning
- Install In-Place Activated BatchNorm
pip install inplace-abn
- Install EfficientNet Pytorch
pip install efficientnet_pytorch
- Install Detectron2
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
- Install the Panoptic API
pip install git+https://github.com/cocodataset/panopticapi.git
Install the dependencies by running
pip install pycocotools
pip install numpy
pip install scipy
pip install torch==1.5.1 torchvision==0.6.1
pip install mmcv
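As a quick sanity check after installation, the following sketch reports which of the packages above cannot be found by Python. The import names in `REQUIRED` are assumptions based on each package's usual module name (for example `inplace_abn` for In-Place Activated BatchNorm), and `missing_packages` is a hypothetical helper that is not part of this repository.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
REQUIRED = [
    "albumentations", "pytorch_lightning", "inplace_abn",
    "efficientnet_pytorch", "detectron2", "panopticapi",
    "pycocotools", "numpy", "scipy", "torch", "torchvision", "mmcv",
]


def missing_packages(names):
    """Return the names that cannot be located as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it does not verify versions or CUDA support.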
- Download the gtFine and leftImg8bit files of the Cityscapes dataset from https://www.cityscapes-dataset.com/ and unzip leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip into data/cityscapes
- The dataset needs to be converted into COCO format using the conversion tool in mmdetection:
- Clone the repository using
git clone https://github.com/open-mmlab/mmdetection.git
- Enter the repository using
cd mmdetection
- Install cityscapesscripts using
pip install cityscapesscripts
- Run the script as
python tools/dataset_converters/cityscapes.py \
data/cityscapes/ \
--nproc 8 \
--out-dir data/cityscapes/annotations
- Create the panoptic images json file:
- Clone the repository using
git clone https://github.com/mcordts/cityscapesScripts.git
- Install it using
pip install git+https://github.com/mcordts/cityscapesScripts.git
- Run the script using
python cityscapesScripts/cityscapesscripts/preparation/createPanopticImgs.py
Now the folder structure for the dataset should look as follows:
EfficientPS
└── data
    └── cityscapes
        ├── annotations
        ├── train
        ├── cityscapes_panoptic_val.json
        └── val
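Before training, the layout can be verified with a short script. This is a minimal sketch: `missing_entries` is a hypothetical helper, and the list of expected entries is simply copied from the tree above.

```python
from pathlib import Path

# Entries expected under data/cityscapes, taken from the tree above.
EXPECTED = ["annotations", "train", "val", "cityscapes_panoptic_val.json"]


def missing_entries(project_root):
    """Return the expected dataset entries that do not exist yet."""
    base = Path(project_root) / "data" / "cityscapes"
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains EfficientPS.
    print(missing_entries("EfficientPS") or "dataset layout looks complete")
```

The function only checks that the files and folders exist; it does not validate their contents.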
- Go into the SOLOv2 folder using
cd SOLOv2
- Modify config.yaml to change the paths
- Run
python setup.py develop
- Run
train.py
- Go into the EfficientPS folder using
cd ..
cd EfficientPS
- Run
train_net.py
- Go into the SOLOv2 folder using
cd SOLOv2
- Run
python eval.py
This will save the SOLOv2 masks in EfficientPS/solo_outputs.
- Now go into the EfficientPS folder using
cd ..
cd EfficientPS
- Run the combined evaluation using
python solo_fusion.py
The results will be saved in EfficientPS/Outputs
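The Panoptic API listed in the requirements is used to compute the panoptic quality (PQ) metric. For reference, the sketch below implements the standard per-class PQ definition, the sum of IoUs over true positives divided by TP + 0.5 FP + 0.5 FN. It is an illustration only, not the Panoptic API's implementation; `panoptic_quality` and its arguments are hypothetical names.

```python
def panoptic_quality(matched_ious, num_pred, num_gt):
    """Panoptic quality for a single class.

    matched_ious: IoU values of candidate prediction/ground-truth pairs of
        this class; only pairs with IoU > 0.5 count as true positives.
    num_pred: total number of predicted segments of this class.
    num_gt: total number of ground-truth segments of this class.
    """
    tp_ious = [iou for iou in matched_ious if iou > 0.5]
    tp = len(tp_ious)
    fp = num_pred - tp  # predictions without a matching ground truth
    fn = num_gt - tp    # ground-truth segments that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom > 0 else 0.0
```

For example, `panoptic_quality([0.8, 0.6, 0.3], num_pred=3, num_gt=4)` has two true positives, one false positive, and two false negatives, which gives 1.4 / 3.5, or 0.4 up to floating-point rounding.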
This project builds on the original EfficientPS paper and on the code released by the authors of EfficientPS.
Early research explored instance segmentation and semantic segmentation separately. Initial panoptic segmentation methods heuristically combined the predictions of a state-of-the-art instance segmentation network and a semantic segmentation network in a post-processing step. However, they suffered from large computational overhead, redundant learning, and discrepancies between the predictions of the two networks.
More recent works address the task either top-down, with shared components, or bottom-up, in a sequential manner. These approaches still do not fully exploit component sharing and suffer from low computational efficiency, slow runtimes, and subpar results.
EfficientPS:
- Shared backbone: EfficientNet
- Feature aligning semantic head, modified Mask R-CNN
- Panoptic fusion module: dynamic fusion of logits based on mask confidences
- Jointly optimized end-to-end, depthwise separable convolutions, Leaky ReLU
- 2-way FPN: semantically rich multi-scale features
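To illustrate the panoptic fusion idea, the sketch below fuses one instance mask logit map with the semantic logit map of the same class. The formula used here, (σ(A) + σ(B)) ⊙ (A + B), is recalled from the EfficientPS paper and should be treated as an assumption; it is not taken from this repository. `fuse_logits` and its argument names are hypothetical, and NumPy is used only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_logits(instance_logits, semantic_logits):
    """Fuse two logit maps of equal shape, weighting by their confidences.

    The sum of the logits is scaled by the sum of their sigmoid
    confidences, so pixels where both heads agree are amplified and
    pixels where they disagree are attenuated.
    """
    a = np.asarray(instance_logits, dtype=np.float64)
    b = np.asarray(semantic_logits, dtype=np.float64)
    return (sigmoid(a) + sigmoid(b)) * (a + b)
```

With this rule, two heads that are equally confident in opposite directions cancel out to a fused logit of zero, while two agreeing heads produce a larger fused logit than either head alone.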
We replace the Mask R-CNN instance head with a SOLOv2 head in order to improve the instance segmentation of the EfficientPS model.
The Mask R-CNN losses are replaced by SOLOv2's focal loss for semantic category classification and Dice loss for mask prediction.
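The two losses can be sketched as follows. This is a NumPy illustration under stated assumptions, not the training code of this repository: the function names are hypothetical, the defaults `alpha=0.25` and `gamma=2.0` are the commonly used focal loss settings, and the Dice coefficient uses squared terms in the denominator, which is our understanding of the SOLO formulation.

```python
import numpy as np


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; probs are sigmoid outputs, targets are 0/1."""
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(targets, dtype=np.float64)
    p_t = np.where(t == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(t == 1, alpha, 1.0 - alpha)  # class-balancing weight
    # (1 - p_t) ** gamma down-weights examples that are already easy.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def dice_loss(pred_mask, gt_mask, eps=1e-7):
    """1 - Dice coefficient between a soft predicted mask and a binary mask."""
    p = np.asarray(pred_mask, dtype=np.float64).ravel()
    q = np.asarray(gt_mask, dtype=np.float64).ravel()
    # eps keeps the ratio defined (and equal to 1) when both masks are empty.
    dice = (2.0 * np.sum(p * q) + eps) / (np.sum(p * p) + np.sum(q * q) + eps)
    return float(1.0 - dice)
```

A perfectly predicted mask gives a Dice loss close to 0 and a completely disjoint one gives a loss close to 1, while the focal loss shrinks the contribution of confidently correct classifications.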
Using a location-based instance segmentation head for panoptic segmentation is expected to improve the performance metrics, such as panoptic quality.