The aim of this project is to train DETR on a custom dataset of objects from the construction domain (around 48 classes) for object detection and panoptic segmentation.
Let us now understand how DETR works and try to answer a few questions.
First, the object detection model was trained for 200 epochs starting from pre-trained weights. A panoptic head was then added on top and trained for another 50 epochs; during this stage the object detection model was frozen and only the panoptic head was trained.
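The freeze-then-train scheme above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the `mask_head` prefix is a hypothetical name for the panoptic-head submodule, and the learning rate is only a placeholder.

```python
import torch
import torch.nn as nn

def freeze_detector(model: nn.Module, head_prefix: str = "mask_head"):
    """Freeze every parameter except those of the panoptic head.

    `head_prefix` is an assumed attribute name; the actual submodule
    name in the DETR repository may differ.
    """
    for name, param in model.named_parameters():
        # Only panoptic-head parameters keep gradients enabled.
        param.requires_grad = name.startswith(head_prefix)
    # Optimize only the trainable (panoptic-head) parameters.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-4)
```

With this setup, the detector's weights stay fixed while the optimizer updates the head alone, which is what lets the panoptic stage converge in relatively few epochs.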
We train DETR with AdamW, setting the learning rate to 1e-4 in the transformer and 1e-5 in the backbone. Horizontal flips, scales, and crops are used for augmentation. Images are rescaled to have a minimum size of 800 and a maximum size of 1333. The transformer is trained with a dropout of 0.1, and the whole model is trained with gradient clipping at a max norm of 0.1.
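The two learning rates above can be expressed with AdamW parameter groups; a minimal sketch, assuming backbone parameter names start with `backbone` (as in the DETR reference implementation, but an assumption here):

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module):
    """AdamW with lr 1e-4 for the transformer and 1e-5 for the backbone."""
    backbone = [p for n, p in model.named_parameters() if n.startswith("backbone")]
    rest = [p for n, p in model.named_parameters() if not n.startswith("backbone")]
    return torch.optim.AdamW([
        {"params": rest, "lr": 1e-4},      # transformer and heads
        {"params": backbone, "lr": 1e-5},  # lower lr for the pre-trained backbone
    ])

# The gradient clipping mentioned above is applied each step,
# after loss.backward() and before optimizer.step():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
```

Keeping the backbone at a lower learning rate is the usual way to fine-tune without destroying its pre-trained features.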
- Fine-tuning of DETR on the construction dataset for Object Detection (click here)
- Panoptic segmentation training (click here)
Bounding box detection evaluation results on the construction dataset after training for 200 epochs:
```
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.753
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.864
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.801
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.387
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.609
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.782
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.716
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.857
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.871
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.505
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.728
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.899
```
Segmentation metrics (PQ: Panoptic Quality, SQ: Segmentation Quality, RQ: Recognition Quality) after training the panoptic head for 50 epochs:
|        |  PQ  |  SQ  |  RQ  |  N |
|--------|------|------|------|----|
| All    | 53.1 | 80.0 | 60.7 | 61 |
| Things | 61.6 | 82.9 | 69.6 | 46 |
| Stuff  | 27.0 | 71.2 | 33.5 | 15 |
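For reference, PQ factorizes as PQ = SQ × RQ: SQ is the mean IoU of matched segments and RQ is an F1-style recognition score. A minimal sketch of the computation from matched-segment IoUs (a prediction/ground-truth pair counts as a true positive when its IoU exceeds 0.5):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Compute (PQ, SQ, RQ) from the IoUs of matched (TP) segment pairs.

    matched_ious: IoUs of matched prediction/GT pairs (each IoU > 0.5).
    num_fp: unmatched predicted segments; num_fn: unmatched GT segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    sq = sum(matched_ious) / tp if tp else 0.0  # mean IoU over true positives
    rq = tp / denom if denom else 0.0           # F1-style recognition quality
    return sq * rq, sq, rq                      # PQ = SQ * RQ
```

In practice this is computed per class and averaged, which is why the Stuff row (fewer classes, fewer matches) can sit well below the Things row.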
Check out the YouTube link below to see predictions from the trained model.
The project shows that fine-tuning can reach a score of 53 PQ in about 50 epochs, which is satisfactory. Transformers are good at global reasoning but are computationally expensive with long inputs (high-resolution images), making it difficult to attain good results on small objects.
Further work includes:
- Explore new image augmentation techniques such as RICAP for better detection results.
- Reduce leakage of the original COCO classes while creating ground truth (e.g., the red areas around the wheel loader in the image).
- Add a few images from the COCO dataset so that the PQ for stuff classes can be increased.
- Implement Spatially Modulated Co-Attention (SMCA), a plug-and-play module that replaces DETR's co-attention mechanism and helps achieve faster convergence. Refer to this link.
- Explore and implement this paper from Google, which would allow skipping bounding-box detection and training directly for panoptic segmentation.