PSPNet

Pyramid Scene Parsing Network

Introduction

Official Repo

Code Snippet

Abstract

Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
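The "different-region-based context aggregation" in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the bin sizes 1, 2, 3 and 6 used in the paper, works on a single-channel feature map stored as a list of rows, and simplifies the real module in two ways: the per-branch 1x1 convolutions are omitted, and nearest-neighbour upsampling stands in for bilinear interpolation. The function names are illustrative and are not taken from the MMSegmentation code base.

```python
def adaptive_avg_pool2d(feature, bins):
    """Average-pool a 2D grid (list of rows) down to a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Region boundaries: floor for the start, ceil for the end,
        # so every output cell covers at least one input cell.
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins grid back to h x w (nearest neighbour, a simplification)."""
    bins = len(pooled)
    return [[pooled[r * bins // h][c * bins // w] for c in range(w)] for r in range(h)]


def pyramid_pool(feature, bin_sizes=(1, 2, 3, 6)):
    """Return the input map followed by one context map per pyramid level.

    In the full network these maps would be concatenated along the channel
    axis and passed to the segmentation head.
    """
    h, w = len(feature), len(feature[0])
    maps = [feature]
    for bins in bin_sizes:
        maps.append(upsample_nearest(adaptive_avg_pool2d(feature, bins), h, w))
    return maps
```

With the default bin sizes, a 6x6 input yields five maps: the input itself, a map filled with the global mean (bin size 1), and progressively finer regional averages for bin sizes 2, 3 and 6.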

Citation

@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}
@article{wightman2021resnet,
  title={Resnet strikes back: An improved training procedure in timm},
  author={Wightman, Ross and Touvron, Hugo and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:2110.00476},
  year={2021}
}

Results and models

Cityscapes

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x1024 | 40000 | 6.1 | 4.07 | 77.85 | 79.18 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 40000 | 9.6 | 2.68 | 78.34 | 79.74 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 40000 | 6.9 | 1.76 | 78.26 | 79.88 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 40000 | 10.9 | 1.15 | 79.08 | 80.28 | config | model \| log |
| PSPNet | R-18-D8 | 512x1024 | 80000 | 1.7 | 15.71 | 74.87 | 76.04 | config | model \| log |
| PSPNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.55 | 79.79 | config | model \| log |
| PSPNet | R-50b-D8 rsb | 512x1024 | 80000 | 6.2 | 3.82 | 78.47 | 79.45 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.76 | 81.01 | config | model \| log |
| PSPNet (FP16) | R-101-D8 | 512x1024 | 80000 | 5.34 | 8.77 | 79.46 | - | config | model \| log |
| PSPNet | R-18-D8 | 769x769 | 80000 | 1.9 | 6.20 | 75.90 | 77.86 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 80000 | - | - | 79.59 | 80.69 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 80000 | - | - | 79.77 | 81.06 | config | model \| log |
| PSPNet | R-18b-D8 | 512x1024 | 80000 | 1.5 | 16.28 | 74.23 | 75.79 | config | model \| log |
| PSPNet | R-50b-D8 | 512x1024 | 80000 | 6.0 | 4.30 | 78.22 | 79.46 | config | model \| log |
| PSPNet | R-101b-D8 | 512x1024 | 80000 | 9.5 | 2.76 | 79.69 | 80.79 | config | model \| log |
| PSPNet | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.41 | 74.92 | 76.90 | config | model \| log |
| PSPNet | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.88 | 78.50 | 79.96 | config | model \| log |
| PSPNet | R-101b-D8 | 769x769 | 80000 | 10.8 | 1.17 | 78.87 | 80.04 | config | model \| log |
| PSPNet | R-50-D32 | 512x1024 | 80000 | 3.0 | 15.21 | 73.88 | 76.85 | config | model \| log |
| PSPNet | R-50b-D32 rsb | 512x1024 | 80000 | 3.1 | 16.08 | 74.09 | 77.18 | config | model \| log |
| PSPNet | R-50b-D32 | 512x1024 | 80000 | 2.9 | 15.41 | 72.61 | 75.51 | config | model \| log |

ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.53 | 41.13 | 41.94 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 12 | 15.30 | 43.57 | 44.35 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 42.48 | 43.44 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 44.39 | 45.35 | config | model \| log |

Pascal VOC 2012 + Aug

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 6.1 | 23.59 | 76.78 | 77.61 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 9.6 | 15.02 | 78.47 | 79.25 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 77.29 | 78.48 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 78.52 | 79.57 | config | model \| log |

Pascal Context

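| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-101-D8 | 480x480 | 40000 | 8.8 | 9.68 | 46.60 | 47.78 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 46.03 | 47.15 | config | model \| log |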

Pascal Context 59

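| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-101-D8 | 480x480 | 40000 | - | - | 52.02 | 53.54 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 52.47 | 53.99 | config | model \| log |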

Dark Zurich and Nighttime Driving

We report evaluation results on these two datasets using the models above, which were trained on the Cityscapes training set.

| Method | Backbone | Training Dataset | Test Dataset | mIoU | config | evaluation checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | Cityscapes Training set | Dark Zurich | 10.91 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Nighttime Driving | 23.02 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Cityscapes Validation set | 77.85 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Dark Zurich | 10.16 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Nighttime Driving | 20.25 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Cityscapes Validation set | 78.34 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Dark Zurich | 15.54 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Nighttime Driving | 22.25 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Cityscapes Validation set | 79.69 | config | model \| log |

COCO-Stuff 10k

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 9.6 | 20.5 | 35.69 | 36.62 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 13.2 | 11.1 | 37.26 | 38.52 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 36.33 | 37.24 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 37.76 | 38.86 | config | model \| log |

COCO-Stuff 164k

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 9.6 | 20.5 | 38.80 | 39.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 13.2 | 11.1 | 40.34 | 40.79 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 39.64 | 39.97 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 41.28 | 41.66 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 320000 | - | - | 40.53 | 40.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 320000 | - | - | 41.95 | 42.42 | config | model \| log |

LoveDA

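| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 26.87 | 48.62 | 47.57 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 6.60 | 50.46 | 50.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 4.58 | 51.86 | 51.34 | config | model \| log |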

Potsdam

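| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.50 | 85.12 | 77.09 | 78.30 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.21 | 78.12 | 78.98 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.40 | 78.62 | 79.47 | config | model \| log |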

Vaihingen

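| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 85.06 | 71.46 | 73.36 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.29 | 72.36 | 73.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.97 | 72.61 | 74.18 | config | model \| log |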

iSAID

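| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 896x896 | 80000 | 4.52 | 26.91 | 60.22 | 61.25 | config | model \| log |
| PSPNet | R-50-D8 | 896x896 | 80000 | 16.58 | 8.88 | 65.36 | 66.48 | config | model \| log |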

Note:

  • FP16 means that mixed-precision (FP16) training is used.
  • 896x896 is the crop size for the iSAID dataset, following the implementation of PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation.
  • rsb is short for 'ResNet strikes back'.
  • The b in R-50b means ResNetV1b, which is a standard ResNet backbone. In MMSegmentation, the default backbone is ResNetV1c, which usually performs better on semantic segmentation tasks.
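
As an illustration of the last note, the fragment below sketches how a config might switch from the default ResNetV1c backbone to the standard ResNet (V1b). It is a sketch under assumptions, not a copy of a shipped config: the base file name is a placeholder, and the `torchvision://resnet50` checkpoint alias and the top-level `pretrained` key are assumed to match the MMSegmentation version in use.

```python
# Hypothetical config fragment; the base config path is a placeholder.
# _base_ = './<pspnet_r50-d8_base_config>.py'

model = dict(
    # Assumed alias for torchvision's ImageNet-pretrained ResNet-50 weights.
    pretrained='torchvision://resnet50',
    # 'ResNet' selects the standard (V1b) backbone instead of the default ResNetV1c.
    backbone=dict(type='ResNet'),
)
```

Only the keys that differ from the base config are listed; everything else would be inherited from the base file.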