[Project Page] [Paper] [Demo]
by Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha.
Contributions:
- State-of-the-art results on referring image segmentation and referring expression comprehension on 6 datasets;
- A unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) by formulating them as a sequence-to-sequence (seq2seq) prediction problem;
- A regression-based decoder for accurate coordinate prediction, which outputs continuous 2D coordinates directly without quantization error.
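To make the quantization error mentioned in the last contribution concrete, here is a minimal sketch. It is not PolyFormer's decoder: it assumes a hypothetical baseline that snaps normalized coordinates to uniform bins, with the vertices and the bin count of 64 chosen only for illustration.

```python
import math


def quantize(value, num_bins):
    """Snap a normalized coordinate in [0, 1] to the center of its uniform bin."""
    idx = min(int(math.floor(value * num_bins)), num_bins - 1)
    return (idx + 0.5) / num_bins


# Hypothetical polygon vertices in normalized (x, y) coordinates.
polygon = [(0.123, 0.456), (0.789, 0.321), (0.500, 0.987)]
num_bins = 64  # assumed bin count, for illustration only

quantized = [(quantize(x, num_bins), quantize(y, num_bins)) for x, y in polygon]

# Largest absolute difference between a binned coordinate and the original one.
max_error = max(
    abs(q - p)
    for q_vertex, p_vertex in zip(quantized, polygon)
    for q, p in zip(q_vertex, p_vertex)
)

# A binned coordinate can be off by up to half a bin width;
# a directly regressed continuous coordinate has no such floor.
print(f"max quantization error: {max_error:.5f}")
print(f"half bin width bound:   {0.5 / num_bins:.5f}")
```

Increasing `num_bins` shrinks the bound but enlarges the output vocabulary of a classification-style decoder, which is the trade-off that direct regression avoids.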
conda create -n polyformer python=3.7.4
conda activate polyformer
python -m pip install -r requirements.txt
Note: if you are getting import errors from fairseq, try the following:
python -m pip install pip==21.2.4
pip uninstall fairseq
pip install -r requirements.txt
- Create the dataset folders
mkdir datasets
mkdir datasets/images
mkdir datasets/annotations
- Download the 2014 Train images [83K/13GB] from COCO, the original Flickr30K images, ReferItGame images, and Visual Genome images, and extract them to datasets/images.
- Download the annotation file for the pretraining datasets, instances.json, provided by SeqTR and store it in datasets/annotations. The workspace directory should be organized like this:
PolyFormer/
├── datasets/
│ ├── images
│ │ ├── flickr30k/*.jpg
│ │ ├── mscoco/
│ │ │ └── train2014/*.jpg
│ │ ├── saiaprtc12/*.jpg
│ │ └── visual-genome/*.jpg
│ └── annotations
│ └── instances.json
└── ...
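Before generating the tsv files, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the paths are copied from the directory tree above, relative to the PolyFormer root.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "datasets/images/flickr30k",
    "datasets/images/mscoco/train2014",
    "datasets/images/saiaprtc12",
    "datasets/images/visual-genome",
    "datasets/annotations/instances.json",
]


def find_missing(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing entries:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Dataset layout looks complete.")
```

The check only tests that each path exists; it does not verify that the image folders are fully extracted.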
- Generate the tsv files for pretraining
python data/create_pretraining_data.py
- Follow the instructions in the ./refer directory to set up subdirectories and download annotations. This directory is based on the refer API.
- Generate the tsv files for finetuning
python data/create_finetuning_data.py
- Create the checkpoints folder
mkdir pretrained_weights
- Download the pretrained weights of Swin-base, Swin-large, and BERT-base and put the weight files in ./pretrained_weights. These weights are needed to initialize the model for training.
- Run the pretraining scripts for model pretraining on the referring expression comprehension task:
cd run_scripts/pretrain
bash pretrain_polyformer_b.sh # for pretraining PolyFormer-B model
bash pretrain_polyformer_l.sh # for pretraining PolyFormer-L model
Run the finetuning scripts for model finetuning on the referring image segmentation and referring expression comprehension tasks:
cd run_scripts/finetune
bash train_polyformer_b.sh # for finetuning PolyFormer-B model
bash train_polyformer_l.sh # for finetuning PolyFormer-L model
Please make sure that the pretrained weight path (Line 20) in each finetuning script points to the best pretraining checkpoint.
Run the evaluation scripts for evaluating on the referring image segmentation and referring expression comprehension tasks:
cd run_scripts/evaluation
# for evaluating PolyFormer-B model
bash evaluate_polyformer_b_refcoco.sh
bash evaluate_polyformer_b_refcoco+.sh
bash evaluate_polyformer_b_refcocog.sh
# for evaluating PolyFormer-L model
bash evaluate_polyformer_l_refcoco.sh
bash evaluate_polyformer_l_refcoco+.sh
bash evaluate_polyformer_l_refcocog.sh
Download the model weights to ./weights if you want to use our trained models for finetuning and evaluation.
| | Refcoco val | | | Refcoco testA | | | Refcoco testB | | |
|---|---|---|---|---|---|---|---|---|---|
| Model | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 |
| PolyFormer-B | 74.82 | 75.96 | 89.73 | 76.64 | 77.09 | 91.73 | 71.06 | 73.22 | 86.03 |
| PolyFormer-L | 75.96 | 76.94 | 90.38 | 78.29 | 78.49 | 92.89 | 73.25 | 74.83 | 87.16 |

| | Refcoco+ val | | | Refcoco+ testA | | | Refcoco+ testB | | |
|---|---|---|---|---|---|---|---|---|---|
| Model | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 |
| PolyFormer-B | 67.64 | 70.65 | 83.73 | 72.89 | 74.51 | 88.60 | 59.33 | 64.64 | 76.38 |
| PolyFormer-L | 69.33 | 72.15 | 84.98 | 74.56 | 75.71 | 89.77 | 61.87 | 66.73 | 77.97 |

| | Refcocog val | | | Refcocog test | | |
|---|---|---|---|---|---|---|
| Model | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 |
| PolyFormer-B | 67.76 | 69.36 | 84.46 | 69.05 | 69.88 | 84.96 |
| PolyFormer-L | 69.20 | 71.15 | 85.83 | 70.19 | 71.17 | 85.91 |
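For readers unfamiliar with the column names, the sketch below shows how these three metrics are commonly defined. It is an illustration, not the repository's evaluation code. It assumes oIoU is the total intersection over the total union across a split, mIoU is the mean of per-sample IoU, and Prec@0.5 is the percentage of samples whose IoU exceeds 0.5. The function name and the toy masks are hypothetical, and whether the tables measure Prec@0.5 on masks or on boxes is not stated here.

```python
def referring_metrics(preds, targets, threshold=0.5):
    """Compute oIoU, mIoU and Prec@threshold (in percent).

    preds and targets are lists of flat binary masks (sequences of 0/1),
    one pair per sample, with matching lengths.
    """
    total_inter = 0
    total_union = 0
    ious = []
    for pred, target in zip(preds, targets):
        inter = sum(1 for p, t in zip(pred, target) if p and t)
        union = sum(1 for p, t in zip(pred, target) if p or t)
        total_inter += inter
        total_union += union
        # Treat an empty union as IoU 0 (an assumption; conventions differ).
        ious.append(inter / union if union else 0.0)
    return {
        "oIoU": 100.0 * total_inter / total_union if total_union else 0.0,
        "mIoU": 100.0 * sum(ious) / len(ious),
        # Strict ">" is assumed; some implementations use ">=".
        "Prec@0.5": 100.0 * sum(iou > threshold for iou in ious) / len(ious),
    }


# Toy example: two samples with 4-pixel masks.
preds = [[1, 1, 1, 0], [1, 0, 0, 0]]
targets = [[1, 1, 0, 0], [0, 1, 1, 1]]
metrics = referring_metrics(preds, targets)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because oIoU pools pixels over the whole split, large objects weigh more heavily in it than in mIoU, which is why the two columns can differ noticeably for the same model.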
- Pretrained weights:
You can run the demo locally by:
python app.py
This codebase is developed based on OFA. Other related codebases include:
Please cite our paper if you find this codebase helpful :)
@inproceedings{liu2023polyformer,
title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
booktitle={CVPR},
year={2023}
}
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.