PolyFormer: Referring Image Segmentation as Sequential Polygon Generation (CVPR 2023)

by Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha.

🎶 Introduction

PolyFormer is a unified model for referring image segmentation, where it predicts a sequence of polygon vertices, and referring expression comprehension, where it predicts bounding box corner points. The predicted polygons are converted to segmentation masks as a final step.
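The final polygon-to-mask step can be illustrated with a minimal sketch. It rasterizes one polygon with the even-odd (ray casting) rule in plain Python; the function name `polygon_to_mask` and the pixel-center convention are assumptions made for this example, not the repository's implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_mask(vertices: Sequence[Point], height: int, width: int) -> List[List[int]]:
    """Rasterize one polygon into a binary mask with the even-odd rule.

    A pixel is set to 1 when its center (col + 0.5, row + 0.5) lies inside the polygon.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(vertices)
    for row in range(height):
        py = row + 0.5
        for col in range(width):
            px = col + 0.5
            inside = False
            for i in range(n):
                x1, y1 = vertices[i]
                x2, y2 = vertices[(i + 1) % n]
                # Only edges that straddle the horizontal line through the pixel center count.
                if (y1 > py) != (y2 > py):
                    # x coordinate where the edge crosses that horizontal line
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if px < x_cross:
                        inside = not inside
            mask[row][col] = int(inside)
    return mask


if __name__ == "__main__":
    # Hypothetical predicted polygon: an axis-aligned square from (1, 1) to (4, 4).
    square = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]
    for line in polygon_to_mask(square, height=5, width=5):
        print("".join(str(v) for v in line))
```

For real images a library rasterizer would be far faster; the loop above only shows the idea.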

Contributions:

State-of-the-art results on referring image segmentation and referring expression comprehension on 6 datasets;
A unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) by formulating them as a sequence-to-sequence (seq2seq) prediction problem;
A regression-based decoder for accurate coordinate prediction, which outputs continuous 2D coordinates directly without quantization error.
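To make the quantization point concrete, here is a small arithmetic sketch. It assumes a decoder that classifies each normalized coordinate into one of `num_bins` equal-width bins and reports the bin center; the bin count, image size, and function names are illustrative assumptions, and the sketch does not reproduce PolyFormer's decoder.

```python
def quantize(coord: float, num_bins: int) -> float:
    """Map a normalized coordinate in [0, 1] to the center of its bin."""
    index = min(int(coord * num_bins), num_bins - 1)
    return (index + 0.5) / num_bins


def max_quantization_error_px(image_size: int, num_bins: int) -> float:
    """Worst-case error in pixels for bin-center decoding: half a bin width."""
    return 0.5 * image_size / num_bins


if __name__ == "__main__":
    image_size = 512  # assumed image side length in pixels
    num_bins = 64     # assumed number of coordinate bins
    x = 0.3371        # hypothetical normalized x coordinate of a vertex

    x_q = quantize(x, num_bins)
    error_px = abs(x - x_q) * image_size
    print(f"binned output:     {x_q:.4f} (error {error_px:.2f} px)")
    print(f"worst case:        {max_quantization_error_px(image_size, num_bins):.2f} px")
    print(f"regression output: {x:.4f} (no binning step)")
```

A regression head predicts the floating-point value itself, so it has no binning error of this kind, although it can still be inaccurate for other reasons.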

Getting Started

Installation

conda create -n polyformer python=3.7.4
conda activate polyformer
python -m pip install -r requirements.txt

Note: if you are getting import errors from fairseq, try the following:

python -m pip install pip==21.2.4
pip uninstall fairseq
pip install -r requirements.txt

Datasets

Prepare Pretraining Data

Create the dataset folders

mkdir datasets
mkdir datasets/images
mkdir datasets/annotations

Download the 2014 Train images [83K/13GB] from COCO, original Flickr30K images, ReferItGame images, and Visual Genome images, and extract them to datasets/images.
Download instances.json, the annotation file for the pretraining datasets provided by SeqTR, and store it in datasets/annotations. The workspace directory should be organized like this:

PolyFormer/
├── datasets/
│   ├── images
│   │   ├── flickr30k/*.jpg
│   │   ├── mscoco/
│   │   │   └── train2014/*.jpg
│   │   ├── saiaprtc12/*.jpg
│   │   └── visual-genome/*.jpg
│   └── annotations
│       └── instances.json
└── ...
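Before generating the tsv files, it can help to confirm that the layout matches the tree above. The following is a minimal sketch that only checks whether the listed folders and the annotation file exist; the helper name `missing_paths` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path
from typing import List

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "datasets/images/flickr30k",
    "datasets/images/mscoco/train2014",
    "datasets/images/saiaprtc12",
    "datasets/images/visual-genome",
    "datasets/annotations/instances.json",
]


def missing_paths(root: Path) -> List[str]:
    """Return the expected entries that do not exist under root."""
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root (the PolyFormer/ folder).
    missing = missing_paths(Path("."))
    if missing:
        print("Missing entries:")
        for entry in missing:
            print(f"  {entry}")
    else:
        print("Dataset layout looks complete.")
```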

Generate the tsv files for pretraining

python data/create_pretraining_data.py

Prepare Finetuning Data

Follow the instructions in the ./refer directory to set up subdirectories and download annotations. This directory is based on the refer API.
Generate the tsv files for finetuning

python data/create_finetuning_data.py

Pretraining

Create the checkpoints folder

mkdir pretrained_weights

Download the pretrained weights of Swin-base, Swin-large, and BERT-base and put the weight files in ./pretrained_weights. These weights are needed to initialize the model for training.
Run the pretraining scripts for model pretraining on the referring expression comprehension task:

cd run_scripts/pretrain
bash pretrain_polyformer_b.sh  # for pretraining PolyFormer-B model
bash pretrain_polyformer_l.sh  # for pretraining PolyFormer-L model

Finetuning

Run the finetuning scripts to finetune the model on the referring image segmentation and referring expression comprehension tasks:

cd run_scripts/finetune
bash train_polyformer_b.sh  # for finetuning PolyFormer-B model
bash train_polyformer_l.sh  # for finetuning PolyFormer-L model

Please make sure that the pretrained weight path (Line 20) in each finetuning script points to the best pretraining checkpoint.

Evaluation

Run the evaluation scripts for evaluating on the referring image segmentation and referring expression comprehension tasks:

cd run_scripts/evaluation

# for evaluating PolyFormer-B model
bash evaluate_polyformer_b_refcoco.sh 
bash evaluate_polyformer_b_refcoco+.sh 
bash evaluate_polyformer_b_refcocog.sh 

# for evaluating PolyFormer-L model
bash evaluate_polyformer_l_refcoco.sh 
bash evaluate_polyformer_l_refcoco+.sh 
bash evaluate_polyformer_l_refcocog.sh

Model Zoo

Download the model weights to ./weights if you want to use our trained models for finetuning and evaluation.

	Refcoco val			Refcoco testA			Refcoco testB
Model	oIoU	mIoU	Prec@0.5	oIoU	mIoU	Prec@0.5	oIoU	mIoU	Prec@0.5
PolyFormer-B	74.82	75.96	89.73	76.64	77.09	91.73	71.06	73.22	86.03
PolyFormer-L	75.96	76.94	90.38	78.29	78.49	92.89	73.25	74.83	87.16

	Refcoco+ val			Refcoco+ testA			Refcoco+ testB
Model	oIoU	mIoU	Prec@0.5	oIoU	mIoU	Prec@0.5	oIoU	mIoU	Prec@0.5
PolyFormer-B	67.64	70.65	83.73	72.89	74.51	88.60	59.33	64.64	76.38
PolyFormer-L	69.33	72.15	84.98	74.56	75.71	89.77	61.87	66.73	77.97

	Refcocog val			Refcocog test
Model	oIoU	mIoU	Prec@0.5	oIoU	mIoU	Prec@0.5
PolyFormer-B	67.76	69.36	84.46	69.05	69.88	84.96
PolyFormer-L	69.20	71.15	85.83	70.19	71.17	85.91
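The metrics in the tables can be sketched using their usual definitions in the referring segmentation literature: oIoU is the total intersection divided by the total union over all samples, mIoU is the mean of the per-sample IoU values, and the precision metric is the share of samples whose IoU exceeds 0.5. The code below is an illustrative sketch on nested lists, not the logic in evaluate.py; the function names and the strict `>` comparison at the threshold are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter, union


def evaluate_masks(preds: Sequence[Mask], targets: Sequence[Mask],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Return oIoU, mIoU, and precision at the given IoU threshold, in percent."""
    total_inter = total_union = 0
    ious: List[float] = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
        # Assumption: a sample with an empty union counts as IoU 0.
        ious.append(inter / union if union else 0.0)
    n = len(ious)
    if n == 0:
        return {"oIoU": 0.0, "mIoU": 0.0, "Prec": 0.0}
    return {
        "oIoU": 100.0 * total_inter / total_union if total_union else 0.0,
        "mIoU": 100.0 * sum(ious) / n,
        "Prec": 100.0 * sum(iou > threshold for iou in ious) / n,
    }


if __name__ == "__main__":
    # Two tiny hypothetical 2x2 samples.
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    targets = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    print(evaluate_masks(preds, targets))
```

Because oIoU pools pixels across the whole test set, it weights large objects more heavily, whereas mIoU treats every sample equally.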

Pretrained weights:
- PolyFormer-B
- PolyFormer-L

Run the demo

You can run the demo locally by:

python app.py

Acknowledgement

This codebase is developed based on OFA. Other related codebases include:

Citation

Please cite our paper if you find this codebase helpful :)

@inproceedings{liu2023polyformer,
  title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
  author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
  booktitle={CVPR},
  year={2023}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
