add demo codes
huidingai committed Jul 19, 2023
1 parent 52b83d8 commit ffcb6e4
Showing 11 changed files with 487 additions and 33 deletions.
68 changes: 35 additions & 33 deletions README.md
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-comprehension-on-refcoco-1)](https://paperswithcode.com/sota/referring-expression-comprehension-on-refcoco-1?p=polyformer-referring-image-segmentation-as)


\[[Project Page](https://polyformer.github.io/)\] \[[Paper](https://arxiv.org/abs/2302.07387)\] \[[Demo](https://huggingface.co/spaces/koajoel/PolyFormer)\]

by [Jiang Liu*](https://joellliu.github.io/), [Hui Ding*](http://www.huiding.org/), [Zhaowei Cai](https://zhaoweicai.github.io/), [Yuting Zhang](https://scholar.google.com/citations?user=9UfZJskAAAAJ&hl=en), [Ravi Kumar Satzoda](https://scholar.google.com.sg/citations?user=4ngycwIAAAAJ&hl=en), [Vijay Mahadevan](https://scholar.google.com/citations?user=n9fRgvkAAAAJ&hl=en), [R. Manmatha](https://ciir.cs.umass.edu/~manmatha/).

PolyFormer is a unified model for referring image segmentation (predicting polygon vertex sequences) and referring expression comprehension (predicting bounding box corner points).
```bash
conda create -n polyformer python=3.7.4
conda activate polyformer
pip3 install torch==1.8.1 torchvision==0.9.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install -r requirements.txt
```
Note: if you are getting import errors from `fairseq`, try the following:
```bash
python -m pip install pip==21.2.4
pip uninstall fairseq
pip install -r requirements.txt
```
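
Optionally, you can sanity-check the environment afterwards. This is a suggested check, not part of the original setup instructions:
```bash
# Verify that torch and fairseq import cleanly after installation
python -c "import torch, fairseq; print(torch.__version__, fairseq.__version__)"
```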


## Datasets
### Prepare Pretraining Data
The workspace directory should be organized like this:
```
PolyFormer/
├── datasets/
│   ├── images
│   │   ├── flickr30k/
│   │   ├── mscoco/
│   │   │   └── train2014/
│   │   ├── saiaprtc12/
│   │   └── visual-genome/
│   └── annotations
│      └── instances.json
└── ...
```
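
The skeleton of this layout can be created in one line. This is a convenience sketch assuming bash brace expansion; the images and annotation files themselves still need to be downloaded separately:
```bash
# Create the expected directory structure under the workspace root
mkdir -p datasets/images/{flickr30k,mscoco/train2014,saiaprtc12,visual-genome} datasets/annotations
```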
```bash
python data/create_finetuning_data.py
```

## Pretraining
1. Create the folder for pretrained weights
```bash
mkdir pretrained_weights
```
2. Download the pretrained weights of [Swin-base](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth),
[Swin-large](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth),
and BERT-base (`bert-base-uncased`),
and put the weight files in `./pretrained_weights`.
These weights are needed to initialize the model for training.
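
For example, the Swin weights can be fetched directly from the release URLs above; a download sketch (the BERT-base weights would be obtained from the `bert-base-uncased` page on Hugging Face, which this sketch does not cover):
```bash
# Download the Swin backbone weights into ./pretrained_weights
mkdir -p pretrained_weights
wget -P pretrained_weights \
  https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
wget -P pretrained_weights \
  https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
```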


```bash
bash evaluate_polyformer_l_refcoco+.sh
bash evaluate_polyformer_l_refcocog.sh
```


## Model Zoo
Download the model weights to `./weights` if you want to use our trained models for finetuning and evaluation.
### Refcoco

| Model | oIoU (val) | mIoU (val) | [email protected] (val) | oIoU (testA) | mIoU (testA) | [email protected] (testA) | oIoU (testB) | mIoU (testB) | [email protected] (testB) |
|---|---|---|---|---|---|---|---|---|---|
| [PolyFormer-B](https://drive.google.com/file/d/1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9/view?usp=share_link) | 74.82 | 75.96 | 89.73 | 76.64 | 77.09 | 91.73 | 71.06 | 73.22 | 86.03 |
| [PolyFormer-L](https://drive.google.com/file/d/15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3/view?usp=share_link) | 75.96 | 76.94 | 90.38 | 78.29 | 78.49 | 92.89 | 73.25 | 74.83 | 87.16 |

### Refcoco+

| Model | oIoU (val) | mIoU (val) | [email protected] (val) | oIoU (testA) | mIoU (testA) | [email protected] (testA) | oIoU (testB) | mIoU (testB) | [email protected] (testB) |
|---|---|---|---|---|---|---|---|---|---|
| [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.64 | 70.65 | 83.73 | 72.89 | 74.51 | 88.60 | 59.33 | 64.64 | 76.38 |
| [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.33 | 72.15 | 84.98 | 74.56 | 75.71 | 89.77 | 61.87 | 66.73 | 77.97 |

### Refcocog

| Model | oIoU (val) | mIoU (val) | [email protected] (val) | oIoU (test) | mIoU (test) | [email protected] (test) |
|---|---|---|---|---|---|---|
| [PolyFormer-B](https://drive.google.com/file/d/1am7SKADCJgdOoXcd6z5JNEB3dHlabraA/view?usp=share_link) | 67.76 | 69.36 | 84.46 | 69.05 | 69.88 | 84.96 |
| [PolyFormer-L](https://drive.google.com/file/d/1upjK4YmtQT9b6qcA3yj3DXKnOuI52Pxv/view?usp=share_link) | 69.20 | 71.15 | 85.83 | 70.19 | 71.17 | 85.91 |

* PolyFormer-B pretrained [checkpoint](https://drive.google.com/file/d/1sAzfChYDdHdaeatB2K14lrJjG4uiXAol/view?usp=share_link)
* PolyFormer-L pretrained [checkpoint](https://drive.google.com/file/d/1knRxgM1lmEkuZZ-cOm_fmwKP1H0bJGU9/view?usp=share_link)
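
The checkpoints are hosted on Google Drive; one way to fetch them from the command line is a sketch like the following, assuming the third-party `gdown` tool (not mentioned in this repository). The file IDs are taken from the RefCOCO links above; the output names are illustrative:
```bash
pip install gdown
mkdir -p weights
# PolyFormer-B and PolyFormer-L checkpoints finetuned on Refcoco
gdown -O weights/polyformer_b_refcoco.pt 1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9
gdown -O weights/polyformer_l_refcoco.pt 15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3
```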


## Run the demo
You can run the demo locally with:
```bash
python app.py
```

* Pretrained weights:
* [PolyFormer-B](https://drive.google.com/file/d/1sAzfChYDdHdaeatB2K14lrJjG4uiXAol/view?usp=share_link)
* [PolyFormer-L](https://drive.google.com/file/d/1knRxgM1lmEkuZZ-cOm_fmwKP1H0bJGU9/view?usp=share_link)
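
The Gradio wiring in `app.py` (shown below) suggests the underlying inference function can also be called directly. A minimal sketch, assuming `demo.visual_grounding(image, expression)` returns the grounded visualization and the predicted mask as NumPy arrays, as the interface definition implies:
```python
# Minimal sketch: call the demo inference function without the Gradio UI.
# The signature is inferred from the gr.Interface wiring in app.py.
from PIL import Image

from demo import visual_grounding

image = Image.open("demo/vases.jpg")
overlay, mask = visual_grounding(image, "the blue vase on the left")
```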

# Acknowledgement
This codebase is developed based on [OFA](https://github.com/OFA-Sys/OFA).
47 changes: 47 additions & 0 deletions app.py
```python
# https://huggingface.co/koajoel/PolyFormer
import os
import math
import re
import base64
from io import BytesIO

import torch
import numpy as np
import cv2
import gradio as gr
from PIL import Image
from torchvision import transforms

# Importing the fairseq task and model modules also registers them with
# fairseq, which is needed before PolyFormer checkpoints can be loaded.
from fairseq import utils, tasks
from fairseq import checkpoint_utils
from utils.checkpoint_utils import load_model_ensemble_and_task
from utils.eval_utils import eval_step
from tasks.refcoco import RefcocoTask
from models.polyformer import PolyFormerModel

# visual_grounding performs the actual inference behind the Gradio app
from demo import visual_grounding

title = "PolyFormer for Visual Grounding"

description = """<p style='text-align: center'> <a href='https://polyformer.github.io/' target='_blank'>Project Page</a> | <a href='https://arxiv.org/pdf/2302.07387.pdf' target='_blank'>Paper</a> | <a href='https://github.com/amazon-science/polygon-transformer' target='_blank'>Github Repo</a></p>
<p style='text-align: left'> Demo of PolyFormer for referring image segmentation and referring expression comprehension. Upload your own image or click any one of the examples, and write a description about a certain object. Then click \"Submit\" and wait for the results.</p>
"""

examples = [['demo/vases.jpg', 'the blue vase on the left'],
['demo/dog.jpg', 'the dog wearing glasses'],
['demo/bear.jpeg', 'a bear astronaut in the space'],
['demo/unicorn.jpeg', 'a unicorn doing computer vision research'],
['demo/pig.jpeg', 'a pig robot preparing a delicious meal'],
['demo/otta.png', 'a gentleman otter in a 19th century portrait'],
['demo/pikachu.jpeg', 'a pikachu fine-dining with a view to the Eiffel Tower'],
['demo/cabin.jpeg', 'a small cabin on top of a snowy mountain in the style of Disney art station']
]
# Wire up the Gradio interface: a PIL image and a referring expression in,
# the grounded visualization and the predicted mask out.
io = gr.Interface(fn=visual_grounding,
                  inputs=[gr.inputs.Image(type='pil'), "textbox"],
                  outputs=[gr.outputs.Image(label="output", type='numpy'),
                           gr.outputs.Image(label="predicted mask", type='numpy')],
                  title=title, description=description, examples=examples,
                  allow_flagging=False, allow_screenshot=False, cache_examples=False)
io.launch(share=True)  # share=True also serves a temporary public URL
```

