Forked from [amazon-science/polygon-transformer](https://github.com/amazon-science/polygon-transformer).
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-comprehension-on-refcoco-1)](https://paperswithcode.com/sota/referring-expression-comprehension-on-refcoco-1?p=polyformer-referring-image-segmentation-as)

\[[Project Page](https://polyformer.github.io/)\] \[[Paper](https://arxiv.org/abs/2302.07387)\]
by [Jiang Liu*](https://joellliu.github.io/), [Hui Ding*](http://www.huiding.org/), [Zhaowei Cai](https://zhaoweicai.github.io/), [Yuting Zhang](https://scholar.google.com/citations?user=9UfZJskAAAAJ&hl=en), [Ravi Kumar Satzoda](https://scholar.google.com.sg/citations?user=4ngycwIAAAAJ&hl=en), [Vijay Mahadevan](https://scholar.google.com/citations?user=n9fRgvkAAAAJ&hl=en), [R. Manmatha](https://ciir.cs.umass.edu/~manmatha/).
## Installation
```bash
conda create -n polyformer python=3.7.4
conda activate polyformer
pip3 install torch==1.8.1 torchvision==0.9.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install -r requirements.txt
```
Note: if you are getting import errors from `fairseq`, try the following:
```bash
python -m pip install pip==21.2.4
pip uninstall fairseq
pip install -r requirements.txt
```
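After installing, a quick sanity check can confirm the pinned dependencies import cleanly and see the GPU. This is a minimal sketch, not part of the repo:

```python
# Verify that the core pinned dependencies import and that CUDA is visible.
import torch
import torchvision
import fairseq  # noqa: F401  # an ImportError here is what the note above addresses

print("torch", torch.__version__, "| torchvision", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```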
## Datasets
### Prepare Pretraining Data
The workspace directory should be organized like this:
```
PolyFormer/
├── datasets/
│   ├── images
│   │   ├── flickr30k/
│   │   ├── mscoco/
│   │   │   └── train2014/
│   │   ├── saiaprtc12/
│   │   └── visual-genome/
│   └── annotations
│       └── instances.json
└── ...
```
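Before generating data it can help to verify this layout with a small script. This is a hypothetical helper, assuming only the paths shown in the tree above:

```python
from pathlib import Path

# Expected paths, taken from the directory tree above.
EXPECTED = [
    "datasets/images/flickr30k",
    "datasets/images/mscoco/train2014",
    "datasets/images/saiaprtc12",
    "datasets/images/visual-genome",
    "datasets/annotations/instances.json",
]

for rel in EXPECTED:
    status = "ok" if Path(rel).exists() else "MISSING"
    print(f"{status:>7}  {rel}")
```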
### Prepare Finetuning Data
```bash
python data/create_finetuning_data.py
```
## Pretraining
1. Create the checkpoints folder
```bash
mkdir pretrained_weights
```
2. Download the pretrained weights of [Swin-base](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth), [Swin-large](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth), and BERT-base, and put the weight files in `./pretrained_weights`. These weights are needed to initialize the model for training.
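The two Swin checkpoints can be fetched with a short script; this is a sketch using only the URLs linked above (the BERT-base weights must be obtained separately, e.g. from the Hugging Face hub, which the original download link pointed to):

```python
import urllib.request
from pathlib import Path

# Swin backbone checkpoints (URLs from the links above).
SWIN_URLS = [
    "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth",
    "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth",
]

out_dir = Path("pretrained_weights")
out_dir.mkdir(exist_ok=True)
for url in SWIN_URLS:
    dest = out_dir / url.rsplit("/", 1)[-1]
    if not dest.exists():
        print(f"downloading {dest.name} ...")
        urllib.request.urlretrieve(url, dest)
```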
## Evaluation
```bash
bash evaluate_polyformer_l_refcoco+.sh
bash evaluate_polyformer_l_refcocog.sh
```
## Model Zoo
Download the model weights to `./weights` if you want to use our trained models for finetuning and evaluation.
**Refcoco**

| Model | val oIoU | val mIoU | val [email protected] | testA oIoU | testA mIoU | testA [email protected] | testB oIoU | testB mIoU | testB [email protected] |
|---|---|---|---|---|---|---|---|---|---|
| [PolyFormer-B](https://drive.google.com/file/d/1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9/view?usp=share_link) | 74.82 | 75.96 | 89.73 | 76.64 | 77.09 | 91.73 | 71.06 | 73.22 | 86.03 |
| [PolyFormer-L](https://drive.google.com/file/d/15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3/view?usp=share_link) | 75.96 | 76.94 | 90.38 | 78.29 | 78.49 | 92.89 | 73.25 | 74.83 | 87.16 |
**Refcoco+**

| Model | val oIoU | val mIoU | val [email protected] | testA oIoU | testA mIoU | testA [email protected] | testB oIoU | testB mIoU | testB [email protected] |
|---|---|---|---|---|---|---|---|---|---|
| [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.64 | 70.65 | 83.73 | 72.89 | 74.51 | 88.60 | 59.33 | 64.64 | 76.38 |
| [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.33 | 72.15 | 84.98 | 74.56 | 75.71 | 89.77 | 61.87 | 66.73 | 77.97 |
**Refcocog**

| Model | val oIoU | val mIoU | val [email protected] | test oIoU | test mIoU | test [email protected] |
|---|---|---|---|---|---|---|
| [PolyFormer-B](https://drive.google.com/file/d/1am7SKADCJgdOoXcd6z5JNEB3dHlabraA/view?usp=share_link) | 67.76 | 69.36 | 84.46 | 69.05 | 69.88 | 84.96 |
| [PolyFormer-L](https://drive.google.com/file/d/1upjK4YmtQT9b6qcA3yj3DXKnOuI52Pxv/view?usp=share_link) | 69.20 | 71.15 | 85.83 | 70.19 | 71.17 | 85.91 |
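The checkpoints are hosted on Google Drive, so a plain `wget` will not work. One option is the `gdown` package; this is a sketch in which the Drive file IDs come from the links above but the output filenames are illustrative:

```python
import gdown  # pip install gdown
from pathlib import Path

# Drive file IDs from the Refcoco Model Zoo links above; filenames are illustrative.
CHECKPOINTS = {
    "polyformer_b_refcoco.pt": "1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9",
    "polyformer_l_refcoco.pt": "15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3",
}

Path("weights").mkdir(exist_ok=True)
for name, file_id in CHECKPOINTS.items():
    gdown.download(id=file_id, output=f"weights/{name}", quiet=False)
```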
## Run the demo
You can run the demo locally with:
```bash
python app.py
```
* Pretrained weights:
  * [PolyFormer-B](https://drive.google.com/file/d/1sAzfChYDdHdaeatB2K14lrJjG4uiXAol/view?usp=share_link)
  * [PolyFormer-L](https://drive.google.com/file/d/1knRxgM1lmEkuZZ-cOm_fmwKP1H0bJGU9/view?usp=share_link)
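You can also bypass the UI and call the grounding function directly. This is a sketch assuming `demo.visual_grounding` takes a PIL image and a referring expression and returns the visualization overlay and the predicted mask as numpy arrays, matching how `app.py` below wires it into Gradio:

```python
from PIL import Image

from demo import visual_grounding  # same entry point app.py uses

# Example image and expression taken from the demo's preloaded examples.
image = Image.open("demo/vases.jpg")
overlay, mask = visual_grounding(image, "the blue vase on the left")
print(overlay.shape, mask.shape)
```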
# Acknowledgement
This codebase is built on [OFA](https://github.com/OFA-Sys/OFA).
**`app.py`** (new file added by this commit; the entry point for the Gradio demo):
```python
# https://huggingface.co/koajoel/PolyFormer
import base64
import math
import os
import re
from io import BytesIO

import cv2
import gradio as gr
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from fairseq import checkpoint_utils, tasks, utils
from models.polyformer import PolyFormerModel
from tasks.refcoco import RefcocoTask
from utils.checkpoint_utils import load_model_ensemble_and_task
from utils.eval_utils import eval_step
from demo import visual_grounding  # performs the actual model loading and inference

title = "PolyFormer for Visual Grounding"

description = """<p style='text-align: center'> <a href='https://polyformer.github.io/' target='_blank'>Project Page</a> | <a href='https://arxiv.org/pdf/2302.07387.pdf' target='_blank'>Paper</a> | <a href='https://github.com/amazon-science/polygon-transformer' target='_blank'>Github Repo</a></p>
<p style='text-align: left'> Demo of PolyFormer for referring image segmentation and referring expression comprehension. Upload your own image or click any one of the examples, and write a description about a certain object. Then click \"Submit\" and wait for the results.</p>
"""

# (image, referring expression) pairs preloaded as clickable examples.
examples = [['demo/vases.jpg', 'the blue vase on the left'],
            ['demo/dog.jpg', 'the dog wearing glasses'],
            ['demo/bear.jpeg', 'a bear astronaut in the space'],
            ['demo/unicorn.jpeg', 'a unicorn doing computer vision research'],
            ['demo/pig.jpeg', 'a pig robot preparing a delicious meal'],
            ['demo/otta.png', 'a gentleman otter in a 19th century portrait'],
            ['demo/pikachu.jpeg', 'a pikachu fine-dining with a view to the Eiffel Tower'],
            ['demo/cabin.jpeg', 'a small cabin on top of a snowy mountain in the style of Disney art station']
            ]

# Legacy (pre-3.0) Gradio API: one PIL image plus free-form text in,
# the rendered overlay and the predicted mask out.
io = gr.Interface(fn=visual_grounding, inputs=[gr.inputs.Image(type='pil'), "textbox"],
                  outputs=[gr.outputs.Image(label="output", type='numpy'), gr.outputs.Image(label="predicted mask", type='numpy')],
                  title=title, description=description, examples=examples,
                  allow_flagging=False, allow_screenshot=False, cache_examples=False)
io.launch(share=True)  # share=True also serves a temporary public URL
```
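Note that the interface uses the legacy `gr.inputs`/`gr.outputs` namespaces and the `allow_screenshot` flag, which newer Gradio releases have removed; running this file as-is presumably requires the older Gradio version targeted by the repo's `requirements.txt`.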