From 8cd00d08453b6b9b41fd82157b11236c17f44fe3 Mon Sep 17 00:00:00 2001
From: Junnan Li <junnan.li@salesforce.com>
Date: Fri, 16 Jul 2021 08:52:56 +0800
Subject: [PATCH] Create README.md

---
 README.md | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 README.md
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6a357cd
--- /dev/null
+++ b/README.md
@@ -0,0 +1,38 @@
+## Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (Salesforce Research)
+
+This is the official PyTorch implementation of the <a href="">ALBEF paper</a> <a href="">[Blog]</a>.
+This repository supports finetuning ALBEF on VQA, SNLI-VE, NLVR2, image-text retrieval on MSCOCO and Flickr30k,
+and visual grounding on RefCOCO+. Pre-trained and finetuned checkpoints are released.
+<img src="img.png" width="600">
+
+
+### Requirements:
+* pytorch 1.8.0
+* transformers 4.8.1
+
+### Download:
+
+* <a href="https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/ALBEF.pth"> Pre-trained checkpoint </a>
+* <a href="https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/data.tar.gz"> Dataset json files </a>
+* <a href="https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/mscoco.pth"> Finetuned checkpoint for retrieval on MSCOCO </a>
+* <a href="https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/vqa.pth"> Finetuned checkpoint for VQA </a>
+* <a href="https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/refcoco.pth"> Finetuned checkpoint for visual grounding on RefCOCO+ </a>
+
+### Visualization:
+We provide code in visualize.ipynb to visualize the image regions that are important for each word in a text.
+Here is an example visualization using the visual grounding checkpoint.
+
+<img src="examples/visualization.png" width="700">
+
+
+
+### Image-Text Retrieval:
+
+1. Download the MSCOCO or Flickr30k dataset from its original website.
+2. Download and extract the provided dataset JSON files.
+3. In configs/Retrieval_coco.yaml or configs/Retrieval_flickr.yaml, set the paths to the JSON files and the image directory.
+4. Finetune the pre-trained checkpoint using 8 A100 GPUs:
+<pre>python -m torch.distributed.launch --nproc_per_node=8 --use_env Retrieval.py \
+--config ./configs/Retrieval_flickr.yaml \
+--output_dir output/Retrieval_flickr \
+--checkpoint [Pretrained checkpoint]</pre>