From a58ec4f03ed9626d2151f7afad07d696d33f33f1 Mon Sep 17 00:00:00 2001
From: Ruotian Luo
Date: Tue, 31 Dec 2019 16:50:39 -0600
Subject: [PATCH] Update model zoo, advanced, readme

---
 ADVANCED.md                | 36 ++++++++++++++
 MODEL_ZOO.md               | 97 ++++++++++++++++++++++++++++++++++----
 README.md                  |  8 +++-
 configs/a2i2.yml           | 20 ++++++++
 configs/a2i2_nsc.yml       |  9 ++++
 configs/a2i2_sc.yml        |  4 ++
 configs/fc_nsc.yml         | 19 ++++++++
 configs/topdown.yml        |  2 +-
 configs/topdown_nsc.yml    |  1 +
 configs/topdown_sc.yml     |  4 +-
 configs/transformer.yml    |  5 ++
 configs/transformer_sc.yml |  5 +-
 data/README.md             |  1 +
 13 files changed, 197 insertions(+), 14 deletions(-)
 create mode 100644 configs/a2i2.yml
 create mode 100644 configs/a2i2_nsc.yml
 create mode 100644 configs/a2i2_sc.yml
 create mode 100644 configs/fc_nsc.yml

diff --git a/ADVANCED.md b/ADVANCED.md
index 0e3830dc..6e717f30 100644
--- a/ADVANCED.md
+++ b/ADVANCED.md
@@ -8,6 +8,42 @@ Current ensemble only supports models which are subclass of AttModel. Here is ex
 python eval_ensemble.py --dump_json 0 --ids model1,model2,model3 --weights 0.3,0.3,0.3 --batch_size 1 --dump_images 0 --num_images 5000 --split test --language_eval 1 --beam_size 5 --temperature 1.0 --sample_method greedy --max_length 30
 ```
 
+## BPE
+
+```
+python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk_bpe.json --output_h5 data/cocotalk_bpe --symbol_count 6000
+```
+
+It doesn't seem to improve performance.
+
+## Use lmdb instead of a folder of countless files
+
+Some file systems handle a folder containing a huge number of small files poorly. However, the default in this project is to save each precomputed image feature as an individual file.
+
+Usually, for COCO, once all the features have been cached in memory (basically after the first epoch), the time spent reading data is negligible. However, for a much larger dataset like Conceptual Captions, the features cannot all fit in memory, so data loading is extremely slow and stays slow even after the first epoch.
+
+For that dataset, I used lmdb to save all the features. Loading the data is still slow, but it's much better than saving individual files.
+
+To generate an lmdb file from a folder of features, check out `scripts/dump_to_lmdb.py`, which is borrowed from [Lyken17/Efficient-PyTorch](https://github.com/Lyken17/Efficient-PyTorch/tools).
+
+I believe the current way of using lmdb in `dataloader.py` is far from optimal. I tried the methods in tensorpack but failed to make them work. (The idea was to read by chunk, so that lmdb can load a chunk at a time, reducing the number of ad hoc disk accesses.)
+
+## New self critical
+
+This "new self critical" is borrowed from "Variational Inference for Monte Carlo Objectives". The only difference from the original self critical is the definition of the baseline.
+
+In the original self critical, the baseline is the score of the greedy decoding output. In new self critical, the baseline is the average score of the other samples (this requires the model to generate multiple samples for each image).
+
+To try new self critical on the topdown model, you can run
+
+`python train.py --cfg configs/topdown_nsc.yml`
+
+This yml file also gives you a hint of what to change in order to use new self critical.
+
+## Sample n captions
+
+When sampling, set `sample_n` to be greater than 0.
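To make the baseline used by "new self critical" concrete, here is a minimal PyTorch sketch of the advantage computation. It is illustrative only, not the repository's actual loss code; `rewards` is assumed to be one score (e.g. CIDEr) per sampled caption, grouped by image.

```python
import torch

def new_self_critical_advantage(rewards, sample_n):
    """rewards: 1-D tensor of scores for batch*sample_n sampled captions,
    ordered so that consecutive sample_n entries belong to the same image."""
    r = rewards.view(-1, sample_n)                      # (batch, sample_n)
    # baseline for each sample = mean score of the OTHER samples for that image
    baseline = (r.sum(dim=1, keepdim=True) - r) / (sample_n - 1)
    return (r - baseline).view(-1)                      # advantage per caption
```

The advantage is then used in place of "reward minus greedy score" in the usual self-critical policy-gradient loss.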
+
 ## Batch normalization
 
 ## Box feature
\ No newline at end of file
diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md
index 51b58e22..18fb0906 100644
--- a/MODEL_ZOO.md
+++ b/MODEL_ZOO.md
@@ -1,10 +1,10 @@
-# Models trained with Resnet101 feature:
+# Models
 
-Models are provided in [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)
+Results are on the karpathy test split with beam size 5. Unless noted otherwise, the numbers shown are not cherry-picked; they are only meant to verify that you are getting things right. If the scores you get are close to the numbers listed here (they may be slightly higher or lower), then it's ok.
 
-# Models trained with Bottomup feature:
+# Trained with Resnet101 feature:
 
-Results are on karpathy test split, beam size 5.
+Collection: [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)
 
@@ -12,12 +12,93 @@ Results are on karpathy test split, beam size 5.
+Name | CIDEr | SPICE | Download | Note
+--- | --- | --- | --- | ---
+FC | 0.953 | 0.1787 | model&metrics | --caption_model newfc
+FC<br>+self_critical | 1.045 | 0.1838 | model&metrics | --caption_model newfc
+FC<br>+new_self_critical | 1.066 | 0.1856 | model&metrics | --caption_model newfc
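If you want to verify a downloaded checkpoint against these numbers, an evaluation command along the following lines should work; the paths are placeholders, and the exact flags are defined in `eval.py`/`opts.py`:

```
python eval.py --model model-best.pth --infos_path infos-best.pkl --dump_images 0 --num_images 5000 --language_eval 1 --beam_size 5
```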
+
+# Trained with Bottomup feature:
+
+Collection: [link](https://drive.google.com/open?id=1-RNak8qLUR5LqfItY6OenbRl8sdwODng)
+
+Name | CIDEr | SPICE | Download | Note
+--- | --- | --- | --- | ---
+Att2in | 1.089 | 0.1982 | model&metrics | My replication
+Att2in<br>+self_critical | 1.173 | 0.2046 | model&metrics | 
+Att2in<br>+new_self_critical | 1.219 | 0.2099 | model&metrics | 
+Transformer | 1.113 | 0.2045 | model&metrics | 
+Transformer<br>+self_critical | 1.266 | 0.2224 | model&metrics | 
+topdown | 1.099 | 0.1999 | model&metrics | My replication
+topdown<br>+self_critical | 1.227 | 0.2145 | model&metrics | 
+topdown<br>+new_self_critical | 1.239 | 0.2154 | model&metrics | 
+Topdown<br>+Schedule long<br>+new_self_critical | 1.280 | 0.2200 | model&metrics | Best of 5 models; schedule proposed by yangxuntu
diff --git a/README.md b/README.md
index 69056cf4..4cad8b32 100644
--- a/README.md
+++ b/README.md
@@ -13,9 +13,13 @@ This is based on my [ImageCaptioning.pytorch](https://github.com/ruotianluo/Imag
 
 ## Requirements
 Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for python 3)
+
 PyTorch 1.3 (along with torchvision)
+
 cider (already been added as a submodule)
-coco-caption (already been added as a submodule)
+
+coco-caption (already been added as a submodule) (**Remember to follow initialization steps in coco-caption/README.md**)
+
 yacs
 
 (**Skip if you are using bottom-up feature**): If you want to use resnet to extract image features, you need to download pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM), and should be placed in `data/imagenet_weights`.
@@ -64,7 +68,7 @@ For more options, see `opts.py`.
 
 First you should preprocess the dataset and get the cache for calculating cider score:
 ```
-$ python scripts/prepro_ngrams.py --input_json .../dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
+$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
 ```
 
 Then, copy the model from the pretrained model using cross entropy. (It's not mandatory to copy the model, just for back-up)
diff --git a/configs/a2i2.yml b/configs/a2i2.yml
new file mode 100644
index 00000000..808582b4
--- /dev/null
+++ b/configs/a2i2.yml
@@ -0,0 +1,20 @@
+# base
+caption_model: att2in2
+input_json: data/cocotalk.json
+input_fc_dir: data/cocobu_fc
+input_att_dir: data/cocobu_att
+input_label_h5: data/cocotalk_label.h5
+learning_rate: 0.0005
+learning_rate_decay_start: 0
+scheduled_sampling_start: 0
+# checkpoint_path: $ckpt_path
+# $start_from
+language_eval: 1
+save_checkpoint_every: 3000
+val_images_use: 5000
+
+train_sample_n: 5
+self_critical_after: 30
+batch_size: 10
+learning_rate_decay_start: 0
+max_epochs: 30
diff --git a/configs/a2i2_nsc.yml b/configs/a2i2_nsc.yml
new file mode 100644
index 00000000..785e5ff3
--- /dev/null
+++ b/configs/a2i2_nsc.yml
@@ -0,0 +1,9 @@
+_BASE_: a2i2.yml
+learning_rate: 0.00005
+learning_rate_decay_start: -1
+self_critical_after: -1
+structure_after: 30
+structure_sample_n: 5
+structure_loss_weight: 1
+structure_loss_type: new_self_critical
+max_epochs: 50
diff --git a/configs/a2i2_sc.yml b/configs/a2i2_sc.yml
new file mode 100644
index 00000000..a42a8331
--- /dev/null
+++ b/configs/a2i2_sc.yml
@@ -0,0 +1,4 @@
+_BASE_: a2i2.yml
+learning_rate: 0.00005
+learning_rate_decay_start: -1
+max_epochs: 50
\ No newline at end of file
diff --git a/configs/fc_nsc.yml b/configs/fc_nsc.yml
new file mode 100644
index 00000000..124f617e
--- /dev/null
+++ b/configs/fc_nsc.yml
@@ -0,0 +1,19 @@
+_BASE_: fc.yml
+learning_rate: 0.00005
+learning_rate_decay_start: -1
+scheduled_sampling_start: -1
+
+language_eval: 1
+save_checkpoint_every: 3000
+val_images_use: 5000
+
+batch_size: 10
+max_epochs: 50
+cached_tokens: coco-train-idxs
+
+
+self_critical_after: -1
+structure_after: 30
+structure_sample_n: 5
+structure_loss_weight: 1
+structure_loss_type: new_self_critical
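These configs rely on a `_BASE_:` key to inherit from a parent yml and override a few options. The sketch below is illustrative only: the repository resolves `_BASE_` in its own config code, and this is just the intended override semantics.

```python
import os
import yaml

def load_cfg(path):
    """Load a yml config, recursively merging in its _BASE_ parent.
    Keys defined in the child file override the parent's values."""
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    base = cfg.pop('_BASE_', None)
    if base is not None:
        parent = load_cfg(os.path.join(os.path.dirname(path), base))
        parent.update(cfg)   # child keys win over the parent's
        cfg = parent
    return cfg

# e.g. load_cfg('configs/a2i2_nsc.yml') starts from a2i2.yml and then
# applies the overrides (learning_rate, structure_* options, max_epochs, ...).
```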
diff --git a/configs/topdown.yml b/configs/topdown.yml
index 760dcf1a..324892e8 100644
--- a/configs/topdown.yml
+++ b/configs/topdown.yml
@@ -20,4 +20,4 @@ train_sample_n: 5
 self_critical_after: 30
 batch_size: 10
 learning_rate_decay_start: 0
-max_epochs: 50
+max_epochs: 30
diff --git a/configs/topdown_nsc.yml b/configs/topdown_nsc.yml
index 3254c247..4a73a4ea 100644
--- a/configs/topdown_nsc.yml
+++ b/configs/topdown_nsc.yml
@@ -6,3 +6,4 @@ structure_after: 30
 structure_sample_n: 5
 structure_loss_weight: 1
 structure_loss_type: new_self_critical
+max_epochs: 50
diff --git a/configs/topdown_sc.yml b/configs/topdown_sc.yml
index 2d1bee5d..7a018dcf 100644
--- a/configs/topdown_sc.yml
+++ b/configs/topdown_sc.yml
@@ -1,3 +1,5 @@
 _BASE_: topdown.yml
 learning_rate: 0.00005
-learning_rate_decay_start: -1
\ No newline at end of file
+learning_rate_decay_start: -1
+
+max_epochs: 50
\ No newline at end of file
diff --git a/configs/transformer.yml b/configs/transformer.yml
index 6716eac0..a08ef544 100644
--- a/configs/transformer.yml
+++ b/configs/transformer.yml
@@ -10,6 +10,11 @@ seq_per_img: 5
 batch_size: 10
 learning_rate: 0.0005
 
+# Notice: because I'm too lazy, I reuse the RNN option names to set the transformer hyperparameters:
+# N=num_layers
+# d_model=input_encoding_size
+# d_ff=rnn_size
+# h is always 8
 num_layers: 6
 input_encoding_size: 512
 rnn_size: 2048
diff --git a/configs/transformer_sc.yml b/configs/transformer_sc.yml
index 4d066a2d..75a68035 100644
--- a/configs/transformer_sc.yml
+++ b/configs/transformer_sc.yml
@@ -1,6 +1,7 @@
 _BASE_: transformer.yml
 reduce_on_plateau: true
-
+noamopt: false
 learning_rate: 0.00001
-self_critical_after: 15
\ No newline at end of file
+self_critical_after: 15
+max_epochs: 50
\ No newline at end of file
diff --git a/data/README.md b/data/README.md
index f0243145..b396437a 100644
--- a/data/README.md
+++ b/data/README.md
@@ -55,6 +55,7 @@ This will create `data/cocobu_fc`, `data/cocobu_att` and `data/cocobu_box`. If y
 
 #### Download converted files
 
 bottomup-fc: [link](https://drive.google.com/file/d/1IpjCJ5LYC4kX2krxHcPgxAIipgA8uqTU/view?usp=sharing) (The fc features here are simply the average of the attention features)
+
 bottomup-att: [link](https://drive.google.com/file/d/1hun0tsel34aXO4CYyTRIvHJkcbZHwjrD/view?usp=sharing)
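With these configs in place, training is launched the same way as the topdown example in ADVANCED.md, e.g. for the att2in2 model; the `--id` values below are just illustrative placeholders, and a cross-entropy checkpoint should be prepared first, as the README describes:

```
python train.py --cfg configs/a2i2_sc.yml --id a2i2_sc
python train.py --cfg configs/a2i2_nsc.yml --id a2i2_nsc
```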