Update model zoo, advanced, readme
ruotianluo committed Jan 10, 2020
1 parent 13b06ce commit a58ec4f
Showing 13 changed files with 197 additions and 14 deletions.
36 changes: 36 additions & 0 deletions ADVANCED.md
Original file line number Diff line number Diff line change
@@ -8,6 +8,42 @@ Current ensemble only supports models which are subclass of AttModel. Here is ex
python eval_ensemble.py --dump_json 0 --ids model1,model2,model3 --weights 0.3,0.3,0.3 --batch_size 1 --dump_images 0 --num_images 5000 --split test --language_eval 1 --beam_size 5 --temperature 1.0 --sample_method greedy --max_length 30
```
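Conceptually, the ensemble combines the per-step word distributions of the models using the given `--weights`. A minimal stdlib sketch of that combination step, with made-up log-probabilities and a 5-word toy vocabulary (this is illustrative, not the repo's actual `AttModel` ensemble code):

```python
import math

# Hypothetical per-step log-probabilities over a 5-word toy vocab from 3 models.
logprobs = [
    [-0.5, -1.8, -2.6, -3.1, -3.3],
    [-0.7, -1.5, -2.9, -2.8, -3.4],
    [-0.6, -1.9, -2.4, -3.0, -3.2],
]
weights = [0.3, 0.3, 0.3]  # as passed via --weights

# Weighted average in probability space, then back to log space,
# so beam search can run on the combined distribution.
total_w = sum(weights)
probs = [
    sum(w * math.exp(lp[i]) for w, lp in zip(weights, logprobs)) / total_w
    for i in range(5)
]
ensemble_logprobs = [math.log(p) for p in probs]
best_word = max(range(5), key=lambda i: ensemble_logprobs[i])
```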

## BPE

```
python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk_bpe.json --output_h5 data/cocotalk_bpe --symbol_count 6000
```

BPE doesn't seem to improve performance.
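For context, `--symbol_count` controls how many BPE merge operations are learned: the most frequent adjacent symbol pair is merged repeatedly until the vocabulary reaches the requested size. A toy stdlib sketch of that learning loop (illustrative only, not the repo's actual preprocessing code):

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    # word_freqs: {word: count}; each word starts as a tuple of characters.
    vocab = {tuple(w): c for w, c in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

merges = learn_bpe({"low": 5, "lower": 2, "lowest": 2}, num_merges=2)
```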

## Use lmdb instead of a folder of countless files

It's known that some file systems handle folders containing huge numbers of small files poorly. However, in this project, the default way of saving precomputed image features is to save each image feature as an individual file.

For COCO, once all the features have been cached in memory (essentially after the first epoch), the time spent reading data is negligible. However, for a much larger dataset like Conceptual Captions, the features cannot all fit in memory, so data loading is extremely slow and remains slow even after the first epoch.

For that dataset, I used lmdb to store all the features in a single database. Loading data is still slow, but much faster than reading individual files.

To generate an lmdb file from a folder of features, check out `scripts/dump_to_lmdb.py`, which is borrowed from [Lyken17/Efficient-PyTorch](https://github.com/Lyken17/Efficient-PyTorch/tools).

I believe the current way of using lmdb in `dataloader.py` is far from optimal. I tried the methods in tensorpack but failed to make them work. (The idea was to read by chunk, so that lmdb can load a chunk at a time, reducing ad hoc disk access.)
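The core idea is to replace thousands of per-image files with one key-value database that the dataloader queries by image id. A minimal sketch of that pattern, using stdlib `sqlite3` as a stand-in for lmdb so the example stays dependency-free (hypothetical image ids and features; the repo itself uses the `lmdb` package):

```python
import pickle
import sqlite3

# One key-value database instead of one file per image feature.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feats (img_id TEXT PRIMARY KEY, feat BLOB)")

# Writing: dump every precomputed feature into the database once.
features = {"img_001": [0.1, 0.2, 0.3], "img_002": [0.4, 0.5, 0.6]}
for img_id, feat in features.items():
    db.execute("INSERT INTO feats VALUES (?, ?)", (img_id, pickle.dumps(feat)))
db.commit()

# Reading: the dataloader looks features up by image id.
row = db.execute("SELECT feat FROM feats WHERE img_id = ?", ("img_002",)).fetchone()
feat = pickle.loads(row[0])
```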

## new self critical

This "new self critical" is borrowed from "Variational Inference for Monte Carlo Objectives". The only difference from the original self critical is the definition of the baseline.

In the original self critical, the baseline is the score of the greedy decoding output. In new self critical, the baseline is the average score of the other samples (this requires the model to generate multiple samples for each image).
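Concretely, with multiple samples per image, each sample's advantage uses the mean reward of its sibling samples as the baseline. A toy sketch with made-up reward numbers (illustrative only, not the repo's actual loss code):

```python
# Toy rewards (e.g. CIDEr scores) for 4 sampled captions of one image.
rewards = [0.8, 1.2, 1.0, 0.6]
n = len(rewards)

# Original self critical: baseline = reward of the greedy decode (one number).
# New self critical: each sample's baseline = mean reward of the OTHER samples.
total = sum(rewards)
baselines = [(total - r) / (n - 1) for r in rewards]
advantages = [r - b for r, b in zip(rewards, baselines)]
# A policy-gradient loss would then weight each sample's log-prob
# by its advantage: above-average samples are reinforced.
```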

To try new self critical on the topdown model, you can run

`python train.py --cfg configs/topdown_nsc.yml`

This yml file also gives you hints about what to change in order to use new self critical.

## Sample n captions

When sampling, set `sample_n` to be greater than 0.

## Batch normalization

## Box feature
97 changes: 89 additions & 8 deletions MODEL_ZOO.md
@@ -1,23 +1,104 @@
# Models trained with Resnet101 feature:
# Models

Models are provided in [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)
Results are on the Karpathy test split with beam size 5. Unless otherwise noted, the numbers shown are not cherry-picked; the scores are only meant to verify that you are getting things right. If the scores you get are close to the numbers given (they could be slightly higher or lower), then it's fine.

# Models trained with Bottomup feature:
# Trained with Resnet101 feature:

Results are on karpathy test split, beam size 5.
Collection: [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">CIDEr</th>
<th valign="bottom">SPICE</th>
<th valign="bottom">download</th>
<th valign="bottom">comment</th>
<th valign="bottom">Download</th>
<th valign="bottom">Note</th>
<!-- TABLE BODY -->
<!-- ROW: faster_rcnn_R_50_C4_1x -->
<tr><td align="left"><a href="configs/fc.yml">FC</a></td>
<td align="center">0.953</td>
<td align="center">0.1787</td>
<td align="center"><a href="https://drive.google.com/open?id=1AG8Tulna7gan6OgmYul0QhxONDBGcdun">model&metrics</a></td>
<td align="center">--caption_model newfc</td>
</tr>
<tr><td align="left"><a href="configs/fc_rl.yml">FC<br>+self_critical</a></td>
<td align="center">1.045</td>
<td align="center">0.1838</td>
<td align="center"><a href="https://drive.google.com/open?id=1MA-9ByDNPXis2jKG0K0Z-cF_yZz7znBc">model&metrics</a></td>
<td align="center">--caption_model newfc</td>
</tr>
<tr><td align="left"><a href="configs/fc_nsc.yml">FC<br>+new_self_critical</a></td>
<td align="center">1.066</td>
<td align="center">0.1856</td>
<td align="center"><a href="https://drive.google.com/open?id=1OsB_jLDorJnzKz6xsOfk1n493P3hwOP0">model&metrics</a></td>
<td align="center">--caption_model newfc</td>
</tr>
</tbody></table>

# Trained with Bottomup feature:

Collection: [link](https://drive.google.com/open?id=1-RNak8qLUR5LqfItY6OenbRl8sdwODng)

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">CIDEr</th>
<th valign="bottom">SPICE</th>
<th valign="bottom">Download</th>
<th valign="bottom">Note</th>
<!-- TABLE BODY -->
<tr><td align="left"><a href="configs/a2i2.yml">Att2in</a></td>
<td align="center">1.089</td>
<td align="center">0.1982</td>
<td align="center"><a href="https://drive.google.com/open?id=1jO9bSocC93n1vBZmZVaASWc_jJ1VKZUq">model&metrics</a></td>
<td align="center">My replication</td>
</tr>
<tr><td align="left"><a href="configs/a2i2_sc.yml">Att2in<br>+self_critical</a></td>
<td align="center">1.173</td>
<td align="center">0.2046</td>
<td align="center"><a href="https://drive.google.com/open?id=1aI7hYUmgRLksI1wvN9-895GMHz4yStHz">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/a2i2_nsc.yml">Att2in<br>+new_self_critical</a></td>
<td align="center">1.219</td>
<td align="center">0.2099</td>
<td align="center"><a href="https://drive.google.com/open?id=1BkxLPL4SuQ_qFa-4fN96u23iTFWw-iXX">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/transformer.yml">Transformer</a></td>
<td align="center">1.113</td>
<td align="center">0.2045</td>
<td align="center"><a href="https://drive.google.com/open?id=10Q5GJ2jZFCexD71rY9gg886Aasuaup8O">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/transformer_sc.yml">Transformer<br>+self_critical</a></td>
<td align="center">1.266</td>
<td align="center">0.2224</td>
<td align="center"><a href="https://drive.google.com/open?id=12iKJJSIGrzFth_dJXqcXy-_IjAU0I3DC">model&metrics</a></td>
<td align="center"></td>
</tr>

<tr><td align="left"><a href="configs/topdown.yml">topdown</a></td>
<td align="center">1.099</td>
<td align="center">0.1999</td>
<td align="center"><a href="https://drive.google.com/open?id=14w8YXrjxSAi5D4Adx8jgfg4geQ8XS8wH">model&metrics</a></td>
<td align="center">My replication</td>
</tr>
<tr><td align="left"><a href="configs/topdown_sc.yml">topdown<br>+self_critical</a></td>
<td align="center">1.227</td>
<td align="center">0.2145</td>
<td align="center"><a href="https://drive.google.com/open?id=1QdCigVWdDKTbUe3_HQFEGkAsv9XIkKkE">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/topdown_nsc.yml">topdown<br>+new_self_critical</a></td>
<td align="center">1.239</td>
<td align="center">0.2154</td>
<td align="center"><a href="https://drive.google.com/open?id=1cgoywxAdzHtIF2C6zNnIA7G2wjol_ybf">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/td_long_nsc.yml">Topdown<br>+Schedule long<br>+new_self_critical</a></td>
<td align="center">1.280</td>
<td align="center"><b>1.280</b></td>
<td align="center">0.2200</td>
<td align="center"><a href="https://drive.google.com/open?id=1bCDmf4JCM79f5Lqp6MAn1ap4b3NJ5Gis">model&metrics</a></td>
<td align="center">Best of 5 models<br>schedule proposed by yangxuntu</td>
8 changes: 6 additions & 2 deletions README.md
@@ -13,9 +13,13 @@ This is based on my [ImageCaptioning.pytorch](https://github.com/ruotianluo/Imag

## Requirements
Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for python 3)

PyTorch 1.3 (along with torchvision)

cider (already been added as a submodule)
coco-caption (already been added as a submodule)

coco-caption (already been added as a submodule) (**Remember to follow initialization steps in coco-caption/README.md**)

yacs

(**Skip if you are using bottom-up feature**): If you want to use resnet to extract image features, you need to download pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM), and should be placed in `data/imagenet_weights`.
@@ -64,7 +68,7 @@ For more options, see `opts.py`.

First you should preprocess the dataset and get the cache for calculating cider score:
```
$ python scripts/prepro_ngrams.py --input_json .../dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
```
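`prepro_ngrams` essentially counts, over the training captions, in how many images' reference sets each n-gram occurs, so CIDEr's tf-idf weights can be computed later. A rough stdlib sketch of that document-frequency counting with toy captions (hypothetical data; not the actual script):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Document frequency: in how many images' reference sets does each n-gram occur?
refs_per_image = [
    [["a", "dog", "runs"], ["a", "dog", "is", "running"]],
    [["a", "cat", "sits"]],
]
doc_freq = Counter()
for refs in refs_per_image:
    seen = set()  # count each n-gram at most once per image
    for ref in refs:
        for n in range(1, 5):  # CIDEr uses 1- to 4-grams
            seen.update(ngrams(ref, n))
    doc_freq.update(seen)
```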

Then, make a copy of the model pretrained with cross entropy. (Copying is not mandatory; it's just for backup.)
20 changes: 20 additions & 0 deletions configs/a2i2.yml
@@ -0,0 +1,20 @@
# base
caption_model: att2in2
input_json: data/cocotalk.json
input_fc_dir: data/cocobu_fc
input_att_dir: data/cocobu_att
input_label_h5: data/cocotalk_label.h5
learning_rate: 0.0005
learning_rate_decay_start: 0
scheduled_sampling_start: 0
# checkpoint_path: $ckpt_path
# $start_from
language_eval: 1
save_checkpoint_every: 3000
val_images_use: 5000

train_sample_n: 5
self_critical_after: 30
batch_size: 10
learning_rate_decay_start: 0
max_epochs: 30
9 changes: 9 additions & 0 deletions configs/a2i2_nsc.yml
@@ -0,0 +1,9 @@
_BASE_: a2i2.yml
learning_rate: 0.00005
learning_rate_decay_start: -1
self_critical_after: -1
structure_after: 30
structure_sample_n: 5
structure_loss_weight: 1
structure_loss_type: new_self_critical
max_epochs: 50
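The `_BASE_` key means a config inherits from another yml file, with the child's keys overriding the base's. A rough sketch of that merge, hand-parsing flat `key: value` lines to stay dependency-free (hypothetical helper, not the repo's actual yacs-based loader; values are kept as strings for illustration):

```python
def parse_flat_yaml(text):
    # Minimal "key: value" parser for flat configs (illustration only).
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        cfg[key.strip()] = value.strip()
    return cfg

base = parse_flat_yaml("""
learning_rate: 0.0005
self_critical_after: 30
max_epochs: 30
""")
child = parse_flat_yaml("""
_BASE_: a2i2.yml
learning_rate: 0.00005
self_critical_after: -1
max_epochs: 50
""")

child.pop("_BASE_", None)   # the pointer itself is not a hyperparameter
merged = {**base, **child}  # child keys override the base
```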
4 changes: 4 additions & 0 deletions configs/a2i2_sc.yml
@@ -0,0 +1,4 @@
_BASE_: a2i2.yml
learning_rate: 0.00005
learning_rate_decay_start: -1
max_epochs: 50
19 changes: 19 additions & 0 deletions configs/fc_nsc.yml
@@ -0,0 +1,19 @@
_BASE_: fc.yml
learning_rate: 0.00005
learning_rate_decay_start: -1
scheduled_sampling_start: -1

language_eval: 1
save_checkpoint_every: 3000
val_images_use: 5000

batch_size: 10
max_epochs: 50
cached_tokens: coco-train-idxs


self_critical_after: -1
structure_after: 30
structure_sample_n: 5
structure_loss_weight: 1
structure_loss_type: new_self_critical
2 changes: 1 addition & 1 deletion configs/topdown.yml
@@ -20,4 +20,4 @@ train_sample_n: 5
self_critical_after: 30
batch_size: 10
learning_rate_decay_start: 0
max_epochs: 50
max_epochs: 30
1 change: 1 addition & 0 deletions configs/topdown_nsc.yml
@@ -6,3 +6,4 @@ structure_after: 30
structure_sample_n: 5
structure_loss_weight: 1
structure_loss_type: new_self_critical
max_epochs: 50
4 changes: 3 additions & 1 deletion configs/topdown_sc.yml
@@ -1,3 +1,5 @@
_BASE_: topdown.yml
learning_rate: 0.00005
learning_rate_decay_start: -1

max_epochs: 50
5 changes: 5 additions & 0 deletions configs/transformer.yml
@@ -10,6 +10,11 @@ seq_per_img: 5
batch_size: 10
learning_rate: 0.0005

# Notice: because I'm too lazy, I reuse the RNN option names to set the transformer hyperparameters:
# N=num_layers
# d_model=input_encoding_size
# d_ff=rnn_size
# h is always 8
num_layers: 6
input_encoding_size: 512
rnn_size: 2048
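Since the RNN option names are reused, reading a transformer config means translating them. A small sketch of that mapping (hypothetical helper, following the comment above):

```python
def transformer_hparams(opts):
    # Translate the reused RNN option names into transformer hyperparameters.
    return {
        "N": opts["num_layers"],                  # number of layers
        "d_model": opts["input_encoding_size"],   # model width
        "d_ff": opts["rnn_size"],                 # feed-forward width
        "h": 8,                                   # number of heads, fixed at 8
    }

hparams = transformer_hparams({"num_layers": 6,
                               "input_encoding_size": 512,
                               "rnn_size": 2048})
```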
5 changes: 3 additions & 2 deletions configs/transformer_sc.yml
@@ -1,6 +1,7 @@
_BASE_: transformer.yml
reduce_on_plateau: true

noamopt: false
learning_rate: 0.00001

self_critical_after: 15
max_epochs: 50
1 change: 1 addition & 0 deletions data/README.md
@@ -55,6 +55,7 @@ This will create `data/cocobu_fc`, `data/cocobu_att` and `data/cocobu_box`. If y
#### Download converted files

bottomup-fc: [link](https://drive.google.com/file/d/1IpjCJ5LYC4kX2krxHcPgxAIipgA8uqTU/view?usp=sharing) (The fc features here are simply the average of the attention features)

bottomup-att: [link](https://drive.google.com/file/d/1hun0tsel34aXO4CYyTRIvHJkcbZHwjrD/view?usp=sharing)


