Update model zoo, advanced, readme
ruotianluo committed Jan 10, 2020
1 parent 13b06ce commit a58ec4f
Showing 13 changed files with 197 additions and 14 deletions.
36 changes: 36 additions & 0 deletions ADVANCED.md
Original file line number Diff line number Diff line change
@@ -8,6 +8,42 @@ Current ensemble only supports models which are subclass of AttModel. Here is ex
python eval_ensemble.py --dump_json 0 --ids model1,model2,model3 --weights 0.3,0.3,0.3 --batch_size 1 --dump_images 0 --num_images 5000 --split test --language_eval 1 --beam_size 5 --temperature 1.0 --sample_method greedy --max_length 30
```
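Conceptually, the ensemble combines the per-step word distributions of the models using the given `--weights`. A minimal stdlib sketch of that combination step, with made-up log-probabilities and a 5-word toy vocabulary (this is illustrative, not the repo's actual `AttModel` ensemble code):

```python
import math

# Hypothetical per-step log-probabilities over a 5-word toy vocab from 3 models.
logprobs = [
    [-0.5, -1.8, -2.6, -3.1, -3.3],
    [-0.7, -1.5, -2.9, -2.8, -3.4],
    [-0.6, -1.9, -2.4, -3.0, -3.2],
]
weights = [0.3, 0.3, 0.3]  # as passed via --weights

# Weighted average in probability space, then back to log space,
# so beam search can run on the combined distribution.
total_w = sum(weights)
probs = [
    sum(w * math.exp(lp[i]) for w, lp in zip(weights, logprobs)) / total_w
    for i in range(5)
]
ensemble_logprobs = [math.log(p) for p in probs]
best_word = max(range(5), key=lambda i: ensemble_logprobs[i])
```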

## BPE

```
python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk_bpe.json --output_h5 data/cocotalk_bpe --symbol_count 6000
```

BPE doesn't seem to improve performance.
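For context, `--symbol_count` controls how many BPE merge operations are learned: the most frequent adjacent symbol pair is merged repeatedly until the vocabulary reaches the requested size. A toy stdlib sketch of that learning loop (illustrative only, not the repo's actual preprocessing code):

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    # word_freqs: {word: count}; each word starts as a tuple of characters.
    vocab = {tuple(w): c for w, c in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

merges = learn_bpe({"low": 5, "lower": 2, "lowest": 2}, num_merges=2)
```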

## Use lmdb instead of a folder of countless files

It's known that some file systems handle folders containing huge numbers of small files poorly. However, in this project, the default way of saving precomputed image features is to save each image feature as an individual file.

For COCO, once all the features have been cached in memory (essentially after the first epoch), the time spent reading data is negligible. However, for a much larger dataset like Conceptual Captions, the features cannot all fit in memory, so data loading is extremely slow and remains slow even after the first epoch.

For that dataset, I used lmdb to store all the features in a single database. Loading data is still slow, but much faster than reading individual files.

To generate an lmdb file from a folder of features, check out `scripts/dump_to_lmdb.py`, which is borrowed from [Lyken17/Efficient-PyTorch](https://github.com/Lyken17/Efficient-PyTorch/tools).

I believe the current way of using lmdb in `dataloader.py` is far from optimal. I tried the methods in tensorpack but failed to make them work. (The idea was to read by chunk, so that lmdb can load a chunk at a time, reducing ad hoc disk access.)
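The core idea is to replace thousands of per-image files with one key-value database that the dataloader queries by image id. A minimal sketch of that pattern, using stdlib `sqlite3` as a stand-in for lmdb so the example stays dependency-free (hypothetical image ids and features; the repo itself uses the `lmdb` package):

```python
import pickle
import sqlite3

# One key-value database instead of one file per image feature.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feats (img_id TEXT PRIMARY KEY, feat BLOB)")

# Writing: dump every precomputed feature into the database once.
features = {"img_001": [0.1, 0.2, 0.3], "img_002": [0.4, 0.5, 0.6]}
for img_id, feat in features.items():
    db.execute("INSERT INTO feats VALUES (?, ?)", (img_id, pickle.dumps(feat)))
db.commit()

# Reading: the dataloader looks features up by image id.
row = db.execute("SELECT feat FROM feats WHERE img_id = ?", ("img_002",)).fetchone()
feat = pickle.loads(row[0])
```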

## new self critical

This "new self critical" is borrowed from "Variational Inference for Monte Carlo Objectives". The only difference from the original self critical is the definition of the baseline.

In the original self critical, the baseline is the score of the greedy decoding output. In new self critical, the baseline is the average score of the other samples (this requires the model to generate multiple samples for each image).
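Concretely, with multiple samples per image, each sample's advantage uses the mean reward of its sibling samples as the baseline. A toy sketch with made-up reward numbers (illustrative only, not the repo's actual loss code):

```python
# Toy rewards (e.g. CIDEr scores) for 4 sampled captions of one image.
rewards = [0.8, 1.2, 1.0, 0.6]
n = len(rewards)

# Original self critical: baseline = reward of the greedy decode (one number).
# New self critical: each sample's baseline = mean reward of the OTHER samples.
total = sum(rewards)
baselines = [(total - r) / (n - 1) for r in rewards]
advantages = [r - b for r, b in zip(rewards, baselines)]
# A policy-gradient loss would then weight each sample's log-prob
# by its advantage: above-average samples are reinforced.
```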

To try new self critical on the topdown model, you can run

`python train.py --cfg configs/topdown_nsc.yml`

This yml file also gives you hints about what to change in order to use new self critical.

## Sample n captions

When sampling, set `sample_n` to be greater than 0.

## Batch normalization

## Box feature
97 changes: 89 additions & 8 deletions MODEL_ZOO.md
@@ -1,23 +1,104 @@
# Models trained with Resnet101 feature:
# Models

Models are provided in [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)
Results are on the Karpathy test split with beam size 5. Unless otherwise noted, the numbers shown are not cherry-picked; the scores are only meant to verify that you are getting things right. If the scores you get are close to the numbers given (they could be slightly higher or lower), then it's fine.

# Models trained with Bottomup feature:
# Trained with Resnet101 feature:

Results are on karpathy test split, beam size 5.
Collection: [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">CIDEr</th>
<th valign="bottom">SPICE</th>
<th valign="bottom">download</th>
<th valign="bottom">comment</th>
<th valign="bottom">Download</th>
<th valign="bottom">Note</th>
<!-- TABLE BODY -->
<!-- ROW: faster_rcnn_R_50_C4_1x -->
<tr><td align="left"><a href="configs/fc.yml">FC</a></td>
<td align="center">0.953</td>
<td align="center">0.1787</td>
<td align="center"><a href="https://drive.google.com/open?id=1AG8Tulna7gan6OgmYul0QhxONDBGcdun">model&metrics</a></td>
<td align="center">--caption_model newfc</td>
</tr>
<tr><td align="left"><a href="configs/fc_rl.yml">FC<br>+self_critical</a></td>
<td align="center">1.045</td>
<td align="center">0.1838</td>
<td align="center"><a href="https://drive.google.com/open?id=1MA-9ByDNPXis2jKG0K0Z-cF_yZz7znBc">model&metrics</a></td>
<td align="center">--caption_model newfc</td>
</tr>
<tr><td align="left"><a href="configs/fc_nsc.yml">FC<br>+new_self_critical</a></td>
<td align="center">1.066</td>
<td align="center">0.1856</td>
<td align="center"><a href="https://drive.google.com/open?id=1OsB_jLDorJnzKz6xsOfk1n493P3hwOP0">model&metrics</a></td>
<td align="center">--caption_model newfc</td>
</tr>
</tbody></table>

# Trained with Bottomup feature:

Collection: [link](https://drive.google.com/open?id=1-RNak8qLUR5LqfItY6OenbRl8sdwODng)

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">CIDEr</th>
<th valign="bottom">SPICE</th>
<th valign="bottom">Download</th>
<th valign="bottom">Note</th>
<!-- TABLE BODY -->
<tr><td align="left"><a href="configs/a2i2.yml">Att2in</a></td>
<td align="center">1.089</td>
<td align="center">0.1982</td>
<td align="center"><a href="https://drive.google.com/open?id=1jO9bSocC93n1vBZmZVaASWc_jJ1VKZUq">model&metrics</a></td>
<td align="center">My replication</td>
</tr>
<tr><td align="left"><a href="configs/a2i2_sc.yml">Att2in<br>+self_critical</a></td>
<td align="center">1.173</td>
<td align="center">0.2046</td>
<td align="center"><a href="https://drive.google.com/open?id=1aI7hYUmgRLksI1wvN9-895GMHz4yStHz">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/a2i2_nsc.yml">Att2in<br>+new_self_critical</a></td>
<td align="center">1.219</td>
<td align="center">0.2099</td>
<td align="center"><a href="https://drive.google.com/open?id=1BkxLPL4SuQ_qFa-4fN96u23iTFWw-iXX">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/transformer.yml">Transformer</a></td>
<td align="center">1.113</td>
<td align="center">0.2045</td>
<td align="center"><a href="https://drive.google.com/open?id=10Q5GJ2jZFCexD71rY9gg886Aasuaup8O">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/transformer_sc.yml">Transformer<br>+self_critical</a></td>
<td align="center">1.266</td>
<td align="center">0.2224</td>
<td align="center"><a href="https://drive.google.com/open?id=12iKJJSIGrzFth_dJXqcXy-_IjAU0I3DC">model&metrics</a></td>
<td align="center"></td>
</tr>

<tr><td align="left"><a href="configs/topdown.yml">topdown</a></td>
<td align="center">1.099</td>
<td align="center">0.1999</td>
<td align="center"><a href="https://drive.google.com/open?id=14w8YXrjxSAi5D4Adx8jgfg4geQ8XS8wH">model&metrics</a></td>
<td align="center">My replication</td>
</tr>
<tr><td align="left"><a href="configs/topdown_sc.yml">topdown<br>+self_critical</a></td>
<td align="center">1.227</td>
<td align="center">0.2145</td>
<td align="center"><a href="https://drive.google.com/open?id=1QdCigVWdDKTbUe3_HQFEGkAsv9XIkKkE">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/topdown_nsc.yml">topdown<br>+new_self_critical</a></td>
<td align="center">1.239</td>
<td align="center">0.2154</td>
<td align="center"><a href="https://drive.google.com/open?id=1cgoywxAdzHtIF2C6zNnIA7G2wjol_ybf">model&metrics</a></td>
<td align="center"></td>
</tr>
<tr><td align="left"><a href="configs/td_long_nsc.yml">Topdown<br>+Schedule long<br>+new_self_critical</a></td>
<td align="center">1.280</td>
<td align="center"><b>1.280</b></td>
<td align="center">0.2200</td>
<td align="center"><a href="https://drive.google.com/open?id=1bCDmf4JCM79f5Lqp6MAn1ap4b3NJ5Gis">model&metrics</a></td>
<td align="center">Best of 5 models<br>schedule proposed by yangxuntu</td>
8 changes: 6 additions & 2 deletions README.md
@@ -13,9 +13,13 @@ This is based on my [ImageCaptioning.pytorch](https://github.com/ruotianluo/Imag

## Requirements
Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for python 3)

PyTorch 1.3 (along with torchvision)

cider (already been added as a submodule)
coco-caption (already been added as a submodule)

coco-caption (already been added as a submodule) (**Remember to follow initialization steps in coco-caption/README.md**)

yacs

(**Skip if you are using bottom-up feature**): If you want to use resnet to extract image features, you need to download pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM), and should be placed in `data/imagenet_weights`.
@@ -64,7 +68,7 @@ For more options, see `opts.py`.

First you should preprocess the dataset and get the cache for calculating cider score:
```
$ python scripts/prepro_ngrams.py --input_json .../dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
```
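`prepro_ngrams` essentially counts, over the training captions, in how many images' reference sets each n-gram occurs, so CIDEr's tf-idf weights can be computed later. A rough stdlib sketch of that document-frequency counting with toy captions (hypothetical data; not the actual script):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Document frequency: in how many images' reference sets does each n-gram occur?
refs_per_image = [
    [["a", "dog", "runs"], ["a", "dog", "is", "running"]],
    [["a", "cat", "sits"]],
]
doc_freq = Counter()
for refs in refs_per_image:
    seen = set()  # count each n-gram at most once per image
    for ref in refs:
        for n in range(1, 5):  # CIDEr uses 1- to 4-grams
            seen.update(ngrams(ref, n))
    doc_freq.update(seen)
```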

Then, make a copy of the model pretrained with cross entropy. (Copying is not mandatory; it's just for backup.)
20 changes: 20 additions & 0 deletions configs/a2i2.yml
@@ -0,0 +1,20 @@
# base
caption_model: att2in2
input_json: data/cocotalk.json
input_fc_dir: data/cocobu_fc
input_att_dir: data/cocobu_att
input_label_h5: data/cocotalk_label.h5
learning_rate: 0.0005
learning_rate_decay_start: 0
scheduled_sampling_start: 0
# checkpoint_path: $ckpt_path
# $start_from
language_eval: 1
save_checkpoint_every: 3000
val_images_use: 5000

train_sample_n: 5
self_critical_after: 30
batch_size: 10
learning_rate_decay_start: 0
max_epochs: 30
9 changes: 9 additions & 0 deletions configs/a2i2_nsc.yml
@@ -0,0 +1,9 @@
_BASE_: a2i2.yml
learning_rate: 0.00005
learning_rate_decay_start: -1
self_critical_after: -1
structure_after: 30
structure_sample_n: 5
structure_loss_weight: 1
structure_loss_type: new_self_critical
max_epochs: 50
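The `_BASE_` key means a config inherits from another yml file, with the child's keys overriding the base's. A rough sketch of that merge, hand-parsing flat `key: value` lines to stay dependency-free (hypothetical helper, not the repo's actual yacs-based loader; values are kept as strings for illustration):

```python
def parse_flat_yaml(text):
    # Minimal "key: value" parser for flat configs (illustration only).
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        cfg[key.strip()] = value.strip()
    return cfg

base = parse_flat_yaml("""
learning_rate: 0.0005
self_critical_after: 30
max_epochs: 30
""")
child = parse_flat_yaml("""
_BASE_: a2i2.yml
learning_rate: 0.00005
self_critical_after: -1
max_epochs: 50
""")

child.pop("_BASE_", None)   # the pointer itself is not a hyperparameter
merged = {**base, **child}  # child keys override the base
```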
4 changes: 4 additions & 0 deletions configs/a2i2_sc.yml
@@ -0,0 +1,4 @@
_BASE_: a2i2.yml
learning_rate: 0.00005
learning_rate_decay_start: -1
max_epochs: 50
19 changes: 19 additions & 0 deletions configs/fc_nsc.yml
@@ -0,0 +1,19 @@
_BASE_: fc.yml
learning_rate: 0.00005
learning_rate_decay_start: -1
scheduled_sampling_start: -1

language_eval: 1
save_checkpoint_every: 3000
val_images_use: 5000

batch_size: 10
max_epochs: 50
cached_tokens: coco-train-idxs


self_critical_after: -1
structure_after: 30
structure_sample_n: 5
structure_loss_weight: 1
structure_loss_type: new_self_critical
2 changes: 1 addition & 1 deletion configs/topdown.yml
@@ -20,4 +20,4 @@ train_sample_n: 5
self_critical_after: 30
batch_size: 10
learning_rate_decay_start: 0
max_epochs: 50
max_epochs: 30
1 change: 1 addition & 0 deletions configs/topdown_nsc.yml
@@ -6,3 +6,4 @@ structure_after: 30
structure_sample_n: 5
structure_loss_weight: 1
structure_loss_type: new_self_critical
max_epochs: 50
4 changes: 3 additions & 1 deletion configs/topdown_sc.yml
@@ -1,3 +1,5 @@
_BASE_: topdown.yml
learning_rate: 0.00005
learning_rate_decay_start: -1

max_epochs: 50
5 changes: 5 additions & 0 deletions configs/transformer.yml
@@ -10,6 +10,11 @@ seq_per_img: 5
batch_size: 10
learning_rate: 0.0005

# Notice: because I'm too lazy, I reuse the RNN option names to set the transformer hyperparameters:
# N=num_layers
# d_model=input_encoding_size
# d_ff=rnn_size
# h is always 8
num_layers: 6
input_encoding_size: 512
rnn_size: 2048
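Since the RNN option names are reused, reading a transformer config means translating them. A small sketch of that mapping (hypothetical helper, following the comment above):

```python
def transformer_hparams(opts):
    # Translate the reused RNN option names into transformer hyperparameters.
    return {
        "N": opts["num_layers"],                  # number of layers
        "d_model": opts["input_encoding_size"],   # model width
        "d_ff": opts["rnn_size"],                 # feed-forward width
        "h": 8,                                   # number of heads, fixed at 8
    }

hparams = transformer_hparams({"num_layers": 6,
                               "input_encoding_size": 512,
                               "rnn_size": 2048})
```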
5 changes: 3 additions & 2 deletions configs/transformer_sc.yml
@@ -1,6 +1,7 @@
_BASE_: transformer.yml
reduce_on_plateau: true

noamopt: false
learning_rate: 0.00001

self_critical_after: 15
max_epochs: 50
1 change: 1 addition & 0 deletions data/README.md
@@ -55,6 +55,7 @@ This will create `data/cocobu_fc`, `data/cocobu_att` and `data/cocobu_box`. If y
#### Download converted files

bottomup-fc: [link](https://drive.google.com/file/d/1IpjCJ5LYC4kX2krxHcPgxAIipgA8uqTU/view?usp=sharing) (The fc features here are simply the average of the attention features)

bottomup-att: [link](https://drive.google.com/file/d/1hun0tsel34aXO4CYyTRIvHJkcbZHwjrD/view?usp=sharing)


