From a58ec4f03ed9626d2151f7afad07d696d33f33f1 Mon Sep 17 00:00:00 2001
From: Ruotian Luo
Date: Tue, 31 Dec 2019 16:50:39 -0600
Subject: [PATCH] Update model zoo, advanced, readme

---
 ADVANCED.md                | 36 ++++++++++++++
 MODEL_ZOO.md               | 97 ++++++++++++++++++++++++++++++++++----
 README.md                  |  8 +++-
 configs/a2i2.yml           | 20 ++++++++
 configs/a2i2_nsc.yml       |  9 ++++
 configs/a2i2_sc.yml        |  4 ++
 configs/fc_nsc.yml         | 19 ++++++++
 configs/topdown.yml        |  2 +-
 configs/topdown_nsc.yml    |  1 +
 configs/topdown_sc.yml     |  4 +-
 configs/transformer.yml    |  5 ++
 configs/transformer_sc.yml |  5 +-
 data/README.md             |  1 +
 13 files changed, 197 insertions(+), 14 deletions(-)
 create mode 100644 configs/a2i2.yml
 create mode 100644 configs/a2i2_nsc.yml
 create mode 100644 configs/a2i2_sc.yml
 create mode 100644 configs/fc_nsc.yml

diff --git a/ADVANCED.md b/ADVANCED.md
index 0e3830dc..6e717f30 100644
--- a/ADVANCED.md
+++ b/ADVANCED.md
@@ -8,6 +8,42 @@ Current ensemble only supports models which are subclass of AttModel. Here is ex
 python eval_ensemble.py --dump_json 0 --ids model1,model2,model3 --weights 0.3,0.3,0.3 --batch_size 1 --dump_images 0 --num_images 5000 --split test --language_eval 1 --beam_size 5 --temperature 1.0 --sample_method greedy --max_length 30
 ```
 
+## BPE
+
+```
+python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk_bpe.json --output_h5 data/cocotalk_bpe --symbol_count 6000
+```
+
+It doesn't seem to improve performance.
+
+## Use lmdb instead of a folder of countless files
+
+Some file systems handle a folder containing a huge number of small files poorly. However, the default in this project is to save each precomputed image feature as an individual file.
+
+Usually, for COCO, once all the features have been cached in memory (basically after the first epoch), the time spent reading data is negligible. However, for a much larger dataset like Conceptual Captions, the features cannot all fit in memory, so data loading is extremely slow and stays slow even after the first epoch.
+
+For that dataset, I used lmdb to save all the features. Loading the data is still slow, but it's much better than saving individual files.
+
+To generate an lmdb file from a folder of features, check out `scripts/dump_to_lmdb.py`, which is borrowed from [Lyken17/Efficient-PyTorch](https://github.com/Lyken17/Efficient-PyTorch/tools).
+
+I believe the current way of using lmdb in `dataloader.py` is far from optimal. I tried the methods in tensorpack but failed to make them work. (The idea was to read by chunk, so that lmdb can load a chunk at a time, reducing the number of ad hoc disk accesses.)
+
+## New self critical
+
+This "new self critical" is borrowed from "Variational Inference for Monte Carlo Objectives". The only difference from the original self critical is the definition of the baseline.
+
+In the original self critical, the baseline is the score of the greedy decoding output. In new self critical, the baseline is the average score of the other samples (this requires the model to generate multiple samples for each image).
+
+To try new self critical on the topdown model, you can run
+
+`python train.py --cfg configs/topdown_nsc.yml`
+
+This yml file also gives you a hint of what to change in order to use new self critical.
+
+## Sample n captions
+
+When sampling, set `sample_n` to be greater than 0.
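To make the baseline used by "new self critical" concrete, here is a minimal PyTorch sketch of the advantage computation. It is illustrative only, not the repository's actual loss code; `rewards` is assumed to be one score (e.g. CIDEr) per sampled caption, grouped by image.

```python
import torch

def new_self_critical_advantage(rewards, sample_n):
    """rewards: 1-D tensor of scores for batch*sample_n sampled captions,
    ordered so that consecutive sample_n entries belong to the same image."""
    r = rewards.view(-1, sample_n)                      # (batch, sample_n)
    # baseline for each sample = mean score of the OTHER samples for that image
    baseline = (r.sum(dim=1, keepdim=True) - r) / (sample_n - 1)
    return (r - baseline).view(-1)                      # advantage per caption
```

The advantage is then used in place of "reward minus greedy score" in the usual self-critical policy-gradient loss.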
+
 ## Batch normalization
 
 ## Box feature
\ No newline at end of file
diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md
index 51b58e22..18fb0906 100644
--- a/MODEL_ZOO.md
+++ b/MODEL_ZOO.md
@@ -1,10 +1,10 @@
-# Models trained with Resnet101 feature:
+# Models
 
-Models are provided in [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)
+Results are on the karpathy test split with beam size 5. Unless noted otherwise, the numbers shown are not cherry-picked; they are only meant to verify that you are getting things right. If the scores you get are close to the numbers listed here (they may be slightly higher or lower), then it's ok.
 
-# Models trained with Bottomup feature:
+# Trained with Resnet101 feature:
 
-Results are on karpathy test split, beam size 5.
+Collection: [link](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k)
 
@@ -12,12 +12,93 @@ Results are on karpathy test split, beam size 5.
+Name | CIDEr | SPICE | Download | Note
+--- | --- | --- | --- | ---
+FC | 0.953 | 0.1787 | model&metrics | --caption_model newfc
+FC<br>+self_critical | 1.045 | 0.1838 | model&metrics | --caption_model newfc
+FC<br>+new_self_critical | 1.066 | 0.1856 | model&metrics | --caption_model newfc
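If you want to verify a downloaded checkpoint against these numbers, an evaluation command along the following lines should work; the paths are placeholders, and the exact flags are defined in `eval.py`/`opts.py`:

```
python eval.py --model model-best.pth --infos_path infos-best.pkl --dump_images 0 --num_images 5000 --language_eval 1 --beam_size 5
```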
+
+# Trained with Bottomup feature:
+
+Collection: [link](https://drive.google.com/open?id=1-RNak8qLUR5LqfItY6OenbRl8sdwODng)
+
+Name | CIDEr | SPICE | Download | Note
+--- | --- | --- | --- | ---
+Att2in | 1.089 | 0.1982 | model&metrics | My replication
+Att2in<br>+self_critical | 1.173 | 0.2046 | model&metrics | 
+Att2in<br>+new_self_critical | 1.219 | 0.2099 | model&metrics | 
+Transformer | 1.113 | 0.2045 | model&metrics | 
+Transformer<br>+self_critical | 1.266 | 0.2224 | model&metrics | 
+topdown | 1.099 | 0.1999 | model&metrics | My replication
+topdown<br>+self_critical | 1.227 | 0.2145 | model&metrics | 
+topdown<br>+new_self_critical | 1.239 | 0.2154 | model&metrics | 
+Topdown<br>+Schedule long<br>+new_self_critical | 1.280 | 0.2200 | model&metrics | Best of 5 models; schedule proposed by yangxuntu
diff --git a/README.md b/README.md
index 69056cf4..4cad8b32 100644
--- a/README.md
+++ b/README.md
@@ -13,9 +13,13 @@ This is based on my [ImageCaptioning.pytorch](https://github.com/ruotianluo/Imag
 
 ## Requirements
 Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for python 3)
+
 PyTorch 1.3 (along with torchvision)
+
 cider (already been added as a submodule)
-coco-caption (already been added as a submodule)
+
+coco-caption (already been added as a submodule) (**Remember to follow initialization steps in coco-caption/README.md**)
+
 yacs
 
 (**Skip if you are using bottom-up feature**): If you want to use resnet to extract image features, you need to download pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM), and should be placed in `data/imagenet_weights`.
@@ -64,7 +68,7 @@ For more options, see `opts.py`.
 
 First you should preprocess the dataset and get the cache for calculating cider score:
 ```
-$ python scripts/prepro_ngrams.py --input_json .../dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
+$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
 ```
 
 Then, copy the model from the pretrained model using cross entropy. (It's not mandatory to copy the model, just for back-up)
diff --git a/configs/a2i2.yml b/configs/a2i2.yml
new file mode 100644
index 00000000..808582b4
--- /dev/null
+++ b/configs/a2i2.yml
@@ -0,0 +1,20 @@
+# base
+caption_model: att2in2
+input_json: data/cocotalk.json
+input_fc_dir: data/cocobu_fc
+input_att_dir: data/cocobu_att
+input_label_h5: data/cocotalk_label.h5
+learning_rate: 0.0005
+learning_rate_decay_start: 0
+scheduled_sampling_start: 0
+# checkpoint_path: $ckpt_path
+# $start_from
+language_eval: 1
+save_checkpoint_every: 3000
+val_images_use: 5000
+
+train_sample_n: 5
+self_critical_after: 30
+batch_size: 10
+learning_rate_decay_start: 0
+max_epochs: 30
diff --git a/configs/a2i2_nsc.yml b/configs/a2i2_nsc.yml
new file mode 100644
index 00000000..785e5ff3
--- /dev/null
+++ b/configs/a2i2_nsc.yml
@@ -0,0 +1,9 @@
+_BASE_: a2i2.yml
+learning_rate: 0.00005
+learning_rate_decay_start: -1
+self_critical_after: -1
+structure_after: 30
+structure_sample_n: 5
+structure_loss_weight: 1
+structure_loss_type: new_self_critical
+max_epochs: 50
diff --git a/configs/a2i2_sc.yml b/configs/a2i2_sc.yml
new file mode 100644
index 00000000..a42a8331
--- /dev/null
+++ b/configs/a2i2_sc.yml
@@ -0,0 +1,4 @@
+_BASE_: a2i2.yml
+learning_rate: 0.00005
+learning_rate_decay_start: -1
+max_epochs: 50
\ No newline at end of file
diff --git a/configs/fc_nsc.yml b/configs/fc_nsc.yml
new file mode 100644
index 00000000..124f617e
--- /dev/null
+++ b/configs/fc_nsc.yml
@@ -0,0 +1,19 @@
+_BASE_: fc.yml
+learning_rate: 0.00005
+learning_rate_decay_start: -1
+scheduled_sampling_start: -1
+
+language_eval: 1
+save_checkpoint_every: 3000
+val_images_use: 5000
+
+batch_size: 10
+max_epochs: 50
+cached_tokens: coco-train-idxs
+
+
+self_critical_after: -1
+structure_after: 30
+structure_sample_n: 5
+structure_loss_weight: 1
+structure_loss_type: new_self_critical
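These configs rely on a `_BASE_:` key to inherit from a parent yml and override a few options. The sketch below is illustrative only: the repository resolves `_BASE_` in its own config code, and this is just the intended override semantics.

```python
import os
import yaml

def load_cfg(path):
    """Load a yml config, recursively merging in its _BASE_ parent.
    Keys defined in the child file override the parent's values."""
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    base = cfg.pop('_BASE_', None)
    if base is not None:
        parent = load_cfg(os.path.join(os.path.dirname(path), base))
        parent.update(cfg)   # child keys win over the parent's
        cfg = parent
    return cfg

# e.g. load_cfg('configs/a2i2_nsc.yml') starts from a2i2.yml and then
# applies the overrides (learning_rate, structure_* options, max_epochs, ...).
```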
diff --git a/configs/topdown.yml b/configs/topdown.yml
index 760dcf1a..324892e8 100644
--- a/configs/topdown.yml
+++ b/configs/topdown.yml
@@ -20,4 +20,4 @@ train_sample_n: 5
 self_critical_after: 30
 batch_size: 10
 learning_rate_decay_start: 0
-max_epochs: 50
+max_epochs: 30
diff --git a/configs/topdown_nsc.yml b/configs/topdown_nsc.yml
index 3254c247..4a73a4ea 100644
--- a/configs/topdown_nsc.yml
+++ b/configs/topdown_nsc.yml
@@ -6,3 +6,4 @@ structure_after: 30
 structure_sample_n: 5
 structure_loss_weight: 1
 structure_loss_type: new_self_critical
+max_epochs: 50
diff --git a/configs/topdown_sc.yml b/configs/topdown_sc.yml
index 2d1bee5d..7a018dcf 100644
--- a/configs/topdown_sc.yml
+++ b/configs/topdown_sc.yml
@@ -1,3 +1,5 @@
 _BASE_: topdown.yml
 learning_rate: 0.00005
-learning_rate_decay_start: -1
\ No newline at end of file
+learning_rate_decay_start: -1
+
+max_epochs: 50
\ No newline at end of file
diff --git a/configs/transformer.yml b/configs/transformer.yml
index 6716eac0..a08ef544 100644
--- a/configs/transformer.yml
+++ b/configs/transformer.yml
@@ -10,6 +10,11 @@ seq_per_img: 5
 batch_size: 10
 learning_rate: 0.0005
 
+# Notice: because I'm too lazy, I reuse the RNN option names to set the transformer hyperparameters:
+# N=num_layers
+# d_model=input_encoding_size
+# d_ff=rnn_size
+# h is always 8
 num_layers: 6
 input_encoding_size: 512
 rnn_size: 2048
diff --git a/configs/transformer_sc.yml b/configs/transformer_sc.yml
index 4d066a2d..75a68035 100644
--- a/configs/transformer_sc.yml
+++ b/configs/transformer_sc.yml
@@ -1,6 +1,7 @@
 _BASE_: transformer.yml
 reduce_on_plateau: true
-
+noamopt: false
 learning_rate: 0.00001
-self_critical_after: 15
\ No newline at end of file
+self_critical_after: 15
+max_epochs: 50
\ No newline at end of file
diff --git a/data/README.md b/data/README.md
index f0243145..b396437a 100644
--- a/data/README.md
+++ b/data/README.md
@@ -55,6 +55,7 @@ This will create `data/cocobu_fc`, `data/cocobu_att` and `data/cocobu_box`. If y
 
 #### Download converted files
 
 bottomup-fc: [link](https://drive.google.com/file/d/1IpjCJ5LYC4kX2krxHcPgxAIipgA8uqTU/view?usp=sharing) (The fc features here are simply the average of the attention features)
+
 bottomup-att: [link](https://drive.google.com/file/d/1hun0tsel34aXO4CYyTRIvHJkcbZHwjrD/view?usp=sharing)
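With these configs in place, training is launched the same way as the topdown example in ADVANCED.md, e.g. for the att2in2 model; the `--id` values below are just illustrative placeholders, and a cross-entropy checkpoint should be prepared first, as the README describes:

```
python train.py --cfg configs/a2i2_sc.yml --id a2i2_sc
python train.py --cfg configs/a2i2_nsc.yml --id a2i2_nsc
```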