Results are reported on the Karpathy test split with beam size 5. Each evaluated model is the checkpoint with the highest CIDEr on the validation set. Unless otherwise noted, the numbers shown are not selected from multiple runs. The scores are only meant to help you verify that you are getting things right: if your scores are close to the numbers given here (they may be slightly higher or lower), then your setup is fine.
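The two checks described above (pick the checkpoint with the highest validation CIDEr, then compare its test scores against the reference numbers) can be sketched as follows. This is a hypothetical illustration, not code from this repository: the helper names, the per-checkpoint record format, and the 2% relative tolerance are all assumptions, since the README only says the scores should be "close". The reference values are copied from the cross-entropy rows of the tables below.

```python
from typing import Dict, List

# Reference test-split scores (beam size 5) copied from the tables below.
REFERENCE_SCORES: Dict[str, Dict[str, float]] = {
    "FC": {"CIDEr": 0.953, "SPICE": 0.1787},
    "UpDown": {"CIDEr": 1.099, "SPICE": 0.1999},
    "Transformer": {"CIDEr": 1.1259, "SPICE": 0.2063},
}


def select_best_checkpoint(history: List[dict]) -> dict:
    """Return the validation record with the highest CIDEr.

    Hypothetical record format: one dict per saved checkpoint, e.g.
    {"iteration": 2000, "CIDEr": 1.08}.
    """
    return max(history, key=lambda record: record["CIDEr"])


def is_close(model: str, scores: Dict[str, float], rel_tol: float = 0.02) -> bool:
    """Check that every reference metric is within rel_tol (relative) of your score.

    The 2% default is an assumed threshold, not one stated by the README.
    """
    reference = REFERENCE_SCORES[model]
    return all(abs(scores[metric] - ref) <= rel_tol * ref for metric, ref in reference.items())


if __name__ == "__main__":
    history = [
        {"iteration": 1000, "CIDEr": 1.02},
        {"iteration": 2000, "CIDEr": 1.08},
        {"iteration": 3000, "CIDEr": 1.07},
    ]
    best = select_best_checkpoint(history)
    print(best["iteration"])  # the checkpoint you would evaluate on the test split
    print(is_close("UpDown", {"CIDEr": 1.08, "SPICE": 0.198}))
```

Using a relative tolerance keeps the check meaningful for both CIDEr (around 1.0) and SPICE (around 0.2); an absolute threshold would be far too loose for SPICE.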
Collection: link
Name | CIDEr | SPICE | Download | Note |
---|---|---|---|---|
FC | 0.953 | 0.1787 | model&metrics | `--caption_model newfc` |
FC + self_critical | 1.045 | 0.1838 | model&metrics | `--caption_model newfc` |
FC + new_self_critical | 1.053 | 0.1857 | model&metrics | `--caption_model newfc` |
Collection: link
Name | CIDEr | SPICE | Download | Note |
---|---|---|---|---|
Att2in | 1.089 | 0.1982 | model&metrics | My replication |
Att2in + self_critical | 1.173 | 0.2046 | model&metrics | |
Att2in + new_self_critical | 1.195 | 0.2066 | model&metrics | |
UpDown | 1.099 | 0.1999 | model&metrics | My replication |
UpDown + self_critical | 1.227 | 0.2145 | model&metrics | |
UpDown + new_self_critical | 1.239 | 0.2154 | model&metrics | |
UpDown + Schedule long + new_self_critical | 1.280 | 0.2200 | model&metrics | Best of 5 models; schedule proposed by yangxuntu. |
Transformer | 1.1259 | 0.2063 | model&metrics | |
Transformer(warmup+step decay) | 1.1496 | 0.2093 | model&metrics | Although this schedule is better, the final self critical results are similar. |
Transformer + self_critical | 1.277 | 0.2249 | model&metrics | This could likely be higher. I chose the checkpoint with the highest CIDEr on the validation set, so another checkpoint may perform better on test. |
Transformer + new_self_critical | 1.303 | 0.2289 | model&metrics | |
Collection: link
Name | CIDEr | SPICE | Download | Note |
---|---|---|---|---|
Transformer | 1.158 | 0.2114 | model&metrics | The config needs to be changed to use the vilbert feature. |