Metrics of ClipCap's Original Performance #65

Open
chmorfop opened this issue Mar 23, 2023 · 2 comments

@chmorfop

Hello, and thank you very much for your work.

In my experiments, I used the transformer mapping network with the default settings, but I failed to reproduce the metrics reported in the paper.

In more detail, I used K = 10 constant tokens, a prefix length of 10, and 8 multi-head self-attention layers with 8 heads each, training for 10 epochs with a batch size of 40 and the AdamW optimizer. The learning rate and warm-up steps are the defaults (2e-5 and 5000). The image encoder and decoder are also the defaults (ViT-B/32 and GPT-2).
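For context, here is a minimal sketch of this setup as a ClipCap `train.py` invocation (flag names follow the public ClipCap README for the transformer mapping network and may differ between repo versions; the data path is a placeholder):

```python
# A minimal sketch of the training run described above, expressed as a
# ClipCap train.py call. Flag names are taken from the public ClipCap README;
# they may differ between repo versions, and the data path is a placeholder.
import subprocess

subprocess.run([
    "python", "train.py",
    "--only_prefix",                  # train the mapping network; keep GPT-2 frozen
    "--data", "./data/coco/oscar_split_ViT-B_32_train.pkl",  # ViT-B/32 CLIP features
    "--out_dir", "./coco_train/",
    "--mapping_type", "transformer",
    "--num_layers", "8",              # 8 self-attention layers
    "--prefix_length", "10",          # prefix length 10
    "--prefix_length_clip", "10",     # K = 10 constant tokens
    "--bs", "40",                     # batch size 40
    "--epochs", "10",
], check=True)
```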

The metrics reported in the paper (for the COCO dataset and the transformer mapping network) are
(B4: 33.53%, METEOR: 27.45%, CIDEr: 113%),
in contrast to my metrics, which are (B4: 71.72%, METEOR: 24.89%, CIDEr: 90.91%),
i.e. apart from B4, significantly lower than the original.

Lastly, I should mention that the above experiment was trained on a single GPU and validated on the COCO validation set.
The evaluation metrics are computed with the pycocoevalcap repository.
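For reference, the evaluation follows the standard pycocoevalcap usage, roughly as in the sketch below (file paths are placeholders; the results file is a JSON list of `{"image_id": int, "caption": str}` entries):

```python
# A minimal sketch of COCO caption evaluation with pycocoevalcap, useful as a
# sanity check on the metric computation. File paths here are placeholders.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Ground-truth annotations (COCO caption format) and generated captions.
coco = COCO("annotations/captions_val2014.json")
coco_res = coco.loadRes("results/clipcap_predictions.json")

coco_eval = COCOEvalCap(coco, coco_res)
# Restrict evaluation to the images that actually have predictions.
coco_eval.params["image_id"] = coco_res.getImgIds()
coco_eval.evaluate()

# Prints Bleu_1..4, METEOR, ROUGE_L, CIDEr (and SPICE if installed).
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.4f}")
```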

Any ideas on how to reach the original model's performance?

@baiyuting

There may be something wrong with your metrics, because a B4 of 71.72% is far too high.
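One hypothetical way such an inflated B4 can arise is if generated captions leak into the reference set (for example, a results file accidentally merged into the annotation file). A quick sanity check, assuming the same placeholder file formats as the evaluation sketch above:

```python
# A hypothetical sanity check for an inflated BLEU-4: if generated captions
# leak into the reference set, n-gram overlap becomes near-perfect. Assumes
# the same placeholder JSON formats as the evaluation sketch above.
import json

with open("results/clipcap_predictions.json") as f:
    preds = {p["image_id"]: p["caption"].strip().lower() for p in json.load(f)}

with open("annotations/captions_val2014.json") as f:
    anns = json.load(f)["annotations"]

leaks = sum(
    1 for a in anns
    if preds.get(a["image_id"]) == a["caption"].strip().lower()
)
print(f"{leaks} predictions appear verbatim among the references")
```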

@cjc20000323

Did you solve this problem?
