Metrics of ClipCap's Original Performance #65

Open
chmorfop opened this issue Mar 23, 2023 · 2 comments

@chmorfop

Hello, and thank you very much for your work.

In my experiments, I used the transformer mapping network with the default settings, but I failed to reproduce the metrics reported in the paper.

In more detail, I used K = 10 constant tokens, a prefix length of 10, and 8 multi-head self-attention layers with 8 heads each, training for 10 epochs with a batch size of 40 and the AdamW optimizer. The learning rate and warm-up steps are the defaults (2e-5 and 5000). The image encoder and decoder are also the defaults (ViT-B/32 and GPT-2).
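For context, here is a minimal sketch of this setup as a ClipCap `train.py` invocation (flag names follow the public ClipCap README for the transformer mapping network and may differ between repo versions; the data path is a placeholder):

```python
# A minimal sketch of the training run described above, expressed as a
# ClipCap train.py call. Flag names are taken from the public ClipCap README;
# they may differ between repo versions, and the data path is a placeholder.
import subprocess

subprocess.run([
    "python", "train.py",
    "--only_prefix",                  # train the mapping network; keep GPT-2 frozen
    "--data", "./data/coco/oscar_split_ViT-B_32_train.pkl",  # ViT-B/32 CLIP features
    "--out_dir", "./coco_train/",
    "--mapping_type", "transformer",
    "--num_layers", "8",              # 8 self-attention layers
    "--prefix_length", "10",          # prefix length 10
    "--prefix_length_clip", "10",     # K = 10 constant tokens
    "--bs", "40",                     # batch size 40
    "--epochs", "10",
], check=True)
```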

The metrics reported in the paper (for the COCO dataset and the transformer mapping network) are
(B4: 33.53%, METEOR: 27.45%, CIDEr: 113%),
in contrast to my metrics, which are (B4: 71.72%, METEOR: 24.89%, CIDEr: 90.91%),
i.e. apart from B4, significantly lower than the original.

Lastly, I should mention that the above experiment was trained on a single GPU and validated on the COCO validation set.
The evaluation metrics are computed with the pycocoevalcap repository.
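For reference, the evaluation follows the standard pycocoevalcap usage, roughly as in the sketch below (file paths are placeholders; the results file is a JSON list of `{"image_id": int, "caption": str}` entries):

```python
# A minimal sketch of COCO caption evaluation with pycocoevalcap, useful as a
# sanity check on the metric computation. File paths here are placeholders.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Ground-truth annotations (COCO caption format) and generated captions.
coco = COCO("annotations/captions_val2014.json")
coco_res = coco.loadRes("results/clipcap_predictions.json")

coco_eval = COCOEvalCap(coco, coco_res)
# Restrict evaluation to the images that actually have predictions.
coco_eval.params["image_id"] = coco_res.getImgIds()
coco_eval.evaluate()

# Prints Bleu_1..4, METEOR, ROUGE_L, CIDEr (and SPICE if installed).
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.4f}")
```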

Any ideas on how to reach the original model's performance?

@baiyuting

There may be something wrong with your metrics, because a B4 of 71.72% is far too high.
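One hypothetical way such an inflated B4 can arise is if generated captions leak into the reference set (for example, a results file accidentally merged into the annotation file). A quick sanity check, assuming the same placeholder file formats as the evaluation sketch above:

```python
# A hypothetical sanity check for an inflated BLEU-4: if generated captions
# leak into the reference set, n-gram overlap becomes near-perfect. Assumes
# the same placeholder JSON formats as the evaluation sketch above.
import json

with open("results/clipcap_predictions.json") as f:
    preds = {p["image_id"]: p["caption"].strip().lower() for p in json.load(f)}

with open("annotations/captions_val2014.json") as f:
    anns = json.load(f)["annotations"]

leaks = sum(
    1 for a in anns
    if preds.get(a["image_id"]) == a["caption"].strip().lower()
)
print(f"{leaks} predictions appear verbatim among the references")
```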

@cjc20000323

Did you solve this problem?
