Hello,
Thank you very much for your work.
In my experiments I used the transformer mapping network with the default settings, but I failed to reproduce the metrics reported in the paper.
In more detail, I used K=10 constant tokens, a prefix length of 10, and 8 multi-head self-attention layers with 8 heads each, training for 10 epochs with a batch size of 40 and the AdamW optimizer. The learning rate and warm-up steps are the defaults (2e-5 and 5000).
The image encoder and the decoder are the defaults (ViT-B/32 and GPT-2).
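For concreteness, here is a minimal sketch of the setup described above. The class `MappingSketch`, its internals, and the step count are my own illustrative stand-ins rather than the repository's actual code; only the hyperparameters match my run:

```python
import torch
import torch.nn as nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Hyperparameters of my run (the repository defaults, to my understanding).
PREFIX_LENGTH = 10   # K = 10 constant tokens
CLIP_DIM = 512       # ViT-B/32 embedding size
GPT2_DIM = 768       # GPT-2 embedding size
NUM_LAYERS = 8       # multi-head self-attention layers
NUM_HEADS = 8        # heads per layer
EPOCHS = 10
BATCH_SIZE = 40
LR = 2e-5
WARMUP_STEPS = 5000

# Stand-in for the transformer mapping network: project the CLIP embedding,
# concatenate learned constant tokens, run everything through a transformer
# encoder, and keep the constant-token outputs as the GPT-2 prefix. This
# mirrors the idea, not the repository's exact implementation.
class MappingSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(CLIP_DIM, GPT2_DIM * PREFIX_LENGTH)
        self.const = nn.Parameter(torch.randn(PREFIX_LENGTH, GPT2_DIM))
        layer = nn.TransformerEncoderLayer(GPT2_DIM, NUM_HEADS, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, NUM_LAYERS)

    def forward(self, clip_emb):  # clip_emb: (batch, CLIP_DIM)
        prefix = self.proj(clip_emb).view(-1, PREFIX_LENGTH, GPT2_DIM)
        const = self.const.expand(clip_emb.size(0), -1, -1)
        out = self.encoder(torch.cat((prefix, const), dim=1))
        return out[:, PREFIX_LENGTH:]  # the K learned tokens fed to GPT-2

model = MappingSketch()
steps_per_epoch = 566_435 // BATCH_SIZE  # roughly the COCO caption count; illustrative only
optimizer = AdamW(model.parameters(), lr=LR)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=WARMUP_STEPS,
    num_training_steps=EPOCHS * steps_per_epoch,
)
```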
The metrics reported in the paper (for the COCO dataset with the transformer mapping network) are (B4: 33.53%, METEOR: 27.45%, CIDEr: 113%), whereas my metrics are (B4: 71.72%, METEOR: 24.89%, CIDEr: 90.91%), which are significantly lower than the original.
Lastly, I should mention that the experiment above was trained on a single GPU and that the validation set comes from the COCO dataset. The evaluation metrics are calculated with the pycocoevalcap repository.
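For completeness, my scoring follows the usual pycocoevalcap pattern, roughly like the sketch below. The two captions here are dummy placeholders; in my actual run, `gts` holds the COCO reference captions and `res` my generated captions:

```python
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.cider.cider import Cider

# image_id -> list of {"caption": ...} dicts, as PTBTokenizer expects.
gts = {1: [{"caption": "a man riding a horse on the beach"}]}
res = {1: [{"caption": "a person rides a horse near the ocean"}]}

tokenizer = PTBTokenizer()
gts, res = tokenizer.tokenize(gts), tokenizer.tokenize(res)

# Each scorer returns (corpus-level score, per-image scores);
# Bleu(4) returns a list with BLEU-1 through BLEU-4.
for scorer, name in [(Bleu(4), ["B1", "B2", "B3", "B4"]),
                     (Meteor(), "METEOR"),
                     (Cider(), "CIDEr")]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)
```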
Any ideas on how to reach the original model's performance?