
metrics about region captioning #36

Open
Hoteryoung opened this issue Apr 19, 2024 · 2 comments

Comments


Hoteryoung commented Apr 19, 2024

I evaluated the VQA and scene classification tasks on the model fine-tuned with GeoChatInstruct, and the results are close to the metrics reported in the paper; however, the region captioning result falls noticeably short of the paper's.
The official evaluation result:
[image: official region captioning metrics]
My result:
[image: my region captioning metrics]
Note that:

  1. I fine-tuned the model through the first stage only, i.e., I fine-tuned LLaVA-v1.5-7b on GeoChatInstruct for one epoch. I did not further fine-tune on only the referring and grounding samples, because the paper gives few details about the stage-2 fine-tuning.
  2. I used the HuggingFace evaluate package.

I wonder whether I did something wrong, or whether the metric gap comes from skipping the stage-2 fine-tuning?
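For reference, my metric computation with evaluate was roughly the sketch below (a simplified version; the file name and JSON fields are illustrative placeholders, not the exact script):

```python
# Sketch of region-captioning evaluation with the HuggingFace `evaluate`
# package (pip install evaluate rouge_score nltk).
# The file name and JSON layout below are assumptions for illustration.
import json

import evaluate

# Assumed format: one record per region, holding the model output and
# the ground-truth caption.
with open("region_captioning_predictions.json") as f:
    records = json.load(f)

predictions = [r["prediction"] for r in records]
references = [r["ground_truth"] for r in records]

rouge = evaluate.load("rouge")    # yields rouge1 / rouge2 / rougeL / rougeLsum
meteor = evaluate.load("meteor")  # requires nltk data

rouge_scores = rouge.compute(predictions=predictions, references=references)
meteor_score = meteor.compute(predictions=predictions, references=references)

print(f"ROUGE-1: {rouge_scores['rouge1']:.4f}")
print(f"ROUGE-L: {rouge_scores['rougeL']:.4f}")
print(f"METEOR:  {meteor_score['meteor']:.4f}")
```

Note that different ROUGE/METEOR implementations can differ in tokenization and stemming settings, so the exact metric configuration may also matter when comparing against the paper.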

@Davidup1

@Hoteryoung I also ran into this problem, and the metric dropped even lower after the stage-2 fine-tuning 😂


Oreouo commented Aug 18, 2024

Could you share the code you used to calculate the metrics for your referring task?
