I evaluated the VQA and scene classification tasks on a model fine-tuned with GeoChatInstruct, and the results are quite close to the metrics reported in the paper; however, the region-captioning results fall noticeably short of the paper's numbers.
The official evaluation result:
My result:
Note that:
I fine-tuned the model through the first stage only, i.e., I fine-tuned LLaVA-v1.5-7b on GeoChatInstruct for one epoch. I did not further fine-tune on only the referring and grounding samples, since the paper lacks details about the stage-2 fine-tuning.
I used HuggingFace's evaluate package to compute the metrics (see the sketch below).
I wonder whether I did something wrong, or whether the metric gap is caused by the missing stage-2 fine-tuning?
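For reference, here is a minimal sketch of the kind of metric computation I mean, using the evaluate package. The metric choices (ROUGE and METEOR) and the example captions are placeholders for illustration, not the exact script or data I used:

```python
import evaluate

# Placeholder prediction / reference captions; in practice these come from the
# model's region-captioning outputs and the corresponding ground-truth captions.
predictions = ["a large white airplane parked on the runway"]
references = ["a white commercial airplane parked on an airport runway"]

# Load captioning metrics via HuggingFace's evaluate package.
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

rouge_scores = rouge.compute(predictions=predictions, references=references)
meteor_score = meteor.compute(predictions=predictions, references=references)

print({
    "rouge1": rouge_scores["rouge1"],
    "rougeL": rouge_scores["rougeL"],
    "meteor": meteor_score["meteor"],
})
```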