I evaluated the VQA and scene classification tasks on a model fine-tuned with GeoChatInstruct, and the results are quite close to the metrics reported in the paper; however, the region-captioning results fall noticeably short of the paper's numbers.
The official evaluation result:
My result:
Note that:
I fine-tuned the model through the first stage only, i.e., I fine-tuned LLaVA-v1.5-7b on GeoChatInstruct for one epoch. I did not further fine-tune on only the referring and grounding samples, since the paper lacks details about the stage-2 fine-tuning.
I used HuggingFace's evaluate package to compute the metrics (see the sketch below).
I wonder whether I did something wrong, or whether the metric gap is caused by the missing stage-2 fine-tuning?
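For reference, here is a minimal sketch of the kind of metric computation I mean, using the evaluate package. The metric choices (ROUGE and METEOR) and the example captions are placeholders for illustration, not the exact script or data I used:

```python
import evaluate

# Placeholder prediction / reference captions; in practice these come from the
# model's region-captioning outputs and the corresponding ground-truth captions.
predictions = ["a large white airplane parked on the runway"]
references = ["a white commercial airplane parked on an airport runway"]

# Load captioning metrics via HuggingFace's evaluate package.
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

rouge_scores = rouge.compute(predictions=predictions, references=references)
meteor_score = meteor.compute(predictions=predictions, references=references)

print({
    "rouge1": rouge_scores["rouge1"],
    "rougeL": rouge_scores["rougeL"],
    "meteor": meteor_score["meteor"],
})
```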