gptjudge empty response handling #34

snova-nidhih · 2024-10-28T18:40:48Z

Occasionally, when GPT is used as a judge, it returns an empty response. This occurred only once in our test set; such a sample is treated as incorrect by default.

Test command:
python3 -m accelerate.commands.launch \
    --num_processes=1 \
    -m lmms_eval \
    --model internvl2 \
    --model_args pretrained="/import/ml-sc-scratch3/nidhih/mm_hungarian/finetune_outputs/pretrain_hupdf2_synthdog_hu_all_unfrozen" \
    --tasks docvqa_hu_syn \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix intervl2_8b \
    --output_path /import/ml-sc-scratch3/nidhih/mm_hungarian/lmms_eval_op/internvl2_8b_docvqasyn_pt_test \
    --limit 200

snova-jonathanl · 2024-10-30T01:03:18Z

lmms_eval/api/metrics.py

@@ -405,7 +405,10 @@ def gpt4judge(references, predictions, query):  # This is a passthrough function
                    eval_logger.error(f"All 5 attempts failed. Last error message: {str(e)}.\nResponse: {str(error_msg)}")
                    response = ""

-        score = int(extract_number_from_brackets(response))
+        if response is None:  # Rare case of gpt returning empty response
+            score = 0


Can we wrap the extract_number_from_brackets call in a try/except instead, and also log whenever the exception is caught?
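
A minimal sketch of what that suggestion could look like, not the code that was merged. The real extract_number_from_brackets is not shown in this thread, so the version below is a hypothetical stand-in that returns the first integer found inside double square brackets. parse_judge_score, its default argument, and the logger name are likewise assumptions made for illustration. Note that the failure path in the hunk above sets response = "", so a check for None alone would not cover it; catching the parse error handles both cases.

```python
import logging
import re

# Assumption: a plain stdlib logger stands in for the project's eval_logger.
eval_logger = logging.getLogger("gpt_judge_sketch")


def extract_number_from_brackets(text):
    """Hypothetical stand-in: return the digits inside the first [[...]], or None."""
    match = re.search(r"\[\[\s*(\d+)\s*\]\]", text)
    return match.group(1) if match else None


def parse_judge_score(response, default=0):
    """Return the judge's integer score, or `default` when it cannot be parsed."""
    try:
        return int(extract_number_from_brackets(response))
    except (TypeError, ValueError) as exc:
        # TypeError covers a None response and a response with no bracketed number;
        # ValueError covers a helper that returns a non-numeric string.
        eval_logger.warning(
            "Could not parse judge score (%s); defaulting to %d. Response: %r",
            exc, default, response,
        )
        return default
```

Under these assumptions, parse_judge_score("Rating: [[1]]") returns 1, while parse_judge_score("") and parse_judge_score(None) both log a warning and return 0, so the sample is still counted as incorrect but the event is visible in the logs.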

snova-nidhih added 4 commits October 28, 2024 11:08

catch empty response

c086d22

remove extras

b1c8e92

isort

34fa0c0

black lint

8005c3d

snova-jonathanl reviewed Oct 30, 2024

snova-nidhih marked this pull request as draft October 31, 2024 21:34
