Failed to add a new metric #2330

Open
Ofir408 opened this issue Sep 23, 2024 · 0 comments

Comments

@Ofir408
Contributor

Ofir408 commented Sep 23, 2024

Hello,
I tried to add a new metric to an existing multiple-choice task, but it seems the metric is never computed or reported.
I edited the MedQA task YAML (medqa_4options):

task: medqa_4options
dataset_path: GBaker/MedQA-USMLE-4-options-hf
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: !function preprocess_medqa.doc_to_text
doc_to_target: !function preprocess_medqa.doc_to_target
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
  - metric: !function preprocess_medqa.precision_fn
    aggregation: !function preprocess_medqa.precision_metric
    higher_is_better: true
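
To clarify what I expected from the two !function entries (this is just my mental model of how the items-style built-in metrics seem to work; none of the names below come from the harness): precision_fn would be called once per document to produce an item, and precision_metric would then reduce the collected items into the value printed in the results table. Roughly:

# Sketch of the call pattern I expected; precision_fn / precision_metric are the
# functions from preprocess_medqa.py shown below, and golds / preds are made-up data.
golds = [0, 2, 1, 3]   # doc_to_target outputs
preds = [0, 1, 1, 3]   # argmax over the four per-choice loglikelihoods
items = [precision_fn((g, p)) for g, p in zip(golds, preds)]
value = precision_metric(items)   # the aggregated number I expected to see reported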

In addition, I edited preprocess_medqa.py and added the new metric functions:

from sklearn.metrics import classification_report  # not used yet; intended for the real precision computation

def doc_to_text(doc) -> str:
    option_choices = {
        "A": doc["ending0"],
        "B": doc["ending1"],
        "C": doc["ending2"],
        "D": doc["ending3"],
    }
    answers = "".join(f"{k}. {v}\n" for k, v in option_choices.items())
    return f"Question: {doc['sent1']}\n{answers}Answer:"


def doc_to_target(doc) -> int:
    return doc["label"]

def precision_fn(items):
    # Per-item metric function: for now it just passes the items through so I can
    # check whether the harness ever calls it.
    print("in precision_fn")
    return items

def precision_metric(items) -> float:
    # Aggregation function: returns a constant while I debug whether it is called at all.
    print("in precision metric !!")
    return 0.5
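
For completeness, the other route I was considering (based on my reading of the task guide, so please correct me if I have the signature wrong) is to drop the !function entries from metric_list, list plain metric names there with aggregation: mean, and compute everything in a process_results hook that returns a dict keyed by those metric names. A sketch of what I had in mind (my own code, not something from the repo; the hook would be referenced from the YAML as process_results: !function preprocess_medqa.process_results):

import numpy as np

def process_results(doc, results):
    # For a multiple_choice task, `results` should contain one (loglikelihood, is_greedy)
    # tuple per answer choice; pick the choice with the highest loglikelihood.
    lls = [res[0] for res in results]
    pred = int(np.argmax(lls))
    gold = doc["label"]
    # One entry per metric name listed under metric_list in the YAML. Per-document
    # precision is not well defined, so this just returns correctness as a placeholder.
    return {"acc": float(pred == gold), "my_precision": float(pred == gold)}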

However, neither the prints nor the new metric appear anywhere in the lm-eval output:

Running loglikelihood requests: 100%|████████████| 5092/5092 [00:50<00:00, 100.08it/s]
2024-09-23:07:19:53,957 INFO     [evaluation_tracker.py:269] Output path not provided, skipping saving results aggregated
hf (pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|    Tasks     |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------------|-------|------|-----:|--------|---|-----:|---|-----:|
|medqa_4options|Yaml   |none  |     0|acc     |↑  |0.4501|±  |0.0139|
|              |       |none  |     0|acc_norm|↑  |0.4501|±  |0.0139|

The run command was:
lm_eval --model hf --model_args pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct --apply_chat_template --tasks medqa_4options

Can you please help me?
Thanks!
