Hello,

I tried to add a new metric to an existing multiple-choice task, but the metric does not show up in the results. I edited the MedQA task config:
```yaml
task: medqa_4options
dataset_path: GBaker/MedQA-USMLE-4-options-hf
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: !function preprocess_medqa.doc_to_text
doc_to_target: !function preprocess_medqa.doc_to_target
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
  - metric: !function preprocess_medqa.precision_fn
    aggregation: !function preprocess_medqa.precision_metric
    higher_is_better: true
```
In addition, I edited `preprocess_medqa.py` and added the new metric:
```python
from sklearn.metrics import classification_report


def doc_to_text(doc) -> str:
    option_choices = {
        "A": doc["ending0"],
        "B": doc["ending1"],
        "C": doc["ending2"],
        "D": doc["ending3"],
    }
    answers = "".join(f"{k}. {v}\n" for k, v in option_choices.items())
    return f"Question: {doc['sent1']}\n{answers}Answer:"


def doc_to_target(doc) -> int:
    return doc["label"]


def precision_fn(items):
    print("in precision_fn")
    return items


def precision_metric(items) -> float:
    print("in precision metric !!")
    return 0.5
```
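For context, `precision_fn` and `precision_metric` above are only debug stubs so I can see whether they get called at all. The end goal is a real precision metric. Assuming the aggregation function eventually receives a list of `(gold, prediction)` pairs (that is my assumption about what the harness would pass, not something I have confirmed), the intended implementation would look roughly like this sketch:

```python
from sklearn.metrics import precision_score


def precision_fn(item):
    # Hypothetical per-document hook: pass the (gold, prediction) pair
    # through unchanged so the aggregation function sees all of them.
    # The exact shape of `item` is an assumption on my part.
    return item


def precision_metric(items) -> float:
    # Hypothetical aggregation: macro-averaged precision over all
    # collected (gold, prediction) pairs.
    golds, preds = zip(*items)
    return precision_score(golds, preds, average="macro", zero_division=0)
```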
However, neither the print statements nor the new metric appear in the lm-eval output:
```
Running loglikelihood requests: 100%|████████████| 5092/5092 [00:50<00:00, 100.08it/s]
2024-09-23:07:19:53,957 INFO     [evaluation_tracker.py:269] Output path not provided, skipping saving results aggregated
hf (pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks    |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------------|-------|------|-----:|--------|---|-----:|---|-----:|
|medqa_4options|Yaml   |none  |     0|acc     |↑  |0.4501|±  |0.0139|
|              |       |none  |     0|acc_norm|↑  |0.4501|±  |0.0139|
```
The run command was:

```bash
lm_eval --model hf --model_args pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct --apply_chat_template --tasks medqa_4options
```
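While debugging this I have been re-running the full task each time. To iterate faster, something like the following should work; `--limit` caps the number of documents per task (it also appears as `limit: None` in the log above), so any prints from the custom metric functions would surface within seconds:

```bash
# Quick iteration: evaluate only a handful of documents.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct \
    --apply_chat_template \
    --tasks medqa_4options \
    --limit 10
```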
Can you please help me? Thanks!