Add hallucination multicalibrator, with example benchmark #152
I left some comments. Maybe you want to add some more unit tests.
benchmarks/hallucination/mmlu/run.py
Outdated
    for task in task_list
]

answer_map = {a: i for i, a in enumerate(auc)}
Choices seem to be in list("ABCD"). Why does the answer map have more options?
It tries to be more generic, but you're right, it can be restricted to "ABCD".
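A minimal sketch of the restricted mapping discussed in this thread, assuming the benchmark's choices are exactly the four letters A to D. The names `CHOICES` and `answer_to_index` are hypothetical and do not come from the PR.

```python
# Hypothetical names; the PR's run.py may structure this differently.
CHOICES = list("ABCD")

# Map each answer letter to its positional index: A -> 0, ..., D -> 3.
answer_map = {a: i for i, a in enumerate(CHOICES)}


def answer_to_index(answer: str) -> int:
    """Return the index of an answer letter, rejecting anything outside CHOICES."""
    try:
        return answer_map[answer]
    except KeyError:
        raise ValueError(
            f"unexpected answer {answer!r}; expected one of {CHOICES}"
        ) from None
```

Restricting the map this way turns an out-of-range answer into an explicit error instead of a silently valid index.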
fortuna/hallucination/base.py
Outdated
}

with torch.no_grad():
    __logits = self.generative_model(
Double underscore in __logits. Is that intentional?
Yep
class TestScoringModel(unittest.TestCase):
    def test_score(self):
Can't you compare against exact values?
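One way to read this suggestion, sketched under assumptions: pick inputs whose score can be computed by hand and assert on that value. The function `mean_log_prob` is a hypothetical stand-in, not the scoring model from the PR.

```python
import math
import unittest


def mean_log_prob(probs):
    # Hypothetical stand-in for a scoring function: average log-probability.
    return sum(math.log(p) for p in probs) / len(probs)


class TestScoreExact(unittest.TestCase):
    def test_score_matches_hand_computed_value(self):
        # log(0.5) = -ln 2 and log(0.25) = -2 ln 2, so the mean is -1.5 ln 2.
        expected = -1.5 * math.log(2)
        self.assertAlmostEqual(mean_log_prob([0.5, 0.25]), expected, places=12)
```

For a model with learned weights, the same pattern would presumably need fixed seeds or fixed parameters so that the expected value stays stable across runs.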
import re


def string_cleaner(text: str) -> str:
This function is a great candidate for unit tests.
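A sketch of what such tests could look like. The body of `string_cleaner` is not shown in this thread, so the implementation below is an assumed stand-in (lowercase, drop punctuation, collapse whitespace); only the test structure is the point.

```python
import re
import unittest


def string_cleaner(text: str) -> str:
    # Stand-in body (assumption): the PR's actual cleaning rules may differ.
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # drop punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


class TestStringCleaner(unittest.TestCase):
    def test_lowercases_and_strips_punctuation(self):
        self.assertEqual(string_cleaner("Hello, World!"), "hello world")

    def test_collapses_whitespace(self):
        self.assertEqual(string_cleaner("  a \t b\n c  "), "a b c")

    def test_empty_string(self):
        self.assertEqual(string_cleaner(""), "")
```

Each test pins one behavior of the cleaner, so a failure points directly at the rule that changed.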
Add the HallucinationMulticalibrator. It takes a generative model, a tokenizer, and some data as input, then fits the calibrator and predicts calibrated probabilities of hallucination.