Add hallucination multicalibrator, with example benchmark #152

Merged: 74 commits merged into main from grouping2 on Nov 20, 2023

Conversation

gianlucadetommaso (Contributor):

Add the HallucinationMulticalibrator. It takes as input a generative model, a tokenizer, and some data; it then fits the calibrator and predicts calibrated probabilities of hallucination.
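A minimal usage sketch of the flow described above (the argument and method names are assumptions based on this description, not necessarily the merged API):

from transformers import AutoModelForCausalLM, AutoTokenizer

from fortuna.hallucination import HallucinationMulticalibrator

# Any causal LM with a matching tokenizer can serve as the generative model.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

calibrator = HallucinationMulticalibrator(
    generative_model=model,
    tokenizer=tokenizer,
)

# Calibration data: generated answers, their contexts, and binary labels
# marking whether each answer was in fact correct.
answers = ["Paris", "Rome"]
questions = ["What is the capital of France?", "What is the capital of Spain?"]
labels = [1, 0]

# Fit the calibrator, then predict calibrated hallucination probabilities.
calibrator.fit(texts=answers, contexts=questions, targets=labels)
probs = calibrator.predict_proba(texts=answers, contexts=questions)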

@wistuba (Collaborator) left a comment:

I left some comments. Maybe you want to add some more unit tests.

    for task in task_list
]

answer_map = {a: i for i, a in enumerate(auc)}
@wistuba (Collaborator):

Choices seem to be in list("ABCD"); why does the answer map have more options?

@gianlucadetommaso (Author):

It tries to be more generic, but you're right, it can be restricted to "ABCD".
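For reference, a minimal sketch of the restricted mapping (assuming the choices are exactly the four MMLU letters):

# Map each multiple-choice letter to its index, restricted to "ABCD".
answer_map = {a: i for i, a in enumerate("ABCD")}
# {'A': 0, 'B': 1, 'C': 2, 'D': 3}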

Resolved review threads:
benchmarks/hallucination/mmlu/run.py (resolved)
fortuna/hallucination/base.py (resolved)
fortuna/hallucination/base.py (outdated, resolved)
fortuna/hallucination/base.py (outdated, resolved)
}

with torch.no_grad():
    __logits = self.generative_model(
@wistuba (Collaborator):

double _ intentional?

@gianlucadetommaso (Author):

Yep

fortuna/hallucination/base.py (resolved)
fortuna/hallucination/base.py (outdated, resolved)


class TestScoringModel(unittest.TestCase):
    def test_score(self):
@wistuba (Collaborator):

can't you compare against exact values?
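For illustration, a self-contained sketch of the suggested pattern: run the scorer on fixed inputs and assert against precomputed numbers (the softmax stand-in below is hypothetical, not the repository's actual scorer):

import unittest

import numpy as np


class TestScoringModelExact(unittest.TestCase):
    def test_score_exact_values(self):
        # With fixed, deterministic inputs the expected outputs can be
        # precomputed once and pinned, instead of only checking shapes.
        logits = np.array([0.0, 1.0, 2.0])
        scores = np.exp(logits) / np.exp(logits).sum()  # stand-in scorer
        expected = np.array([0.09003057, 0.24472847, 0.66524096])
        np.testing.assert_allclose(scores, expected, rtol=1e-6)


if __name__ == "__main__":
    unittest.main()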

import re


def string_cleaner(text: str) -> str:
@wistuba (Collaborator):

This function is a great candidate for unit tests.
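A sketch of such a test; since string_cleaner's body isn't shown here, the assumed behavior (lowercasing and stripping punctuation) and the import path are both hypothetical:

import unittest

# Hypothetical import path for the function under test.
from benchmarks.hallucination.utils import string_cleaner


class TestStringCleaner(unittest.TestCase):
    def test_lowercases_and_strips_punctuation(self):
        # Assumed behavior; adjust expectations to the real implementation.
        self.assertEqual(string_cleaner("  Hello, World! "), "hello world")

    def test_idempotent(self):
        # Cleaning twice should give the same result as cleaning once.
        once = string_cleaner("Hello, World!")
        self.assertEqual(once, string_cleaner(once))


if __name__ == "__main__":
    unittest.main()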

@gianlucadetommaso merged commit 76ad7a2 into main on Nov 20, 2023
6 checks passed
@gianlucadetommaso deleted the grouping2 branch on November 20, 2023 at 12:17