
feat: make general purpose metrics more general #1666

Merged (14 commits) on Nov 19, 2024


@jjmachan (Member) commented Nov 13, 2024

Metrics Converted

  • Aspect Critic
  • Simple Criteria
  • Rubric Based (both instance-specific and domain-specific)

A few examples:

Aspect Critic

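from ragas.metrics import AspectCritic
from ragas.dataset_schema import SingleTurnSample
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

# judge LLM used by the metrics below (the model choice is illustrative)
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))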

only_response = SingleTurnSample(
    response="The Eiffel Tower is located in Paris."
)

grammar_critic = AspectCritic(
    name="grammar",
    definition="Is the response grammatically correct?",
    llm=evaluator_llm
)

# top-level await works in a notebook; in a plain script, wrap the call in asyncio.run(...)
await grammar_critic.single_turn_ascore(only_response)

With reference

answer_correctness_critic = AspectCritic(
    name="answer_correctness",
    definition="Are the response and the reference answer the same?",
    llm=evaluator_llm
)

# data row
sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is located in Paris.",
    reference="London"  # disagrees with the response, so this should score 0
)
await answer_correctness_critic.single_turn_ascore(sample)

Note: this only works for single-turn metrics for now.

@shahules786 (Member) commented Nov 14, 2024

  • Say I'm evaluating a dataset that contains user_input, response, reference, and retrieved_contexts, and I run two metrics on it at the same time, e.g. context_recall and an aspect critic (for harmfulness, say). With this interface, the aspect critic will use retrieved_contexts even though it doesn't need them, and there is no way to opt out. I think this can happen whenever metrics are run through the evaluate interface.

  • With a single metric, this seems fine.
    @jjmachan
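The opt-out problem can be reproduced with a small, self-contained sketch. It is a hypothetical model, not the ragas implementation: it assumes a metric lists its inputs as column names, where a `:optional` suffix means "use it if the row has it" (the convention in the column spec quoted below), and `select_inputs` is an illustrative helper. A general-purpose critic that shares a row with `context_recall` ends up receiving `retrieved_contexts` simply because the row contains it.

```python
# Hypothetical model of how a metric with ":optional" columns picks its inputs.
# Not the ragas implementation; it only illustrates the opt-out problem.


def select_inputs(spec: set[str], row: dict[str, object]) -> dict[str, object]:
    """Return the row fields a metric would receive under `spec`.

    Plain names are required; names ending in ":optional" are passed
    through whenever the row provides a non-None value for them.
    """
    selected: dict[str, object] = {}
    for entry in spec:
        name, _, flag = entry.partition(":")
        if name in row and row[name] is not None:
            selected[name] = row[name]
        elif flag != "optional":
            raise ValueError(f"missing required column: {name}")
    return selected


# One row of a dataset shared by several metrics (e.g. context_recall and a critic).
row = {
    "user_input": "Where is the Eiffel Tower located?",
    "response": "The Eiffel Tower is located in Paris.",
    "reference": "Paris",
    "retrieved_contexts": ["The Eiffel Tower is in Paris, France."],
}

# A general-purpose critic (e.g. harmfulness) that accepts every column as optional.
harmfulness_spec = {
    "user_input:optional",
    "response:optional",
    "retrieved_contexts:optional",
    "reference:optional",
    "reference_contexts:optional",
}

print(sorted(select_inputs(harmfulness_spec, row)))
# → ['reference', 'response', 'retrieved_contexts', 'user_input']
```

Because every column is optional, the critic has no way to say "ignore `retrieved_contexts`"; it takes whatever the shared row offers.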

Comment on lines 129 to 135
MetricType.SINGLE_TURN: {
    "user_input",
    "response",
    "user_input:optional",
    "response:optional",
    "retrieved_contexts:optional",
    "reference:optional",
    "reference_contexts:optional",
},
@jjmachan (Member, Author) replied:
You can change it here; this is where the user has control.
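As a hypothetical illustration (not ragas code) of that user control, a user could narrow a metric's column set before evaluation. In this sketch `MetricType` is a stand-in enum, `DEFAULT_COLUMNS` copies the single-turn set quoted above, and `without_columns` is an illustrative helper rather than a ragas API. Dropping the context columns lets a harmfulness critic opt out of `retrieved_contexts` while other metrics in the same run can still use them.

```python
from enum import Enum


class MetricType(Enum):
    # Stand-in for the metric-type keys used in the column spec above.
    SINGLE_TURN = "single_turn"
    MULTI_TURN = "multi_turn"


# Default spec for a general-purpose critic, copied from the review snippet.
DEFAULT_COLUMNS = {
    MetricType.SINGLE_TURN: {
        "user_input",
        "response",
        "user_input:optional",
        "response:optional",
        "retrieved_contexts:optional",
        "reference:optional",
        "reference_contexts:optional",
    },
}


def without_columns(spec, metric_type, *names):
    """Return a copy of `spec` with the named columns (required or optional) removed."""
    new_spec = {key: set(columns) for key, columns in spec.items()}
    new_spec[metric_type] = {
        entry
        for entry in new_spec[metric_type]
        if entry.partition(":")[0] not in names
    }
    return new_spec


harmfulness_columns = without_columns(
    DEFAULT_COLUMNS, MetricType.SINGLE_TURN, "retrieved_contexts", "reference_contexts"
)
print(sorted(harmfulness_columns[MetricType.SINGLE_TURN]))
# → ['reference:optional', 'response', 'response:optional', 'user_input', 'user_input:optional']
```

Returning a copy keeps the shared default untouched, so other metric instances built from it are not affected.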

@shahules786 (Member) left a review:

Please reflect these changes in the docs.

# (preceding fields omitted)
reference: t.Optional[str] = Field(
    description="The reference answer for evaluation", default=None
)
criteria: str = Field(description="The criteria to evaluate the response")
@shahules786 (Member):

As we discussed, this field should be removed both here and in Simple Criteria. If that doesn't happen in this PR, I can open another PR after this one is merged.

@jjmachan merged commit f14cd85 into explodinggradients:main on Nov 19, 2024 (15 checks passed).
@jjmachan deleted the feat/general branch on November 19, 2024, 12:21.
Labels: size:XXL (This PR changes 1000+ lines, ignoring generated files.)
Participants: 2