
Evaluations – Remove conflicts in evaluation objective between configuration and prompt #420

Closed
samufyi opened this issue Oct 14, 2024 · 1 comment · Fixed by #489


samufyi commented Oct 14, 2024

What?

Right now, there is a config option when creating an evaluation that sets the expected result (a number between 1 and 5), but we don't pass this to the prompt. As a result, the user can write a different range in the prompt (say, between 9 and 20), and that is the one that gets taken into account.

In summary, there are 2 sources of truth.

https://www.figma.com/design/ODioXiqX8aeDMonsh0HBui/Latitude-Cloud?node-id=2738-34189&t=C31y3Hbykh3pzF2x-4
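
To make the conflict concrete, here is a hedged sketch of the two sources of truth (the field names and prompt text are illustrative, not the actual schema):

```ts
// Illustrative only — field names are assumptions, not the real schema.
const evaluation = {
  // Source of truth #1: the range set in the evaluation's configuration.
  configuration: { type: 'number', minValue: 1, maxValue: 5 },
  // Source of truth #2: the prompt, which is all the model actually sees.
  prompt: 'Rate the response on a scale from 9 to 20. Return only the number.',
}
// Because the configuration is never injected into the prompt, the model
// answers in the 9–20 range while the app expects a value between 1 and 5.
```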

samufyi converted this from a draft issue Oct 14, 2024
samufyi added the p1 High priority issues label Oct 14, 2024
samufyi self-assigned this Oct 15, 2024
samufyi removed the p1 High priority issues label Oct 15, 2024
csansoon moved this from Design to Next in Latitude LLM Roadmap Oct 21, 2024
csansoon self-assigned this Oct 22, 2024

csansoon commented Oct 28, 2024

The plan

To do this, the schema must change.

With this change, each evaluation will have 2 polymorphic relations: metadataType and resultType

To start, there will be 2 EvaluationMetadataTypes:

  • LlmAsJudgeAdvanced: The current metadata. It will contain the prompt and configuration json.
  • LlmAsJudge: This one will contain fields like objective and additionalInstructions.

And 3 EvaluationResultConfigurations, which will depend on a ResultableType:

  • Boolean: It contains fields like trueResultDescription and falseResultDescription
  • Numerical: It contains fields like minValue, minValueDescription, maxValue and maxValueDescription
  • Text: It contains fields like valueDescription

The evaluation will expect results matching its resultType, and its behaviour will change accordingly.

This allows for many more types of evaluations in the future, whether llmAsJudge or any other kind (like Human in the Loop), while maintaining the resultable types that we have now.

EvaluationResults will stay the same, as they still fit the use case.
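
A minimal TypeScript sketch of that shape, reusing the names from this plan (the discriminants, optionality, and everything not named above are assumptions, not the final schema):

```ts
// Sketch only — type names come from the plan above; details are assumptions.
type EvaluationMetadata =
  | {
      type: 'llmAsJudgeAdvanced'
      prompt: string
      configuration: Record<string, unknown> // legacy JSON blob
    }
  | {
      type: 'llmAsJudge'
      objective: string
      additionalInstructions?: string
    }

type EvaluationResultConfiguration =
  | {
      resultableType: 'boolean'
      trueResultDescription?: string
      falseResultDescription?: string
    }
  | {
      resultableType: 'numerical'
      minValue: number
      minValueDescription?: string
      maxValue: number
      maxValueDescription?: string
    }
  | {
      resultableType: 'text'
      valueDescription?: string
    }

// Each evaluation carries both polymorphic relations.
type Evaluation = {
  id: number
  metadata: EvaluationMetadata
  resultConfiguration: EvaluationResultConfiguration
}
```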

Development breakdown

Part 1 — EvaluationMetadataLlmAsJudgeAdvanced

In this first part, I'll focus on modifying and migrating to the new EvaluationMetadataLlmAsJudgeAdvanced schema.

This type does not require a resultConfiguration yet, since the result configuration is already defined inside the configuration json. I'll just move this json to the EvaluationMetadataLlmAsJudgeAdvanced table for advanced usage.

Migrations are deployed at a separate time from the code, so the code must keep working both before and after the migration runs. To address this, this part is divided into 4 PRs.
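
One way to survive that window is a read fallback; the helper below is hypothetical (not the actual repository code) and assumes the legacy configuration lives on the evaluation row itself:

```ts
// Hypothetical helper: prefer the configuration on the new metadata table,
// and fall back to the legacy column until the migration has run.
function getAdvancedConfiguration(evaluation: {
  configuration?: Record<string, unknown> // legacy location
  metadata?: { configuration?: Record<string, unknown> } // new location
}): Record<string, unknown> | undefined {
  return evaluation.metadata?.configuration ?? evaluation.configuration
}
```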

Part 2 — EvaluationMetadataLlmAsJudge and EvaluationConfiguration tables

Here, I'll create the EvaluationMetadataLlmAsJudge table, and one table for each EvaluationConfiguration result type. I'll also modify the EvaluationDto type and EvaluationRepository to return the new type.

Deployment is split into five steps:

    1. Simpler evaluations part 2.1 #520 — Create the tables and types. However, EvaluationDto and EvaluationRepository stay the same.
    2. Simpler evaluations part 2.2 #525 — Update the EvaluationDto type and EvaluationRepository to fetch data from all polymorphic relations. Additionally, creating new evaluations (still the advanced type) will create the configuration both in its metadata table (the legacy way) and in its own configuration table (see the sketch after this list).
    3. Simpler evaluations — part 2.3 #529 — Migrate older evaluations to use the new configuration table, while still maintaining the metadata configuration.
    4. Simpler evaluations — part 2.4 #533 — Use the configuration table instead of metadata.configuration.
    5. Remove the configuration field from the metadata table for advanced evaluations.
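
Step 2 above boils down to a dual write. A rough sketch, where all helper names (createLlmAsJudgeAdvancedMetadata, createResultConfiguration, insertEvaluation) are hypothetical stand-ins for the real services:

```ts
// Hypothetical stand-ins for the real services/repositories.
declare function createLlmAsJudgeAdvancedMetadata(data: {
  prompt: string
  configuration: Record<string, unknown>
}): Promise<{ id: number }>
declare function createResultConfiguration(
  configuration: Record<string, unknown>,
): Promise<{ id: number }>
declare function insertEvaluation(data: {
  metadataId: number
  resultConfigurationId: number
}): Promise<{ id: number }>

// Step 2.2 dual write: persist the configuration both in the legacy metadata
// JSON and in the new per-type configuration table, so reads keep working
// before and after the backfill of step 3.
async function createAdvancedEvaluation(input: {
  prompt: string
  configuration: Record<string, unknown>
}) {
  const metadata = await createLlmAsJudgeAdvancedMetadata({
    prompt: input.prompt,
    configuration: input.configuration, // legacy location
  })
  const resultConfiguration = await createResultConfiguration(
    input.configuration, // new location
  )
  return insertEvaluation({
    metadataId: metadata.id,
    resultConfigurationId: resultConfiguration.id,
  })
}
```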

Part 3 — New UI

Here I'll create the services and UI to create the new types of evaluations, although they won't be used in production yet.

Part 4 — Migration

Finally, swap the evaluation-creation options over to the new simple types.

csansoon linked a pull request Oct 28, 2024 that will close this issue
github-project-automation bot moved this from In Progress to Done in Latitude LLM Roadmap Oct 28, 2024
csansoon reopened this Oct 28, 2024
csansoon moved this from Done to In Progress in Latitude LLM Roadmap Oct 29, 2024
github-project-automation bot moved this from In Progress to Done in Latitude LLM Roadmap Nov 13, 2024