
Evaluations – Remove conflicts in evaluation objective between configuration and prompt #420

Closed
samufyi opened this issue Oct 14, 2024 · 1 comment · Fixed by #489


samufyi commented Oct 14, 2024

What?

Right now, there is a config option when creating an evaluation that sets the expected result (a number between 1 and 5), but we don't pass this to the prompt. As a result, the user can write a different range in the prompt (say, between 9 and 20), and that is the one that gets taken into account.

In summary, there are 2 sources of truth.

https://www.figma.com/design/ODioXiqX8aeDMonsh0HBui/Latitude-Cloud?node-id=2738-34189&t=C31y3Hbykh3pzF2x-4
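
To make the conflict concrete, here is a hedged sketch of the two sources of truth (the field names and prompt text are illustrative, not the actual schema):

```ts
// Illustrative only — field names are assumptions, not the real schema.
const evaluation = {
  // Source of truth #1: the range set in the evaluation's configuration.
  configuration: { type: 'number', minValue: 1, maxValue: 5 },
  // Source of truth #2: the prompt, which is all the model actually sees.
  prompt: 'Rate the response on a scale from 9 to 20. Return only the number.',
}
// Because the configuration is never injected into the prompt, the model
// answers in the 9–20 range while the app expects a value between 1 and 5.
```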

samufyi converted this from a draft issue Oct 14, 2024
samufyi added the p1 High priority issues label Oct 14, 2024
samufyi self-assigned this Oct 15, 2024
samufyi removed the p1 High priority issues label Oct 15, 2024
csansoon moved this from Design to Next in Latitude LLM Roadmap Oct 21, 2024
csansoon self-assigned this Oct 22, 2024

csansoon commented Oct 28, 2024

The plan

To do this, the schema must change.

With this change, each evaluation will have 2 polymorphic relations: metadataType and resultType

To start, there will be 2 EvaluationMetadataTypes:

  • LlmAsJudgeAdvanced: The current metadata. It will contain the prompt and configuration json.
  • LlmAsJudge: This one will contain fields like objective and additionalInstructions.

And 3 EvaluationResultConfigurations, which will depend on a ResultableType:

  • Boolean: It contains fields like trueResultDescription and falseResultDescription
  • Numerical: It contains fields like minValue, minValueDescription, maxValue and maxValueDescription
  • Text: It contains fields like valueDescription

The evaluation will expect results matching its resultType, and its behaviour will change accordingly.

This allows for many more types of evaluations in the future, whether llmAsJudge or any other kind (like Human in the Loop), while maintaining the resultable types that we have now.

EvaluationResults will stay the same, as they still fit the use case.
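
A minimal TypeScript sketch of that shape, reusing the names from this plan (the discriminants, optionality, and everything not named above are assumptions, not the final schema):

```ts
// Sketch only — type names come from the plan above; details are assumptions.
type EvaluationMetadata =
  | {
      type: 'llmAsJudgeAdvanced'
      prompt: string
      configuration: Record<string, unknown> // legacy JSON blob
    }
  | {
      type: 'llmAsJudge'
      objective: string
      additionalInstructions?: string
    }

type EvaluationResultConfiguration =
  | {
      resultableType: 'boolean'
      trueResultDescription?: string
      falseResultDescription?: string
    }
  | {
      resultableType: 'numerical'
      minValue: number
      minValueDescription?: string
      maxValue: number
      maxValueDescription?: string
    }
  | {
      resultableType: 'text'
      valueDescription?: string
    }

// Each evaluation carries both polymorphic relations.
type Evaluation = {
  id: number
  metadata: EvaluationMetadata
  resultConfiguration: EvaluationResultConfiguration
}
```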

Development breakdown

Part 1 — EvaluationMetadataLlmAsJudgeAdvanced

In this first part, I'll focus on modifying and migrating to the new EvaluationMetadataLlmAsJudgeAdvanced schema.

This type does not require a resultConfiguration yet, since the result configuration is already defined inside the configuration json. I'll just move this json to the EvaluationMetadataLlmAsJudgeAdvanced table for advanced usage.

Migrations are deployed at a separate time from the code, so the code must keep working both before and after the migration runs. To address this, this part is divided into 4 PRs.
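
One way to survive that window is a read fallback; the helper below is hypothetical (not the actual repository code) and assumes the legacy configuration lives on the evaluation row itself:

```ts
// Hypothetical helper: prefer the configuration on the new metadata table,
// and fall back to the legacy column until the migration has run.
function getAdvancedConfiguration(evaluation: {
  configuration?: Record<string, unknown> // legacy location
  metadata?: { configuration?: Record<string, unknown> } // new location
}): Record<string, unknown> | undefined {
  return evaluation.metadata?.configuration ?? evaluation.configuration
}
```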

Part 2 — EvaluationMetadataLlmAsJudge and EvaluationConfiguration tables

Here, I'll create the EvaluationMetadataLlmAsJudge table, and one table for each EvaluationConfiguration result type. I'll also modify the EvaluationDto type and EvaluationRepository to return the new type.

Deployment is split into five steps:

    1. Simpler evaluations part 2.1 #520 — Create the tables and types. However, EvaluationDto and EvaluationRepository stay the same.
    2. Simpler evaluations part 2.2 #525 — Update the EvaluationDto type and EvaluationRepository to fetch data from all polymorphic relations. Additionally, creating new evaluations (still the advanced type) will create the configuration both in its metadata table (the legacy way) and in its own configuration table (see the sketch after this list).
    3. Simpler evaluations — part 2.3 #529 — Migrate older evaluations to use the new configuration table, while still maintaining the metadata configuration.
    4. Simpler evaluations — part 2.4 #533 — Use the configuration table instead of metadata.configuration.
    5. Remove the configuration field from the metadata table for advanced evaluations.
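
Step 2 above boils down to a dual write. A rough sketch, where all helper names (createLlmAsJudgeAdvancedMetadata, createResultConfiguration, insertEvaluation) are hypothetical stand-ins for the real services:

```ts
// Hypothetical stand-ins for the real services/repositories.
declare function createLlmAsJudgeAdvancedMetadata(data: {
  prompt: string
  configuration: Record<string, unknown>
}): Promise<{ id: number }>
declare function createResultConfiguration(
  configuration: Record<string, unknown>,
): Promise<{ id: number }>
declare function insertEvaluation(data: {
  metadataId: number
  resultConfigurationId: number
}): Promise<{ id: number }>

// Step 2.2 dual write: persist the configuration both in the legacy metadata
// JSON and in the new per-type configuration table, so reads keep working
// before and after the backfill of step 3.
async function createAdvancedEvaluation(input: {
  prompt: string
  configuration: Record<string, unknown>
}) {
  const metadata = await createLlmAsJudgeAdvancedMetadata({
    prompt: input.prompt,
    configuration: input.configuration, // legacy location
  })
  const resultConfiguration = await createResultConfiguration(
    input.configuration, // new location
  )
  return insertEvaluation({
    metadataId: metadata.id,
    resultConfigurationId: resultConfiguration.id,
  })
}
```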

Part 3 — New UI

Here I'll create the services and UI to create the new types of evaluations, although they won't be used in production yet.

Part 4 — Migration

Finally, swap the evaluation-creation options over to the new simple types.

csansoon linked a pull request Oct 28, 2024 that will close this issue
github-project-automation bot moved this from In Progress to Done in Latitude LLM Roadmap Oct 28, 2024
csansoon reopened this Oct 28, 2024
csansoon moved this from Done to In Progress in Latitude LLM Roadmap Oct 29, 2024
github-project-automation bot moved this from In Progress to Done in Latitude LLM Roadmap Nov 13, 2024