Experiment Reports and Level 1 Assertions @ Runtime #300

Merged: 7 commits merged into main from jlewi/lvl1, Oct 15, 2024


@jlewi (Owner) commented Oct 15, 2024

Experiment Report

After running an evaluation experiment, we compute a report that contains the key metrics we want to track. To start with, these are:

  • Number of cell match results
  • Number of errors, with examples
  • Generate latency, reported as percentiles
  • Level 1 assertion stats

Level 1 Assertion stats

Reintegrate Level 1 Assertions Into Evaluation

  • Fixes #261 (Integrate Level 1 Assertions Into Evaluator)
  • We now compute Level 1 assertions at runtime so that they are available in both production and evaluation
  • Level 1 assertions are computed and then logged
  • Our Analyzer pipeline reads the assertions from the logs and adds them to the trace
  • Our evaluation report accumulates assertion statistics and reports them
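The runtime-to-report flow described in these bullets can be sketched in Go as follows. This is a minimal illustration, assuming assertions are logged as JSON lines with hypothetical `traceId`, `assertName`, and `result` fields; the types, the helpers `attachAssertions` and `accumulateStats`, the assertion names, and the result strings are all hypothetical and do not reflect foyle's actual log schema or API.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// AssertionLog is a hypothetical structured log entry written at runtime
// each time a Level 1 assertion is evaluated.
type AssertionLog struct {
	TraceID string `json:"traceId"`
	Name    string `json:"assertName"`
	Result  string `json:"result"` // assumed values: "PASSED", "FAILED", "SKIPPED"
}

// Trace is a hypothetical trace record built by the analyzer.
type Trace struct {
	ID         string
	Assertions []AssertionLog
}

// AssertionStats holds outcome counts for a single assertion.
type AssertionStats struct {
	Passed, Failed, Skipped int
}

// attachAssertions parses JSON-lines logs and appends each assertion to the
// trace with the matching ID, creating the trace if it does not exist yet.
func attachAssertions(logs string, traces map[string]*Trace) error {
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var a AssertionLog
		if err := json.Unmarshal([]byte(line), &a); err != nil {
			return fmt.Errorf("bad log line %q: %w", line, err)
		}
		t, ok := traces[a.TraceID]
		if !ok {
			t = &Trace{ID: a.TraceID}
			traces[a.TraceID] = t
		}
		t.Assertions = append(t.Assertions, a)
	}
	return sc.Err()
}

// accumulateStats tallies outcomes per assertion name across all traces.
// Unrecognized result strings are ignored in this sketch.
func accumulateStats(traces map[string]*Trace) map[string]*AssertionStats {
	stats := map[string]*AssertionStats{}
	for _, t := range traces {
		for _, a := range t.Assertions {
			s, ok := stats[a.Name]
			if !ok {
				s = &AssertionStats{}
				stats[a.Name] = s
			}
			switch a.Result {
			case "PASSED":
				s.Passed++
			case "FAILED":
				s.Failed++
			case "SKIPPED":
				s.Skipped++
			}
		}
	}
	return stats
}

func main() {
	// Hypothetical assertion names, for illustration only.
	logs := `{"traceId":"t1","assertName":"one_code_cell","result":"PASSED"}
{"traceId":"t1","assertName":"ends_with_code_cell","result":"FAILED"}
{"traceId":"t2","assertName":"one_code_cell","result":"PASSED"}`
	traces := map[string]*Trace{}
	if err := attachAssertions(logs, traces); err != nil {
		panic(err)
	}
	stats := accumulateStats(traces)
	fmt.Println(len(traces), stats["one_code_cell"].Passed, stats["ends_with_code_cell"].Failed) // 2 2 1
}
```

Because the assertions travel through the logs instead of being computed inside the evaluator, the same runtime code path produces them in production and in evaluation; the report only reads what the analyzer attached to each trace.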


netlify bot commented Oct 15, 2024

Deploy Preview for foyle canceled.

🔨 Latest commit: a2a6cd1
🔍 Latest deploy log: https://app.netlify.com/sites/foyle/deploys/670e9c4669bedc0008f02bdd

@jlewi enabled auto-merge (squash) October 15, 2024 16:41
@jlewi merged commit 43f746c into main Oct 15, 2024
5 checks passed
@jlewi deleted the jlewi/lvl1 branch October 15, 2024 16:48
jlewi added a commit that referenced this pull request Oct 16, 2024
#304)

This code should have been included in #300 but it was left out of that PR.
Development

Successfully merging this pull request may close these issues.

Integrate Level 1 Assertions Into Evaluator
1 participant