Scoring code needs to account for held-out workload performance #588

Closed
fsschneider opened this issue Nov 23, 2023 · 2 comments

@fsschneider

The benchmark rules state that submissions must also perform well on held-out workloads. The scoring code does not currently account for this automatically.
Specifically, quoting from the benchmark rules:
For a submission to receive a finite training time on a fixed workload, it needs to:

  • Reach the validation target on the fixed workload within the maximum runtime.
  • Reach the validation target on the fixed workload within 4x of the fastest submission.
  • Reach the validation target on the held-out workload (corresponding to the fixed workload) within the maximum runtime.
  • Reach the validation target on the held-out workload (corresponding to the fixed workload) within 4x of the fastest submission. To determine the fastest submission on a held-out workload, we only consider submissions that reached the target on the corresponding fixed workload. This protects us against extremely fast submissions that only work on a specific held-out workload and are useless as general algorithms.
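
The four criteria above could be checked roughly as follows. This is only a minimal sketch, not the repository's scoring code: the function name `finite_training_times`, the input layout (a dict mapping each submission to its fixed and held-out times, with `math.inf` marking a missed target), and the use of a single `max_runtime` for both workloads are all assumptions made for illustration.

```python
import math


def finite_training_times(times, max_runtime, factor=4.0):
    """Hypothetical helper: apply the four criteria for one fixed workload.

    times: {submission: (fixed_time, heldout_time)}; math.inf = target missed.
    Returns {submission: fixed_time if all criteria hold, else math.inf}.
    """

    def reached(t):
        # math.inf never passes, so a missed target always fails here.
        return t <= max_runtime

    # Submissions that reached the target on the fixed workload in time.
    fixed_ok = {name for name, (fixed, _) in times.items() if reached(fixed)}
    fastest_fixed = min((times[n][0] for n in fixed_ok), default=math.inf)

    # The fastest held-out time only considers submissions in fixed_ok.
    fastest_heldout = min(
        (times[n][1] for n in fixed_ok if reached(times[n][1])),
        default=math.inf,
    )

    scored = {}
    for name, (fixed, heldout) in times.items():
        passes = (
            reached(fixed)                            # criterion 1
            and fixed <= factor * fastest_fixed       # criterion 2
            and reached(heldout)                      # criterion 3
            and heldout <= factor * fastest_heldout   # criterion 4
        )
        scored[name] = fixed if passes else math.inf
    return scored
```

Under these assumptions, a submission that misses the fixed-workload target but is very fast on the held-out workload does not tighten the 4x bound for anyone else, which is the protection the last rule describes.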

@priyakasimbeg mentioned this issue Jan 25, 2024

@priyakasimbeg

The remaining work here is to add tests that check that all four criteria are enforced in strict mode.

@priyakasimbeg

Moved tracking of tests to #624
