Add proof benchmarking in CI pipeline #651
I'd love to see durable timings so that I can get a historical view. Bonus points for repeated runs every night so I can get some error bars. GH CI is the worst case for getting these metrics - noisy shared machines are going to give us wide error margins :D
@0xaatif Yeah, that is a good idea. I was already thinking of adding a dedicated benchmark workflow scheduled every night. The current implementation is a quick check to make regressions obvious during PR development (we will see about the error margin and tweak accordingly) - the reason I implemented it here is that it was easy and it does not add any overhead. Now, how do we keep historical workflow data in the CI?
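For illustration, here is a minimal sketch (not part of this PR, and not an agreed design) of what a nightly job could record per run: time the proving step and append one JSON line keyed by commit, which the workflow would then upload as an artifact or push to a results branch to build up a durable history. The file name `bench_history.jsonl` and the stubbed `run_proving_benchmark` are assumptions; `GITHUB_SHA` is the environment variable GitHub Actions sets for every job.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::time::{Instant, SystemTime, UNIX_EPOCH};

// Stand-in for generating a proof with production parameters.
fn run_proving_benchmark() {
    std::thread::sleep(std::time::Duration::from_millis(10));
}

fn main() -> std::io::Result<()> {
    // GITHUB_SHA is set by GitHub Actions for every job; fall back for local runs.
    let commit = std::env::var("GITHUB_SHA").unwrap_or_else(|_| "local".to_string());

    let start = Instant::now();
    run_proving_benchmark();
    let wall_secs = start.elapsed().as_secs_f64();

    let unix_ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs();

    // One JSON object per line keeps the history append-only and easy to diff or plot.
    let mut log = OpenOptions::new()
        .create(true)
        .append(true)
        .open("bench_history.jsonl")?;
    writeln!(
        log,
        r#"{{"commit":"{commit}","unix_ts":{unix_ts},"wall_secs":{wall_secs:.3}}}"#
    )?;
    Ok(())
}
```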
I think we need a stronger approach than the current one. A few points:
I wouldn't know about historical workflow data, but IIRC in the past Winterfell did have some benchmarking job in the CI. Maybe @bobbinth would know if there were some issues with it.
@Nashtare Ok, then I'll close this PR and make a dedicated benchmark workflow.
What if we use a smaller security parameter? (@sai-deng mentioned this in another context.) It would be cheaper, and at least at first sight it would be weird for a performance regression to show up at one security parameter but not another.
Fair point, though we'd have trouble setting a meaningful threshold.
I think we should use production security parameters for benchmarking. Additionally, using cycle counts instead of CPU time may provide more stable and consistent results. |
Did you mean our zkCPU cycle count? This is a good metric but isn't sufficient. For instance, continuations brought a huge slowdown in CPU frequency because of memory flashing, but this wouldn't appear in just the cycle count. Also, any changes to the circuitry wouldn't be reflected.
I mean the CPU cycle count on the benchmark machine. |
@sai-deng The cycle count for a one- or two-hour proving job would be too big to handle practically and would not be easy to compare, no? I mean, if a job takes 75 minutes and after a change it takes 80 minutes, you get some idea, but if the second job is 12345778910111213 cycles slower you have no idea whether that is significant or not.
I agree that handling large cycle counts can be tricky. We can record both CPU time and cycle counts, with the focus on percentage changes rather than absolute values. I've also used
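For illustration, a minimal sketch (an assumption, not something agreed in this thread) of the percentage-based comparison: whatever metric is recorded, seconds or cycles, the check compares the current run against a stored baseline and fails the job on a relative regression, so absolute magnitudes never need to be interpreted. The 5% threshold and the baseline value below are placeholders.

```rust
use std::process::exit;

fn main() {
    // Illustrative values only: baseline from the previous nightly run vs. the
    // current run. The same check works for cycle counts; only the units change.
    let baseline_secs: f64 = 4_500.0; // ~75 minutes
    let current_secs: f64 = 4_800.0; // ~80 minutes
    let threshold_pct: f64 = 5.0; // placeholder regression tolerance

    let change_pct = (current_secs - baseline_secs) / baseline_secs * 100.0;
    println!("proving time changed by {change_pct:+.2}%");

    if change_pct > threshold_pct {
        eprintln!("regression exceeds {threshold_pct}% threshold, failing the job");
        exit(1);
    }
}
```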
@sai-deng Seems
Yes, we may need a self-hosted runner for cycle count. |
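For illustration, a rough sketch (an assumption, not an agreed approach) of recording both wall-clock time and a cycle-style counter on the benchmark machine. It reads the x86-64 timestamp counter via `_rdtsc`, which ticks at a constant rate on modern CPUs and is therefore only a proxy for retired core cycles; true per-process cycle counts would come from hardware perf counters (e.g. `perf stat`), which is one reason a self-hosted runner helps.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::_rdtsc;
    use std::time::Instant;

    // Stand-in for the real proving job.
    fn run_proving_benchmark() {
        std::thread::sleep(std::time::Duration::from_millis(10));
    }

    let wall_start = Instant::now();
    // SAFETY: _rdtsc is available on all x86-64 CPUs; it only reads the TSC.
    let tsc_start = unsafe { _rdtsc() };

    run_proving_benchmark();

    let tsc_ticks = unsafe { _rdtsc() } - tsc_start;
    let wall = wall_start.elapsed();

    println!("wall time: {wall:.3?}, TSC ticks: {tsc_ticks}");
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {
    eprintln!("TSC sketch only targets x86-64");
}
```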
Yes, Winterfell had it in the past - but I think our approach was relatively simplistic and it added a lot of overhead for relatively little benefit. So, IIRC, rather than investing time to make it more sophisticated, we turned it off in favor of manual benchmarking. |
We've noticed lately a performance hit due to a newly added feature, which highlights a more general issue: we have no performance-related regression checks. There's some natural variance in proof generation timings between runs, but noticeable regressions could still be easily caught.
We should set up some CI job for this.