Add proof benchmarking in CI pipeline #651

Open
Nashtare opened this issue Sep 23, 2024 · 14 comments · May be fixed by #701

@Nashtare
Collaborator

We've lately noticed a performance hit due to a newly added feature, which highlights a more general issue: we have no performance-related regression checks. Proof generation timings naturally vary somewhat from run to run, but noticeable regressions could still be easily caught.

We should set up some CI job for this.
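
Something along these lines could work as a first cut; this is only a sketch, and the baseline file, witness input, and threshold below are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical CI step: time a fixed proving run and fail if it regresses
# beyond a tolerance compared to a stored baseline (all names are placeholders).
set -euo pipefail

BASELINE_FILE="ci/proving_baseline_seconds.txt"   # assumed to hold an integer number of seconds
THRESHOLD_PCT=10                                  # arbitrary regression tolerance

start=$(date +%s)
# Illustrative invocation: prove a fixed, representative block; the flags and
# witness.json input are stand-ins for whatever the real benchmark uses.
./target/release/leader --runtime in-memory --load-strategy monolithic \
    --block-batch-size 8 --proof-output-dir ./proofs stdio < witness.json
end=$(date +%s)

elapsed=$((end - start))
baseline=$(cat "$BASELINE_FILE")
limit=$((baseline * (100 + THRESHOLD_PCT) / 100))

echo "elapsed=${elapsed}s baseline=${baseline}s limit=${limit}s"
if [ "$elapsed" -gt "$limit" ]; then
    echo "Proving time regressed by more than ${THRESHOLD_PCT}%" >&2
    exit 1
fi
```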

@0xaatif
Contributor

0xaatif commented Oct 3, 2024

I'd love to see durable timings so that I can get a historical view.

Bonus points for repeated runs every night so I can get some error bars.

GH CI is the worst case for getting these metrics - noisy shared machines are going to get us wide error margins :D

@atanmarko
Member

atanmarko commented Oct 3, 2024

@0xaatif Yeah, that is a good idea; I was already thinking of adding a dedicated benchmark workflow scheduled every night. The current implementation is a quick check to make regressions obvious during PR development (we will see about the error margin and tweak accordingly); the reason I implemented it here is that it was easy and does not add any overhead.

Now, how do we keep historical workflow data in the CI?
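
One lightweight option (just a sketch; the results file and the bench-history branch below are made up) would be to append one JSON record per run and either commit it to a dedicated branch or upload it as a workflow artifact:

```bash
# Hypothetical post-benchmark step: append one JSON line per run so the
# history can be plotted later. $elapsed would come from the timing step.
RESULTS_FILE="bench-history/proving_times.jsonl"

printf '{"commit":"%s","date":"%s","elapsed_s":%d}\n' \
    "$GITHUB_SHA" "$(date -u +%FT%TZ)" "$elapsed" >> "$RESULTS_FILE"

# Either push the file to a dedicated branch...
git add "$RESULTS_FILE"
git commit -m "bench: record proving time for $GITHUB_SHA"
git push origin HEAD:bench-history
# ...or upload it as a workflow artifact instead and fetch past runs when needed.
```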

@Nashtare
Collaborator Author

Nashtare commented Oct 3, 2024

I think we need a stronger approach than the current one. A few points:

  • @0xaatif mentioned the variability of shared instances; we'd need our own runner
  • the current parameters (in feat: add ci proving time benchmark #694) are noisy (we need reproducible runs with as little room for noise as possible; this includes for instance keeping the prover state fully in memory to avoid any disk I/O delays, ...; see the sketch after this list)
  • relevant test case: the 2 blocks used in the script are very special edge cases (one with nearly empty blocks, one with a single block). The initial request for such a feature came from a perf regression observed over batches larger than the block pool size, for instance.
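
For the noise point, the usual knobs on a dedicated Linux runner would be pinning the prover to fixed cores and disabling frequency scaling; a rough sketch (the paths are Linux- and intel_pstate-specific and may differ per machine):

```bash
# Reduce run-to-run noise on a dedicated Linux benchmark machine (sketch, not exhaustive).

# Disable turbo boost (intel_pstate systems; the knob differs on AMD/acpi-cpufreq).
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Keep the clock at a fixed frequency with the performance governor.
sudo cpupower frequency-set -g performance

# Pin the prover to a fixed set of cores to avoid scheduler-induced variance.
taskset -c 0-15 ./target/release/leader --runtime in-memory --load-strategy monolithic ...
```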

I wouldn't know about historical workflow data, but IIRC in the past winterfell did have some benchmarking job in the CI. Maybe @bobbinth would know if there were some issues with it

@atanmarko
Member

atanmarko commented Oct 3, 2024

@Nashtare Ok, then I'll close this PR and make a dedicated benchmark workflow.

  • I suppose it should only run at a scheduled time (night EU/US) so as not to interfere with the other workflows. Maybe we could also run it on PR merge.
  • Dedicated machine (how different should it be from the current zero-ci machine)?
  • What data should be collected in the benchmark?
  • What are the representative blocks that should be used in the benchmark?
  • How should the data be kept? Would you just go through the GitHub Actions history for that workflow, or should we upload the data to some database?

@4l0n50
Contributor

4l0n50 commented Oct 3, 2024

What if we used a smaller security parameter? (@sai-deng mentioned this in another context.) It would be cheaper, and at least at first sight it would be weird to get a performance regression for security parameter s but not for s' < s.

@Nashtare
Collaborator Author

Nashtare commented Oct 3, 2024

Fair point, though we'd have trouble setting a meaningful threshold.
Realistically speaking, if we had some regression on the pure proving side (STARK or recursion) that incurred a 7% overhead in total, bringing the security level down from 100 bits to 30 could shrink this overhead to 2% (which is close to the noise threshold).

@sai-deng
Member

sai-deng commented Oct 3, 2024

I think we should use production security parameters for benchmarking. Additionally, using cycle counts instead of CPU time may provide more stable and consistent results.

@Nashtare
Collaborator Author

Nashtare commented Oct 3, 2024

> Additionally, using cycle counts instead of CPU time may provide more stable and consistent results.

Did you mean our zkCPU cycle count? This is a good metric but isn't sufficient. For instance, continuations brought a huge slowdown in effective CPU frequency because of memory flashing, but this wouldn't appear with just the cycle count. Also, none of the changes to the circuitry would be reflected.

@sai-deng
Member

sai-deng commented Oct 3, 2024

I mean the CPU cycle count on the benchmark machine.

@atanmarko
Member

atanmarko commented Oct 3, 2024

@sai-deng Cycle counts for a one- or two-hour proving job would be too big to practically handle and would not be easy to compare, no? I mean, if a job takes 75 minutes and after a change it takes 80 minutes, you get some idea, but if the second job is 12345778910111213 cycles slower you have no idea whether that is significant or not.

@sai-deng
Member

sai-deng commented Oct 3, 2024

I agree that handling large cycle counts can be tricky; we can record both CPU time and cycle counts, focusing on percentage changes rather than absolute values. I've also used perf stat in previous projects for jobs lasting an hour without issues.
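
For example, the comparison itself can be done on relative deltas rather than raw counts (numbers below are purely illustrative):

```bash
# Compare two cycle counts (or wall-clock times) as a percentage change.
old=12000000000000      # baseline cycles (illustrative)
new=12600000000000      # current run (illustrative)

awk -v old="$old" -v new="$new" \
    'BEGIN { printf "change: %+.2f%%\n", (new - old) / old * 100 }'
# -> change: +5.00%
```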

@atanmarko atanmarko linked a pull request Oct 4, 2024 that will close this issue
@atanmarko
Member

@sai-deng Seems `perf stat -e cycles` is not supported on the CI machine:

Performance counter stats for '/home/runner/work/zk_evm/zk_evm/target/release/leader --runtime in-memory --load-strategy monolithic --block-batch-size 8 --proof-output-dir /home/runner/work/zk_evm/zk_evm/proofs stdio':

   <not supported>      cycles                                                                

     280.332879270 seconds time elapsed

    6285.283435000 seconds user
     276.485728000 seconds sys

@sai-deng
Member

sai-deng commented Oct 4, 2024

Yes, we may need a self-hosted runner for cycle count.
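
On a bare-metal self-hosted runner where the PMU is exposed, something like the following should work (GitHub-hosted runners are virtualized, which is presumably why cycles shows up as <not supported> above); witness.json is a stand-in for the actual input:

```bash
# Check that hardware perf counters are accessible (needs a permissive
# perf_event_paranoid level, CAP_PERFMON, or root).
cat /proc/sys/kernel/perf_event_paranoid

# Collect cycles, instructions and task-clock for the proving run.
perf stat -e cycles,instructions,task-clock -- \
    ./target/release/leader --runtime in-memory --load-strategy monolithic \
    --block-batch-size 8 --proof-output-dir ./proofs stdio < witness.json
```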

@bobbinth

bobbinth commented Oct 4, 2024

> IIRC in the past winterfell did have some benchmarking job in the CI. Maybe @bobbinth would know if there were some issues with it

Yes, Winterfell had it in the past - but I think our approach was relatively simplistic and it added a lot of overhead for relatively little benefit. So, IIRC, rather than investing time to make it more sophisticated, we turned it off in favor of manual benchmarking.
