Create performance benchmarks for key `pgroll` features #408

andrew-farries · 2024-10-16T12:36:52Z

Gather benchmark data for the following parts of pgroll:

Backfill duration - How long does it take to perform backfill operations on a table of some fixed size (say 10^7 rows)?
Effect of dual writes - What overhead do the up/down triggers incur on UPDATE heavy tables?
read_schema query performance - Benchmark the read_schema query, which is run on every DDL statement to capture 'inferred' migrations.
Persist results somewhere so we can see changes over time

Having these benchmarks in place would allow us to measure performance improvements over time and avoid regressions.
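A minimal sketch of how a backfill benchmark could report throughput next to the default ns/op figure, using the standard `testing` package. `fakeBackfill`, `rowsPerSecond` and `benchmarkBackfill` are hypothetical names, and the workload is a stand-in loop rather than a real pgroll backfill, so the numbers it prints say nothing about pgroll itself.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// rowsPerSecond converts a row count and an elapsed duration into throughput.
func rowsPerSecond(rows int, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(rows) / elapsed.Seconds()
}

// fakeBackfill is a hypothetical stand-in for a real backfill. It only
// loops over the rows so that the sketch runs without a database.
func fakeBackfill(rows int) int {
	touched := 0
	for i := 0; i < rows; i++ {
		touched++
	}
	return touched
}

// benchmarkBackfill returns a benchmark function for a table of the given size.
func benchmarkBackfill(rows int) func(b *testing.B) {
	return func(b *testing.B) {
		start := time.Now()
		for i := 0; i < b.N; i++ {
			fakeBackfill(rows)
		}
		// Report throughput as a custom metric alongside ns/op.
		b.ReportMetric(rowsPerSecond(rows*b.N, time.Since(start)), "rows/s")
	}
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	for _, rows := range []int{10_000, 100_000, 1_000_000} {
		res := testing.Benchmark(benchmarkBackfill(rows))
		fmt.Printf("%8d rows: %s\n", rows, res)
	}
}
```

In a real suite the same function would more likely be wired up as sub-benchmarks via `b.Run` inside a `_test.go` file and run with `go test -bench`.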

ryanslade · 2024-10-17T09:10:10Z

I'd like to have a go at this.

In a perfect world we'd probably want to run these against every commit, but I imagine they may take a while to run and I don't want to affect the velocity of getting things into main. Maybe a compromise is that we spin up an environment once a day and run the tests against all new commits?

Apart from actually writing the benchmarks, we need to decide on a few things:

How often do we run them? I suggest once a day as mentioned above
Where do we run them? I think we may want to spin up a dedicated environment in EC2 so that the results are consistent
Where do we store results? Ideally, since this is an open source project, we may want the results to be public. Perhaps we can upload results to a wiki / docs area in this repo?

Anything else?

andrew-farries · 2024-10-17T09:18:55Z

I think what you suggest is a good start. We want the benchmarks for a couple of reasons:

Guard against performance regressions
Have benchmarks available as part of the public documentation for the repository.

I suggest running the benchmarks as a separate workflow that is automatically run on changes to main and that can also be invoked manually on branches.

A consistent environment in terms of hardware and probably also software (maybe run the benchmarks in a container) is a must too.

Results could be uploaded to object storage and pulled from there into our docs.
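As a rough illustration of what a persisted result might look like before it is uploaded anywhere, the sketch below serialises benchmark results to JSON. The `Result` type, its field names and the `encodeResults`/`decodeResults` helpers are all assumptions made for this example; nothing in this thread fixes a format, and the commit value is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Result is a hypothetical record for one benchmark run.
// The field names are illustrative, not an agreed format.
type Result struct {
	Name       string    `json:"name"`
	Rows       int       `json:"rows"`
	NsPerOp    int64     `json:"ns_per_op"`
	RowsPerSec float64   `json:"rows_per_sec"`
	Commit     string    `json:"commit"`
	RunAt      time.Time `json:"run_at"`
}

// encodeResults serialises a batch of results as indented JSON.
func encodeResults(results []Result) ([]byte, error) {
	return json.MarshalIndent(results, "", "  ")
}

// decodeResults reads back a batch written by encodeResults.
func decodeResults(data []byte) ([]Result, error) {
	var results []Result
	err := json.Unmarshal(data, &results)
	return results, err
}

func main() {
	results := []Result{{
		Name:       "BenchmarkBackfill/10000",
		Rows:       10000,
		NsPerOp:    102083958,
		RowsPerSec: 97959,
		Commit:     "placeholder-sha", // placeholder, not a real commit
		RunAt:      time.Date(2024, 10, 21, 12, 44, 1, 0, time.UTC),
	}}
	data, err := encodeResults(results)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Keeping the commit and timestamp next to each measurement is what would let results from different runs be compared over time.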

This change adds benchmarks that run against 10k, 100k and 1 million rows. They benchmark:

* How long it takes to complete a full backfill of a single column
* How long it takes to update all rows in a table with and without a migration trigger in place

This should give us a baseline metric that we can use to compare performance over time.

Example output:

```
make bench
go test ./internal/benchmarks -v -benchtime=1x -bench .
2024/10/21 12:44:01 github.com/testcontainers/testcontainers-go - Connected to docker:
  Server Version: 27.2.0
  API Version: 1.46
  Operating System: Docker Desktop
  Total Memory: 7838 MB
  Labels:
    com.docker.desktop.address=unix:///Users/ryan/Library/Containers/com.docker.docker/Data/docker-cli.sock
  Testcontainers for Go Version: v0.33.0
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 816adaef777204b01d23a061c6f5532ca8cea098c7f8c6a68fdf542fbfa73f6e
  Test ProcessID: bf2f6095-b21e-4569-a4df-52291606bf3d
2024/10/21 12:44:01 🐳 Creating container for image testcontainers/ryuk:0.8.1
2024/10/21 12:44:01 ✅ Container created: eab8b6af62ba
2024/10/21 12:44:01 🐳 Starting container: eab8b6af62ba
2024/10/21 12:44:01 ✅ Container started: eab8b6af62ba
2024/10/21 12:44:01 ⏳ Waiting for container id eab8b6af62ba image: testcontainers/ryuk:0.8.1. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms skipInternalCheck:false}
2024/10/21 12:44:01 🔔 Container is ready: eab8b6af62ba
2024/10/21 12:44:01 🐳 Creating container for image postgres:15.3
2024/10/21 12:44:01 ✅ Container created: 7bc6dfd7af00
2024/10/21 12:44:01 🐳 Starting container: 7bc6dfd7af00
2024/10/21 12:44:01 ✅ Container started: 7bc6dfd7af00
2024/10/21 12:44:01 ⏳ Waiting for container id 7bc6dfd7af00 image: postgres:15.3. Waiting for: &{timeout:<nil> deadline:0x14000435060 Strategies:[0x14000460540]}
2024/10/21 12:44:02 🔔 Container is ready: 7bc6dfd7af00
goos: darwin
goarch: arm64
pkg: github.com/xataio/pgroll/internal/benchmarks
cpu: Apple M2 Pro
BenchmarkBackfill
BenchmarkBackfill/10000
    benchmarks_test.go:136: Seeded 10000 rows in 19.073458ms (524289 rows/s)
    benchmarks_test.go:51: Backfilled 10000 rows in 102.083958ms
BenchmarkBackfill/10000-10                                  1    102083958 ns/op     97959 rows/s
BenchmarkBackfill/100000
    benchmarks_test.go:136: Seeded 100000 rows in 96.639042ms (1034778 rows/s)
    benchmarks_test.go:51: Backfilled 100000 rows in 2.032871959s
BenchmarkBackfill/100000-10                                 1   2032871959 ns/op     49191 rows/s
BenchmarkBackfill/1000000
    benchmarks_test.go:136: Seeded 1000000 rows in 608.590708ms (1643140 rows/s)
    benchmarks_test.go:51: Backfilled 1000000 rows in 56.80506s
BenchmarkBackfill/1000000-10                                1  56805060000 ns/op     17604 rows/s
BenchmarkWriteAmplification
BenchmarkWriteAmplification/NoTrigger
BenchmarkWriteAmplification/NoTrigger/10000
    benchmarks_test.go:136: Seeded 10000 rows in 21.901875ms (456582 rows/s)
BenchmarkWriteAmplification/NoTrigger/10000-10              1     15013333 ns/op    666075 rows/s
BenchmarkWriteAmplification/NoTrigger/100000
    benchmarks_test.go:136: Seeded 100000 rows in 98.442458ms (1015822 rows/s)
BenchmarkWriteAmplification/NoTrigger/100000-10             1    155141667 ns/op    644572 rows/s
BenchmarkWriteAmplification/NoTrigger/1000000
    benchmarks_test.go:136: Seeded 1000000 rows in 663.248542ms (1507730 rows/s)
BenchmarkWriteAmplification/NoTrigger/1000000-10            1   1704721875 ns/op    586606 rows/s
BenchmarkWriteAmplification/WithTrigger
BenchmarkWriteAmplification/WithTrigger/10000
    benchmarks_test.go:136: Seeded 10000 rows in 26.146708ms (382457 rows/s)
BenchmarkWriteAmplification/WithTrigger/10000-10            1     59703417 ns/op    167495 rows/s
BenchmarkWriteAmplification/WithTrigger/100000
    benchmarks_test.go:136: Seeded 100000 rows in 102.552667ms (975109 rows/s)
BenchmarkWriteAmplification/WithTrigger/100000-10           1    630408666 ns/op    158627 rows/s
BenchmarkWriteAmplification/WithTrigger/1000000
    benchmarks_test.go:136: Seeded 1000000 rows in 666.005167ms (1501490 rows/s)
BenchmarkWriteAmplification/WithTrigger/1000000-10          1   5909246000 ns/op    169226 rows/s
PASS
2024/10/21 12:45:51 🐳 Terminating container: 7bc6dfd7af00
2024/10/21 12:45:51 🚫 Container terminated: 7bc6dfd7af00
ok      github.com/xataio/pgroll/internal/benchmarks    110.632s
```

Part of #408
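To compare runs over time, the result lines in output like the above would need to be turned into numbers. The following is a small sketch of one way to do that with only the standard library; `parseBenchLine` and `slowdown` are hypothetical helpers, not part of pgroll, and the two input lines are copied from the example output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBenchLine parses one result line of "go test -bench" output, e.g.
//   BenchmarkBackfill/10000-10  1  102083958 ns/op  97959 rows/s
// It returns the benchmark name and a map from unit to value.
// Lines that are not result lines (log output, bare names) yield ok=false.
func parseBenchLine(line string) (name string, metrics map[string]float64, ok bool) {
	fields := strings.Fields(line)
	// A result line is: name, iteration count, then value/unit pairs.
	if len(fields) < 4 || len(fields)%2 != 0 || !strings.HasPrefix(fields[0], "Benchmark") {
		return "", nil, false
	}
	if _, err := strconv.Atoi(fields[1]); err != nil {
		return "", nil, false
	}
	metrics = make(map[string]float64)
	for i := 2; i+1 < len(fields); i += 2 {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return "", nil, false
		}
		metrics[fields[i+1]] = v
	}
	return fields[0], metrics, true
}

// slowdown returns how many times slower the run with a trigger is.
func slowdown(withTrigger, noTrigger float64) float64 {
	if noTrigger == 0 {
		return 0
	}
	return withTrigger / noTrigger
}

func main() {
	lines := []string{
		"BenchmarkWriteAmplification/NoTrigger/1000000-10 1 1704721875 ns/op 586606 rows/s",
		"BenchmarkWriteAmplification/WithTrigger/1000000-10 1 5909246000 ns/op 169226 rows/s",
	}
	nsPerOp := make(map[string]float64)
	for _, l := range lines {
		if name, m, ok := parseBenchLine(l); ok {
			nsPerOp[name] = m["ns/op"]
		}
	}
	ratio := slowdown(
		nsPerOp["BenchmarkWriteAmplification/WithTrigger/1000000-10"],
		nsPerOp["BenchmarkWriteAmplification/NoTrigger/1000000-10"],
	)
	fmt.Printf("trigger slowdown at 1M rows: %.2fx\n", ratio)
}
```

For the 1 million row case in the example output, this ratio comes out at roughly 3.5, which is one possible way to express the dual-write overhead as a single number.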

andrew-farries added this to the v1 milestone Oct 16, 2024

ryanslade self-assigned this Oct 17, 2024

ryanslade mentioned this issue Oct 17, 2024

Add backfill benchmarks #412

Merged
