-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: introduce nightly job for robustness test #658
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM thanks @fuweid. Two suggestions that aren't blocking.
with: | ||
count: 100 | ||
testTimeout: 200m | ||
runs-on: "['ubuntu-latest-8-cores']" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an instance of this robustness job on ['actuated-arm64-8cpu-8gb']
?
Happy for it to be done as a follow-up issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Let me handle this. It might require to enable some kernel modules.
@@ -84,4 +84,4 @@ test-failpoint: | |||
.PHONY: test-robustness # Running robustness tests requires root permission | |||
test-robustness: | |||
go test -v ${TESTFLAGS} ./tests/dmflakey -test.root | |||
go test -v ${TESTFLAGS} ./tests/robustness -test.root | |||
go test -v ${TESTFLAGS} ${ROBUSTNESS_TESTFLAGS} ./tests/robustness -test.root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice we don't have any graceful validation that whoever runs make test-robustness
has actually run the pre-requisite make gofail-enable
prior. Currently we just get cryptic error printed which folks might not understand at first glance, i.e:
=== RUN TestRestartFromPowerFailure/fp_ext4_commit5s
powerfailure_test.go:112: start bbolt bench -work -path /tmp/TestRestartFromPowerFailurefp_ext4_commit5s797232614/002/boltdb -count=1000000000 -batch-size=5
powerfailure_test.go:129: simulate power failure
powerfailure_test.go:134: random pick failpoint: resizeFileError
powerfailure_test.go:168:
Error Trace: /home/james/Documents/bbolt/tests/robustness/powerfailure_test.go:168
/home/james/Documents/bbolt/tests/robustness/powerfailure_test.go:135
/home/james/Documents/bbolt/tests/robustness/powerfailure_test.go:81
Error: Received unexpected error:
Put "http://127.0.0.1:12345/resizeFileError": dial tcp 127.0.0.1:12345: connect: connection refused
In etcd-io/etcd
we introduced https://github.com/etcd-io/etcd/blob/f8d5ba9a3fdff4f706f06ef0f0c0a27bb033ac53/tests/framework/integration/testing.go#L102. Perhaps we should introduce something similar here, not a blocker on this pr though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I build bbolt with failpoints. I think gofail.List doesn't work for this case. Let me think about how to handle.
Update case with a combination of EXT4 filesystem's commit setting and unexpected exit event. That EXT4 filesystem's commit is to sync all its data and metadata every seconds. The kernel can help us sync even if that process has been killed. With different commit setting, we can simulate that case that kernel syncs half part of dirty pages before power failure. And for unexpected exit event, we can kill that process randomly or panic at failpoint instead of fixed code path. Signed-off-by: Wei Fu <[email protected]>
Signed-off-by: Wei Fu <[email protected]>
40692d2
to
c61a3be
Compare
Overall looks good to me. Please let me know if you want to resolve the comments in a separate PR, then I can merge this PR |
Hi @ahrtr , let me resolve those comments in a follow-up. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks
Update case with a combination of EXT4 filesystem's commit setting and unexpected exit event. That EXT4 filesystem's commit is to sync all its data and metadata every seconds. The kernel can help us sync even if that process has been killed. With different commit setting, we can simulate that case that kernel syncs half part of dirty pages before power
failure. And for unexpected exit event, we can kill that process randomly or panic at failpoint instead of fixed code path.
Update GITHUB CI workflow for nightly run. Nightly run job takes 1h and is be verified in my fork https://github.com/fuweid/bbolt/actions/runs/7378167227/job/20072981571