Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: introduce nightly job for robustness test #658

Merged
merged 2 commits into from
Jan 3, 2024

Conversation

fuweid
Copy link
Member

@fuweid fuweid commented Jan 1, 2024

  • tests: Update TestRestartFromPowerFailure

Update case with a combination of EXT4 filesystem's commit setting and unexpected exit event. That EXT4 filesystem's commit is to sync all its data and metadata every seconds. The kernel can help us sync even if that process has been killed. With different commit setting, we can simulate that case that kernel syncs half part of dirty pages before power
failure. And for unexpected exit event, we can kill that process randomly or panic at failpoint instead of fixed code path.

  • *: introduce nightly run for robustness test

Update GITHUB CI workflow for nightly run. Nightly run job takes 1h and is be verified in my fork https://github.com/fuweid/bbolt/actions/runs/7378167227/job/20072981571

@fuweid fuweid changed the title Introduce nightly run test: introduce nightly job for robustness test Jan 1, 2024
@fuweid fuweid marked this pull request as ready for review January 2, 2024 01:29
Copy link
Member

@jmhbnz jmhbnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM thanks @fuweid. Two suggestions that aren't blocking.

with:
count: 100
testTimeout: 200m
runs-on: "['ubuntu-latest-8-cores']"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an instance of this robustness job on ['actuated-arm64-8cpu-8gb']?

Happy for it to be done as a follow-up issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Let me handle this. It might require to enable some kernel modules.

@@ -84,4 +84,4 @@ test-failpoint:
.PHONY: test-robustness # Running robustness tests requires root permission
test-robustness:
go test -v ${TESTFLAGS} ./tests/dmflakey -test.root
go test -v ${TESTFLAGS} ./tests/robustness -test.root
go test -v ${TESTFLAGS} ${ROBUSTNESS_TESTFLAGS} ./tests/robustness -test.root
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice we don't have any graceful validation that whoever runs make test-robustness has actually run the pre-requisite make gofail-enable prior. Currently we just get cryptic error printed which folks might not understand at first glance, i.e:

=== RUN   TestRestartFromPowerFailure/fp_ext4_commit5s                                         
    powerfailure_test.go:112: start bbolt bench -work -path /tmp/TestRestartFromPowerFailurefp_ext4_commit5s797232614/002/boltdb -count=1000000000 -batch-size=5
    powerfailure_test.go:129: simulate power failure                                           
    powerfailure_test.go:134: random pick failpoint: resizeFileError                           
    powerfailure_test.go:168:                                                                  
                Error Trace:    /home/james/Documents/bbolt/tests/robustness/powerfailure_test.go:168
                                                        /home/james/Documents/bbolt/tests/robustness/powerfailure_test.go:135
                                                        /home/james/Documents/bbolt/tests/robustness/powerfailure_test.go:81
                Error:          Received unexpected error:                                     
                                Put "http://127.0.0.1:12345/resizeFileError": dial tcp 127.0.0.1:12345: connect: connection refused

In etcd-io/etcd we introduced https://github.com/etcd-io/etcd/blob/f8d5ba9a3fdff4f706f06ef0f0c0a27bb033ac53/tests/framework/integration/testing.go#L102. Perhaps we should introduce something similar here, not a blocker on this pr though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I build bbolt with failpoints. I think gofail.List doesn't work for this case. Let me think about how to handle.

fuweid added 2 commits January 2, 2024 21:29
Update case with a combination of EXT4 filesystem's commit setting and
unexpected exit event. That EXT4 filesystem's commit is to sync all its data
and metadata every seconds. The kernel can help us sync even if that
process has been killed. With different commit setting, we can simulate
that case that kernel syncs half part of dirty pages before power
failure. And for unexpected exit event, we can kill that process
randomly or panic at failpoint instead of fixed code path.

Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid force-pushed the introduce-nightly-run branch from 40692d2 to c61a3be Compare January 2, 2024 13:30
@ahrtr
Copy link
Member

ahrtr commented Jan 2, 2024

Overall looks good to me. Please let me know if you want to resolve the comments in a separate PR, then I can merge this PR

@fuweid
Copy link
Member Author

fuweid commented Jan 3, 2024

Hi @ahrtr , let me resolve those comments in a follow-up. Thanks

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Thanks

@ahrtr ahrtr merged commit 1d7fd9a into etcd-io:main Jan 3, 2024
15 checks passed
@fuweid fuweid deleted the introduce-nightly-run branch January 3, 2024 08:23
@fuweid
Copy link
Member Author

fuweid commented Jan 3, 2024

It works https://github.com/etcd-io/bbolt/actions/workflows/robustness_nightly.yaml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants