Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fail faster on errors reading CockroachDB listening-url file #2091

Merged
merged 3 commits into from
Dec 22, 2022

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Dec 22, 2022

When starting CockroachDB either as part of the test suite or omicron-dev db-run, we wait for it to write out its listening URL to a known file path. This is asynchronous. It's the way that we determine when CockroachDB is up. So we just have to poll on it. Naturally, if it takes too long, we give up and fail. In the meantime, if the file doesn't exist or exists but is incomplete, then we want just keep waiting.

The code today continues waiting if it runs into any error reading the file, not just ENOENT. So for example while I was debugging #1146, I put a cockroach executable in my PATH that ran the requested command under pfexec dtrace -c, which meant I was running it as root, but the test suite was still running as my normal user. As a result of either CockroachDB's explicit choice or else the umask in effect at the time, the listen URL file was created with permissions that prevented the test suite from reading it. Instead of failing immediately and telling me that, it waited for the full timeout period (30 seconds) and then just reported a generic timeout error (saying that CockroachDB hadn't started within the timeout, which wasn't actually true).

This PR changes the error handling here in the specific case that we get an error other than ENOENT when trying to read the file. In that case, we fail immediately (rather than waiting for the full timeout) and actually print the error.

I couldn't think of a good way to test this automatically. I tested it by hand by replicating the problem I had above. It now fails quickly and prints:

thread 'dev::db::test::test_setup_database_default_dir' panicked at 'failed to start database: Unknown { source: checking listen file "/dangerzone/omicron_tmp/.tmplraWmB/listen-url"

Caused by:
    Permission denied (os error 13) }', test-utils/src/dev/db.rs:1247:35

full output:

$ /home/dap/omicron-iter/target/debug/deps/omicron_test_utils-1a64da54be1ee8cc -- dev::db::test::test_setup_database_default_dir

running 1 test
test dev::db::test::test_setup_database_default_dir ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 14 filtered out; finished in 26.44s

dap@ivanova omicron-iter $ PATH=$PWD/debug-bin:$PATH /home/dap/omicron-iter/target/debug/deps/omicron_test_utils-1a64da54be1ee8cc -- dev::db::test::test_setup_database_default_dir

running 1 test
test dev::db::test::test_setup_database_default_dir ... FAILED

failures:

---- dev::db::test::test_setup_database_default_dir stdout ----
will run: cockroach start-single-node --insecure --http-addr=:0 --store=path=/dangerzone/omicron_tmp/.tmplraWmB/data,ballast-size=0 --listen-addr 127.0.0.1:0 --listening-url-file /dangerzone/omicron_tmp/.tmplraWmB/listen-url
environment:
    BASH_SILENCE_DEPRECATION_WARNING=1
    BAT_THEME=ansi
    BUNYAN_NO_PAGER=1
    CSCOPEOPTIONS=-r -p8
    EDITOR=vim
    GOTRACEBACK=crash
    HOME=/home/dap
    HOST=ivanova
    LANG=en_US.UTF-8
    LESS=-P ?f%f .line %lb/%L .byte %bB?s/%s. ?e(END):?pB%pB\%..%t
    LOGNAME=dap
    MACH=i386
    MACHINE_THAT_GOES_PING=1
    OLDPWD=/home/dap
    PAGER=less -X
    PATH=/home/dap/omicron-iter/debug-bin:/home/dap/omicron-cockroachdb/maybefixed-bin/bin:/home/dap/.cargo/bin::/home/dap/bin:/home/dap/install/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/gnu/bin:/opt/ooce/bin:/home/dap/tools/cockroachdb/bin:/home/dap/tools/clickhouse
    PS1=\[\033]0;\w (\h)\007\]\u@ivanova \[\]\W\[\] $ 
    PWD=/home/dap/omicron-iter
    RAWPS1=\u@ivanova \[\]\W\[\] $ 
    SHELL=/bin/bash
    SHLVL=1
    SSH_AUTH_SOCK=/tmp/ssh-XXXX_saivR/agent.22185
    SSH_CLIENT=172.20.16.62 50048 22
    SSH_CONNECTION=172.20.16.62 50048 172.20.2.70 22
    SSH_TTY=/dev/pts/2
    TERM=xterm-256color
    TMPDIR=/dangerzone/omicron_tmp
    TZ=US/Pacific
    USER=dap
    VISUAL=vim
    _=/home/dap/omicron-iter/target/debug/deps/omicron_test_utils-1a64da54be1ee8cc
thread 'dev::db::test::test_setup_database_default_dir' panicked at 'failed to start database: Unknown { source: checking listen file "/dangerzone/omicron_tmp/.tmplraWmB/listen-url"

Caused by:
    Permission denied (os error 13) }', test-utils/src/dev/db.rs:1247:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dev::db::test::test_setup_database_default_dir

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 14 filtered out; finished in 1.95s

Update: I added an automated test, too.

Err(error) => {
let maybe_errno = error.raw_os_error();
if maybe_errno.is_some()
&& maybe_errno.unwrap() == libc::ENOENT
Copy link
Contributor

@jgallagher jgallagher Dec 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't like the is_some/unwrap here, I think it'd be equivalent to check error.kind() == io::ErrorKind::NotFound: https://github.com/rust-lang/rust/blob/b70baa4f922a1809d79caeaeb902800c3be283b9/library/std/src/sys/unix/mod.rs#L252

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think that will make this much cleaner. Fixing in 546b63c.

@davepacheco davepacheco marked this pull request as ready for review December 22, 2022 22:13
@davepacheco
Copy link
Collaborator Author

Thanks for the review!

@davepacheco davepacheco enabled auto-merge (squash) December 22, 2022 22:25
@davepacheco davepacheco merged commit 6234c66 into main Dec 22, 2022
@davepacheco davepacheco deleted the test-error-eaccess branch December 22, 2022 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants