fail faster on errors reading CockroachDB listening-url file #2091

davepacheco · 2022-12-22T20:01:22Z

When starting CockroachDB either as part of the test suite or omicron-dev db-run, we wait for it to write out its listening URL to a known file path. This is asynchronous. It's the way that we determine when CockroachDB is up. So we just have to poll on it. Naturally, if it takes too long, we give up and fail. In the meantime, if the file doesn't exist or exists but is incomplete, then we want just keep waiting.

The code today continues waiting if it runs into any error reading the file, not just ENOENT. So for example while I was debugging #1146, I put a cockroach executable in my PATH that ran the requested command under pfexec dtrace -c, which meant I was running it as root, but the test suite was still running as my normal user. As a result of either CockroachDB's explicit choice or else the umask in effect at the time, the listen URL file was created with permissions that prevented the test suite from reading it. Instead of failing immediately and telling me that, it waited for the full timeout period (30 seconds) and then just reported a generic timeout error (saying that CockroachDB hadn't started within the timeout, which wasn't actually true).

This PR changes the error handling here in the specific case that we get an error other than ENOENT when trying to read the file. In that case, we fail immediately (rather than waiting for the full timeout) and actually print the error.

~~I couldn't think of a good way to test this automatically.~~ I tested it by hand by replicating the problem I had above. It now fails quickly and prints:

thread 'dev::db::test::test_setup_database_default_dir' panicked at 'failed to start database: Unknown { source: checking listen file "/dangerzone/omicron_tmp/.tmplraWmB/listen-url"

Caused by:
    Permission denied (os error 13) }', test-utils/src/dev/db.rs:1247:35

full output:

$ /home/dap/omicron-iter/target/debug/deps/omicron_test_utils-1a64da54be1ee8cc -- dev::db::test::test_setup_database_default_dir

running 1 test
test dev::db::test::test_setup_database_default_dir ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 14 filtered out; finished in 26.44s

dap@ivanova omicron-iter $ PATH=$PWD/debug-bin:$PATH /home/dap/omicron-iter/target/debug/deps/omicron_test_utils-1a64da54be1ee8cc -- dev::db::test::test_setup_database_default_dir

running 1 test
test dev::db::test::test_setup_database_default_dir ... FAILED

failures:

---- dev::db::test::test_setup_database_default_dir stdout ----
will run: cockroach start-single-node --insecure --http-addr=:0 --store=path=/dangerzone/omicron_tmp/.tmplraWmB/data,ballast-size=0 --listen-addr 127.0.0.1:0 --listening-url-file /dangerzone/omicron_tmp/.tmplraWmB/listen-url
environment:
    BASH_SILENCE_DEPRECATION_WARNING=1
    BAT_THEME=ansi
    BUNYAN_NO_PAGER=1
    CSCOPEOPTIONS=-r -p8
    EDITOR=vim
    GOTRACEBACK=crash
    HOME=/home/dap
    HOST=ivanova
    LANG=en_US.UTF-8
    LESS=-P ?f%f .line %lb/%L .byte %bB?s/%s. ?e(END):?pB%pB\%..%t
    LOGNAME=dap
    MACH=i386
    MACHINE_THAT_GOES_PING=1
    OLDPWD=/home/dap
    PAGER=less -X
    PATH=/home/dap/omicron-iter/debug-bin:/home/dap/omicron-cockroachdb/maybefixed-bin/bin:/home/dap/.cargo/bin::/home/dap/bin:/home/dap/install/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/gnu/bin:/opt/ooce/bin:/home/dap/tools/cockroachdb/bin:/home/dap/tools/clickhouse
    PS1=\[\033]0;\w (\h)\007\]\u@ivanova \[\]\W\[\] $ 
    PWD=/home/dap/omicron-iter
    RAWPS1=\u@ivanova \[\]\W\[\] $ 
    SHELL=/bin/bash
    SHLVL=1
    SSH_AUTH_SOCK=/tmp/ssh-XXXX_saivR/agent.22185
    SSH_CLIENT=172.20.16.62 50048 22
    SSH_CONNECTION=172.20.16.62 50048 172.20.2.70 22
    SSH_TTY=/dev/pts/2
    TERM=xterm-256color
    TMPDIR=/dangerzone/omicron_tmp
    TZ=US/Pacific
    USER=dap
    VISUAL=vim
    _=/home/dap/omicron-iter/target/debug/deps/omicron_test_utils-1a64da54be1ee8cc
thread 'dev::db::test::test_setup_database_default_dir' panicked at 'failed to start database: Unknown { source: checking listen file "/dangerzone/omicron_tmp/.tmplraWmB/listen-url"

Caused by:
    Permission denied (os error 13) }', test-utils/src/dev/db.rs:1247:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dev::db::test::test_setup_database_default_dir

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 14 filtered out; finished in 1.95s

Update: I added an automated test, too.

jgallagher · 2022-12-22T20:19:40Z

test-utils/src/dev/db.rs

+                        Err(error) => {
+                            let maybe_errno = error.raw_os_error();
+                            if maybe_errno.is_some()
+                                && maybe_errno.unwrap() == libc::ENOENT


If you don't like the is_some/unwrap here, I think it'd be equivalent to check error.kind() == io::ErrorKind::NotFound: https://github.com/rust-lang/rust/blob/b70baa4f922a1809d79caeaeb902800c3be283b9/library/std/src/sys/unix/mod.rs#L252

Thanks. I think that will make this much cleaner. Fixing in 546b63c.

davepacheco · 2022-12-22T22:25:12Z

Thanks for the review!

fail faster on errors reading CockroachDB listening-url file

b3c75c7

jgallagher reviewed Dec 22, 2022

View reviewed changes

davepacheco added 2 commits December 22, 2022 14:09

add a test

3d0cc4f

review feedback

546b63c

davepacheco marked this pull request as ready for review December 22, 2022 22:13

davepacheco requested a review from jgallagher December 22, 2022 22:13

jgallagher approved these changes Dec 22, 2022

View reviewed changes

davepacheco enabled auto-merge (squash) December 22, 2022 22:25

davepacheco merged commit 6234c66 into main Dec 22, 2022

davepacheco deleted the test-error-eaccess branch December 22, 2022 22:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fail faster on errors reading CockroachDB listening-url file #2091

fail faster on errors reading CockroachDB listening-url file #2091

davepacheco commented Dec 22, 2022 •

edited

Loading

jgallagher Dec 22, 2022 •

edited

Loading

davepacheco Dec 22, 2022

davepacheco commented Dec 22, 2022

fail faster on errors reading CockroachDB listening-url file #2091

fail faster on errors reading CockroachDB listening-url file #2091

Conversation

davepacheco commented Dec 22, 2022 • edited Loading

jgallagher Dec 22, 2022 • edited Loading

Choose a reason for hiding this comment

davepacheco Dec 22, 2022

Choose a reason for hiding this comment

davepacheco commented Dec 22, 2022

davepacheco commented Dec 22, 2022 •

edited

Loading

jgallagher Dec 22, 2022 •

edited

Loading