Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cockroach panicked after receiving SIGHUP too early #2334

Open
davepacheco opened this issue Feb 6, 2023 · 0 comments
Open

cockroach panicked after receiving SIGHUP too early #2334

davepacheco opened this issue Feb 6, 2023 · 0 comments

Comments

@davepacheco
Copy link
Collaborator

tl;dr: If cockroach receives SIGHUP too early during initialization, it can crash. I encountered this while running the test suite in a loop (even under nohup!). I don't expect we'll do anything about this but I wanted to record it in case anybody else hits it.

While running the omicron test suite in a loop overnight, I got to my system and found that my VPN connection had terminated. Since I ran the loop under nohup, I didn't think much of it. When I got logged back in, I found that the test suite had failed with:

failures:

---- db::datastore::test::test_session_methods stdout ----
log file: /home/dap/omicron-cockroachdb/tmpdir/omicron_nexus-49637a4102530ffa-test_session_methods.23068.44.log
note: configured to log to "/home/dap/omicron-cockroachdb/tmpdir/omicron_nexus-49637a4102530ffa-test_session_methods.23068.44.log"
thread 'db::datastore::test::test_session_methods' panicked at 'failed to start CockroachDB: cockroach unexpectedly terminated by signal 6 (see error output above)',
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    db::datastore::test::test_session_methods

test result: FAILED. 92 passed; 1 failed; 0 ignored; 0 measured; 68 filtered out; finished in 124.97s

error: test failed, to rerun pass `-p omicron-nexus --lib`

I got to the cockroachdb_stderr file in the tmp directory and found:

dap@ivanova omicron-cockroachdb $ cat /home/dap/omicron-cockroachdb/tmpdir/.tmpvmaW06/cockroachdb_stderr
++ dirname /home/dap/omicron-cockroachdb/debug-bin/cockroach
+ dir=/home/dap/omicron-cockroachdb/debug-bin
+ cd /home/dap/omicron-cockroachdb/debug-bin/..
+ real_cockroach=/home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach
+ [[ start-single-node != \v\e\r\s\i\o\n ]]
+ echo 'debug-bin/cockroach: not instrumenting this invocation'
+ exec /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach start-single-node --insecure --http-addr=:0 --store=path=/home/dap/omicron-cockroachdb/tmpdir/.tmpvmaW06/data,ballast-size=0 --listen-addr 127.0.0.1:0 --listening-url-file /home/dap/omicron-cockroachdb/tmpdir/.tmpvmaW06/listen-url
I230206 00:03:37.458828 19 1@util/log/log_flush.go:99  [-] 1  hangup received, flushing logs
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2  a panic has occurred!
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +logging already active; first used at:
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +(1) attached stack trace
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  -- stack trace:
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.setupAndInitializeLoggingAndProfiling
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:1170
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | [...repeated from below...]
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +Wraps: (2) attached stack trace
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  -- stack trace:
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.setupLogging
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/log_flags.go:56
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.setupAndInitializeLoggingAndProfiling
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:1170
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.runStart
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:426
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.runStartSingleNode
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:339
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli/clierrorplus.MaybeDecorateError.func1
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/clierrorplus/decorate_error.go:67
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli/clierrorplus.MaybeShoutError.func1
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/clierrorplus/shout.go:24
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/spf13/cobra.(*Command).execute
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/spf13/cobra.(*Command).ExecuteC
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:974
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/spf13/cobra.(*Command).Execute
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:902
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.Run
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:298
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.doMain
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:140
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/cli.Main
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:65
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | main.main
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cmd/cockroach-oss/main.go:21
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | runtime.main
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/go-build-for-cockroachdb/go/src/runtime/proc.go:255
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | runtime.goexit
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/go-build-for-cockroachdb/go/src/runtime/asm_amd64.s:1581
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +Wraps: (3) logging already active; first used at:
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | goroutine 19 [running]:
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | runtime/debug.Stack()
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/go-build-for-cockroachdb/go/src/runtime/debug/stack.go:24 +0x9f
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/util/log.setActive()
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:394 +0x8e
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc00020cec0, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, 0x1741135ae46c62e9, ...})
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:226 +0x5e
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x8621438, 0xc000108008}, 0x2, 0x1, 0x1, 0x0, {0x608a87f, 0x1a}, {0xc0000aefa0, 0x1, ...})
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x738
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/util/log.loggerOps.Infof(...)
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:1027
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | github.com/cockroachdb/cockroach/pkg/util/log.signalFlusher()
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:99 +0x125
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | created by github.com/cockroachdb/cockroach/pkg/util/log.init.5
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +  | 	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:42 +0x4d
E230206 00:03:37.556581 1 1@util/log/logcrash/crash_reporting.go:174  [-] 2 +Error types: (1) *withstack.withStack (2) *withstack.withStack (3) *errutil.leafError
panic: logging already active; first used at:
goroutine 19 [running]:
runtime/debug.Stack()
	/home/dap/go-build-for-cockroachdb/go/src/runtime/debug/stack.go:24 +0x9f
github.com/cockroachdb/cockroach/pkg/util/log.setActive()
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:394 +0x8e
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc00020cec0, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, 0x1741135ae46c62e9, ...})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:226 +0x5e
github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x8621438, 0xc000108008}, 0x2, 0x1, 0x1, 0x0, {0x608a87f, 0x1a}, {0xc0000aefa0, 0x1, ...})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x738
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
github.com/cockroachdb/cockroach/pkg/util/log.loggerOps.Infof(...)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:1027
github.com/cockroachdb/cockroach/pkg/util/log.signalFlusher()
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:99 +0x125
created by github.com/cockroachdb/cockroach/pkg/util/log.init.5
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:42 +0x4d [recovered]
	panic: logging already active; first used at:
goroutine 19 [running]:
runtime/debug.Stack()
	/home/dap/go-build-for-cockroachdb/go/src/runtime/debug/stack.go:24 +0x9f
github.com/cockroachdb/cockroach/pkg/util/log.setActive()
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:394 +0x8e
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc00020cec0, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, 0x1741135ae46c62e9, ...})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:226 +0x5e
github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x8621438, 0xc000108008}, 0x2, 0x1, 0x1, 0x0, {0x608a87f, 0x1a}, {0xc0000aefa0, 0x1, ...})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x738
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
github.com/cockroachdb/cockroach/pkg/util/log.loggerOps.Infof(...)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:1027
github.com/cockroachdb/cockroach/pkg/util/log.signalFlusher()
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:99 +0x125
created by github.com/cockroachdb/cockroach/pkg/util/log.init.5
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:42 +0x4d

goroutine 1 [running]:
panic({0x5cc67a0, 0xc00072c0f0})
	/home/dap/go-build-for-cockroachdb/go/src/runtime/panic.go:1147 +0x465 fp=0xc001a1eb68 sp=0xc001a1ea90 pc=0x127d1c5
github.com/cockroachdb/cockroach/pkg/util/log/logcrash.RecoverAndReportPanic({0x8621438, 0xc000108008}, 0xc000e5ca80)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:124 +0xa5 fp=0xc001a1ebb8 sp=0xc001a1eb68 pc=0x2152b05
runtime.call32(0x0, 0x7476ed8, 0xc00072a4c8, 0x18, 0x18, 0x18, 0xc001a1ec18)
	/home/dap/go-build-for-cockroachdb/go/src/runtime/asm_amd64.s:626 +0x49 fp=0xc001a1ebe8 sp=0xc001a1ebb8 pc=0x12b5069
runtime.reflectcallSave(0xc001a1ed40, 0x7476ed8, 0xc00072a4c8, 0x18)
	/home/dap/go-build-for-cockroachdb/go/src/runtime/panic.go:927 +0xb5 fp=0xc001a1ec28 sp=0xc001a1ebe8 pc=0x127cd15
runtime.runOpenDeferFrame(0xc0000001a0, 0xc00072a480)
	/home/dap/go-build-for-cockroachdb/go/src/runtime/panic.go:891 +0x2bb fp=0xc001a1ecb0 sp=0xc001a1ec28 pc=0x127c63b
panic({0x5cc67a0, 0xc00072c0f0})
	/home/dap/go-build-for-cockroachdb/go/src/runtime/panic.go:1038 +0x25b fp=0xc001a1ed88 sp=0xc001a1ecb0 pc=0x127cfbb
github.com/cockroachdb/cockroach/pkg/cli.setupLogging({0x86214a8, 0xc0015002a0}, 0xaf217e0, 0x1, 0x1)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/log_flags.go:56 +0xef0 fp=0xc001a1f488 sp=0xc001a1ed88 pc=0x546a230
github.com/cockroachdb/cockroach/pkg/cli.setupAndInitializeLoggingAndProfiling({0x86214a8, 0xc0015002a0}, 0xaf217e0, 0x1)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:1170 +0x6e fp=0xc001a1f6a0 sp=0xc001a1f488 pc=0x547faae
github.com/cockroachdb/cockroach/pkg/cli.runStart(0xaf217e0, {0xc000042230, 0x0, 0x7}, 0x1)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:426 +0x237 fp=0xc001a1f9e8 sp=0xc001a1f6a0 pc=0x5479357
github.com/cockroachdb/cockroach/pkg/cli.runStartSingleNode(0xaf217e0, {0xc000042230, 0x0, 0x7})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:339 +0x196 fp=0xc001a1fa38 sp=0xc001a1f9e8 pc=0x5479036
github.com/cockroachdb/cockroach/pkg/cli/clierrorplus.MaybeDecorateError.func1(0xaf217e0, {0xc000042230, 0x0, 0x7})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/clierrorplus/decorate_error.go:67 +0x7b fp=0xc001a1fc80 sp=0xc001a1fa38 pc=0x4dbea5b
github.com/cockroachdb/cockroach/pkg/cli/clierrorplus.MaybeShoutError.func1(0xaf217e0, {0xc000042230, 0x0, 0x7})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/clierrorplus/shout.go:24 +0x51 fp=0xc001a1fcd0 sp=0xc001a1fc80 pc=0x4dbff91
github.com/spf13/cobra.(*Command).execute(0xaf217e0, {0xc0000421c0, 0x7, 0x7})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x8dc fp=0xc001a1fd90 sp=0xc001a1fcd0 pc=0x49ed19c
github.com/spf13/cobra.(*Command).ExecuteC(0xaf20660)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:974 +0x529 fp=0xc001a1fe70 sp=0xc001a1fd90 pc=0x49edae9
github.com/spf13/cobra.(*Command).Execute(...)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:902
github.com/cockroachdb/cockroach/pkg/cli.Run(...)
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:298
github.com/cockroachdb/cockroach/pkg/cli.doMain(0xaf217e0, {0xc000e143ca, 0x11})
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:140 +0x1ec fp=0xc001a1fed8 sp=0xc001a1fe70 pc=0x542482c
github.com/cockroachdb/cockroach/pkg/cli.Main()
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:65 +0x1a5 fp=0xc001a1ff78 sp=0xc001a1fed8 pc=0x5424445
main.main()
	/home/dap/garbage-compactor/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cmd/cockroach-oss/main.go:21 +0x25 fp=0xc001a1ff88 sp=0xc001a1ff78 pc=0x54af305
runtime.main()
	/home/dap/go-build-for-cockroachdb/go/src/runtime/proc.go:255 +0x248 fp=0xc001a1ffe0 sp=0xc001a1ff88 pc=0x1280208
runtime.goexit()
	/home/dap/go-build-for-cockroachdb/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc001a1ffe8 sp=0xc001a1ffe0 pc=0x12b6da1

(there are more goroutine stacks but they're irrelevant)

My first clue may have been that this appeared in stderr, not the stderr log file. This happened before Cockroach had switched to using its log file. I found cockroachdb/cockroach#84638, where knz explained explicitly what was going on: cockroach took a SIGHUP that caused it to try to log something out before having initialized the logging subsystem.

The remaining mystery was: where did the SIGHUP come from? I wouldn't normally be that interested in that, but I had been in the process of trying to verify the fixes for illumos#15254 and illumos#15367 and wanted to be sure that the panic wasn't somehow a result of an OS issue. (In particular, our understanding of both issues is that they can cause memory that should be zero'd to be non-zero, and it can totally cause messages that look like "this thing that should have been uninitialized was initialized", like the addtimer called with initialized timer message mentioned in #1146). Omicron doesn't appear to use SIGHUP at all. And I ran this under nohup. But I saw in the output of last that I had a login session end during the same minute as this problem was reported. And I found that nohup only sets the disposition for SIGHUP to SIG_IGN and then exec's what you gave it. This causes immediately children to be immune to SIGHUP, and children of those processes will too, unless any of them installs a signal handler for SIGHUP. The Go runtime does install a handler for SIGHUP.

I reproduced what I thought happened by:

  1. From a clean slate, in an ssh session over the engineering VPN, started a test run in the background (./run.sh &)
  2. tail -f the output file. This is important -- output must be going to the ssh connection when the next step happens or the server will never notice that there's a network problem.
  3. disconnected from the VPN
  4. hit enter on the client and waited a few seconds for Mac to decide the TCP connection was terminated. (if you don't do this, the client may never learn the TCP connection was broken, as I haven't configured ssh-level or tcp-level keep-alives. and if I didn't wait for this, but then reconnected to the VPN, the TCP session would have been resumed intact.)
  5. reconnected to the VPN
  6. ssh'd back to the system. Confirmed that everything was still running. But a few seconds later, the previous TCP connection was torn down. I accidentally reproduced this bug while doing this. (I had tried this a few times and only reproduced the panic once. I wasn't actually trying to reproduce the panic. I just wanted to see what signals were sent when this happened.)

Before having started this, I traced SIGHUP signals sent with:

$ pfexec dtrace -n 'signal-send/args[2] == 1/{ printf("pid %d execname %s (args %s) sending %d to pid %d execname %s (args %s)", pid, execname, curpsinfo->pr_psargs, args[2], args[1]->pr_pid, args[1]->pr_fname, args[1]->pr_psargs); }'
dtrace: description 'signal-send' matched 1 probe
CPU     ID                    FUNCTION:NAME
  6    765            sigtoproc:signal-send pid 0 execname sched (args sched) sending 1 to pid 27854 execname bash (args -bash)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 228 execname omicron_nexus-49 (args /home/dap/omicron-cockroachdb/target/debug/deps/omicron_nexus-49637a4102530ffa )
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 1897 execname cockroach (args /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach st)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 1909 execname cockroach (args /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach st)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 1920 execname cockroach (args /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach st)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 1954 execname cockroach (args /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach st)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 1980 execname cockroach (args /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach st)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 1984 execname cockroach (args /home/dap/tools/cockroach-22.1.9-fail-fast.3/cockroach-v22.1.9/bin/cockroach st)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 961 execname tail (args tail -f run-224.out)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 27854 execname bash (args -bash)
  0    765            sigtoproc:signal-send pid 27854 execname bash (args -bash) sending 1 to pid 961 execname tail (args tail -f run-224.out)
^C

so that confirms that bash sends SIGHUP to all these processes. I had also checked with psig that the Go runtime had installed its SIGHUP handler.

So all of this behavior now makes sense: I had some VPN disconnect, my shell session ended, the login shell sent SIGHUP to cockroach, causing an early exit and a test failure.

Now the only question is: I'm pretty sure I've done this before (run ./run.sh in the background and either logged out or got disconnectedfrom the VPN) without issue and come back and not seen this panic. I think I must just have been getting lucky. If the SIGHUP doesn't hit cockroach before it's initialized its logs, then it will trigger a spurious config file reload, but otherwise everything will barrel on fine. It might also be fine if I weren't tail'ing the log file because then sshd may not have noticed the TCP issue. (Then some other test failure would have eventually triggered it all to come down.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant