proposal: runtime: New runtime.Chain API to emit only a call chain from the callee goroutine #69648

Open
i3d opened this issue Sep 26, 2024 · 4 comments

@i3d
Contributor

i3d commented Sep 26, 2024

Proposal Details

Apologies if a similar proposal already exists, but allow me to describe the requirement.

Background

As far as I am aware, the Go runtime today provides two ways to obtain a stack trace:

  • runtime.Stack(buf, all=true), which dumps the stacks of all goroutines.
  • runtime.Stack(buf, all=false), which dumps only the calling goroutine's stack. debug.Stack is basically a convenience helper for this case; it starts with a 1K buffer and grows it until the trace fits.

In a large-scale application [we are running enterprise-level applications to support large and multiple services in the cloud], we have observed that debugging is a challenge. A stack trace of only the local goroutine is way too limited and often gives us little clue as to how we got here. A dump of all goroutines is way too costly to deploy deep in our stack [we recently noticed a performance regression caused by deploying a self-implemented full stack dump]. The full dump is fairly useful, but we need to focus on just the call chain we care about [there are a lot of other goroutines from various areas/libs which typically don't add any useful value].

The question is: can the Go runtime provide a balanced view, namely the particular call chain that starts at the goroutine making the call [for example, to runtime.Chain] and goes all the way up?

This would be a particularly useful signal because it concentrates only on the goroutine that made the call, which is typically the one that received some kind of error [either from internal processing or from an external source].

Our home-grown solution so far is to produce a runtime.Stack(64k, all=true) dump and then manually walk the goroutine sections one by one via text-based parsing, using a few heuristics, for example '^goroutine ' or ' created by ', to find the call relationships. This is not only very costly but also error-prone.
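For readers unfamiliar with this workaround, here is a minimal sketch of what such text-based parsing might look like. It is not the code we run in production; the helper names (dumpAll, parentsFromDump, chainFrom) are made up for illustration, and it assumes the "created by ... in goroutine N" annotation that tracebacks print since Go 1.21.

```go
package main

import (
	"regexp"
	"runtime"
	"strconv"
	"strings"
)

// Section header of a goroutine in a dump, e.g. "goroutine 18 [running]:".
var headerRE = regexp.MustCompile(`^goroutine (\d+) \[`)

// Creator annotation, e.g. "created by main.worker in goroutine 7".
// ASSUMPTION: the "in goroutine N" suffix is present (Go 1.21 and later).
var createdRE = regexp.MustCompile(`^created by .+ in goroutine (\d+)$`)

// dumpAll captures the stacks of all goroutines, doubling the buffer
// until the trace fits. runtime.Stack with all=true stops the world,
// which is the cost this issue complains about.
func dumpAll() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// parentsFromDump maps each goroutine ID found in a textual dump to the
// ID of the goroutine that created it, when the dump records one.
func parentsFromDump(dump string) map[int]int {
	parents := make(map[int]int)
	current := -1
	for _, line := range strings.Split(dump, "\n") {
		line = strings.TrimRight(line, "\r")
		if m := headerRE.FindStringSubmatch(line); m != nil {
			current, _ = strconv.Atoi(m[1])
			continue
		}
		if m := createdRE.FindStringSubmatch(line); m != nil && current >= 0 {
			parent, _ := strconv.Atoi(m[1])
			parents[current] = parent
		}
	}
	return parents
}

// chainFrom follows creator links upward from start. The walk ends at a
// goroutine with no recorded creator, which includes creators that have
// already exited and therefore have no section in the dump.
func chainFrom(parents map[int]int, start int) []int {
	chain := []int{start}
	seen := map[int]bool{start: true}
	for {
		p, ok := parents[chain[len(chain)-1]]
		if !ok || seen[p] {
			return chain
		}
		seen[p] = true
		chain = append(chain, p)
	}
}
```

A caller would run chainFrom(parentsFromDump(dumpAll()), id). Since runtime.Stack writes the calling goroutine's trace first, id can be taken from the first section header. Every step here depends on the textual format of the dump, which is why we consider the approach fragile.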

Thus, we'd like to see whether it is doable for the Go runtime to provide a new API, say runtime.Chain so that it could help produce a "current goroutine" based call chain dump.

Given that the current stack dump annotates each goroutine with " created by", I assume that the runtime already has at least some kind of internal bookkeeping to reason about the call relationship. Since Go doesn't advocate goroutine-based programming and there aren't many other options out there [are there?], we turn to the Go team for help.

Some proposed semantics:

  1. It could be that producing such a chain is still a non-trivial undertaking. Would it be possible to design some ABIs where the runtime emits a list of goroutine IDs [or some cheap metadata, if you still don't want to disclose implementation details] so that we could at least see a somewhat complete call stack? Or to provide some sort of object handle that lets the application choose what to dump?
  2. The output order follows the call chain upwards. Basically, the goroutine that called runtime.Chain would be the first entry in the output, its caller the second, and so forth. If whatever metadata is emitted carries programmatic hints about the relationship, e.g. created by, then it is fine for the output to be unordered and to let the application stitch the entries together.
  3. It is possible that, at the time the chain is composed, some of the goroutines have already terminated and been purged from memory. I don't know the runtime details well enough to propose what should happen if we see a gap, but I assume one potential option would be to end the chain. Often, with just one or a couple more links in the call chain, debuggability can be greatly improved.
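To make points 2 and 3 concrete, here is a rough sketch of application-side stitching. Everything in it is hypothetical: the Entry type, its fields, and the Stitch function are invented for illustration and are not an existing or proposed runtime API. It also assumes that a CreatedBy value of 0 means "no recorded creator".

```go
package main

// Entry is HYPOTHETICAL per-goroutine metadata that a runtime API might
// emit. Neither the type nor its fields exist in the Go runtime today.
type Entry struct {
	ID        uint64 // goroutine ID
	CreatedBy uint64 // creator's goroutine ID; 0 means none (sketch assumption)
}

// Stitch orders an unordered set of entries into a chain that begins at
// start and follows CreatedBy links upward. complete is false when the
// walk hits a gap (a creator with no entry, e.g. because it already
// exited), in which case the chain simply ends there.
func Stitch(entries []Entry, start uint64) (chain []Entry, complete bool) {
	byID := make(map[uint64]Entry, len(entries))
	for _, e := range entries {
		byID[e.ID] = e
	}
	id := start
	// The length bound guards against malformed, cyclic input.
	for len(chain) <= len(entries) {
		e, ok := byID[id]
		if !ok {
			return chain, false // gap: end the chain here
		}
		chain = append(chain, e)
		if e.CreatedBy == 0 {
			return chain, true // reached a goroutine with no creator
		}
		id = e.CreatedBy
	}
	return chain, false
}
```

Returning a completeness flag alongside the chain is one possible way to let the caller distinguish "reached the root" from "ancestor already gone"; other designs are equally plausible.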

Thank you!
Jim

@i3d i3d added the Proposal label Sep 26, 2024
@gopherbot gopherbot added this to the Proposal milestone Sep 26, 2024
@ianlancetaylor
Contributor

Have you tried setting the environment variable GODEBUG=tracebackancestors=1? That is documented at https://pkg.go.dev/runtime.
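A small way to try this out, as a sketch: the helper below (nestedStack is a made-up name) starts goroutines that each start the next one, then captures the innermost goroutine's stack with debug.Stack. The variable has to be set in the environment before the process starts, with the value giving the number of ancestor goroutines to report.

```go
package main

import (
	"runtime/debug"
)

// nestedStack starts a goroutine that starts another, and so on for
// depth further levels. The innermost goroutine captures its own stack
// trace, which is returned to the caller.
func nestedStack(depth int) string {
	out := make(chan string, 1)
	var spawn func(remaining int)
	spawn = func(remaining int) {
		if remaining == 0 {
			// Only the current goroutine is dumped, so no
			// all-goroutines walk is needed here.
			out <- string(debug.Stack())
			return
		}
		go spawn(remaining - 1)
	}
	go spawn(depth)
	return <-out
}
```

Printing nestedStack(3) from main and running the binary once normally and once with, for example, GODEBUG=tracebackancestors=5 should show the difference: with the setting enabled, the trace is expected to be extended with the stacks at which the ancestor goroutines were created. The exact output format is up to the runtime and may vary between Go versions.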

@i3d
Contributor Author

i3d commented Sep 27, 2024

We have not. As a matter of fact, I searched the entire codebase and found zero references to this flag in any of our production services... [FYI, I am actually talking about Google Borg... :) ]

@i3d
Contributor Author

i3d commented Sep 27, 2024

I guess the other thing we hope to see is that a solution should mostly avoid stop-the-world (STW) pauses, which are a bit too heavy to incur in a server's serving path.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 16, 2024