-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasmtime: Annotate emit-clif output with source line numbers #8771
base: main
Are you sure you want to change the base?
Conversation
When we're compiling a WebAssembly module that contains a DWARF `.debug_lines` section, this commit adds comments to the output of `wasmtime compile --emit-clif` indicating which file/line/column each block of CLIF instructions originated from. The DWARF info is currently only parsed if the `WASMTIME_BACKTRACE_DETAILS=1` environment variable is set or the equivalent `Config::wasm_backtrace_details` method has been used to enable it. This is useful when trying to understand why we're generating the code we do when there's a lot of WebAssembly in a single function. That can happen either because there's a lot of source code in that function, or because the toolchain (e.g. LLVM) inlined a lot of other functions into it before generating WebAssembly.
This looks great to me, but I'm going to re-assign reviewer to Nick because he knows this domain a lot better than me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited for this! but I have a concern, see below
/// A cache for quickly looking up DWARF file/line information given a source | ||
/// location. In general, `addr2line`-style tools need to do linear searches | ||
/// through the `.debug_lines` section to find a given address. However, if | ||
/// you need to look up many addresses and they're usually in roughly ascending | ||
/// order, then this structure can remember what you last looked up and scan | ||
/// forward from there. | ||
pub struct LineContext<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already depend on https://github.com/gimli-rs/addr2line/ in the workspace and it already does lazy parsing and caching to make repeated lookups fast -- is there a particular reason you didn't reach for that crate? Because this is a fair amount of code and there are a lot of corner cases with interpreting DWARF, especially when you start trying to be robust against buggy producers and toolchains. If there isn't a strong motivation to avoid that crate here, I'd much rather rely upon it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a great reason, no. I started off by looking at the implementation of wasm-tools addr2line
and how it uses the addr2line
crate, but I got confused. I had an easier time understanding how to use gimli
directly.
Also, addr2line
was relying on parts of the DWARF that were constructed incorrectly in the wasm binary I was looking at: in addition to file/line number, it also tried to find which (possibly inlined) function frames included each instruction. I fixed gimli
upstream to be more forgiving of those errors, which are introduced by wasm-opt
, but I don't think we've pulled in the new version yet. Maybe we can use addr2line
without that feature? I haven't checked.
It certainly makes sense to use the addr2line
crate here but I suspect it'll be a bit before I can get the details right. How would you feel about merging this version now and coming back to this later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed
gimli
upstream to be more forgiving of those errors, which are introduced bywasm-opt
, but I don't think we've pulled in the new version yet.
Have you checked if using that fixed version as a git dep works? If it does, I'd prefer just asking philipc to cut a release and then updating to it in a day or two.
If there are still issues on main
, then I guess we could land this as-is, but I don't have confidence that we will do better here than addr2line
long term, given that (a) we already use that elsewhere and (b) that crate is used by a bunch of folks (eg the firefox profiler) and is more likely to get bugs shaken out of it from sheer usage than this thing will.
Maybe we can use
addr2line
without that feature? I haven't checked.
I think you should just be able to do find_location
to get the leaf-most source location, rather than an iterator of inlined frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
philipc was delightfully prompt about cutting a new gimli release with my patches, we just have to pull that release in. And I believe I did test wasm-tools addr2line
against my version of gimli and it fixed the worst of the issues I had, although the output was not quite as complete as llvm-addr2line
on that particular input, for reasons I haven't tracked down.
Next week I can try to figure out how to use addr2line
instead of my hand-rolled cache.
When we're compiling a WebAssembly module that contains a DWARF
.debug_lines
section, this commit adds comments to the output ofwasmtime compile --emit-clif
indicating which file/line/column each block of CLIF instructions originated from.This is useful when trying to understand why we're generating the code we do when there's a lot of WebAssembly in a single function. That can happen either because there's a lot of source code in that function, or because the toolchain (e.g. LLVM) inlined a lot of other functions into it before generating WebAssembly.