Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for DWARFv5 embedded source code extension #849

Merged
merged 8 commits into from
Jul 16, 2024

Conversation

Mrmaxmeier
Copy link
Contributor

@Mrmaxmeier Mrmaxmeier commented Jun 28, 2024

Hi,

this implements support for source code embedded directly inside of the DWARF debug info. (Fixes #835)

I'm not sure about the logic in source_by_path though. Should it only match full paths or relative paths instead?
My use-case is fine with either way since I'm iterating over the files in the debug session directly.

This PR currently depends on support in gimli that has not made it into a release yet. I'll update the PR once a new version of gimli is available on crates.io.

Thanks!

@Mrmaxmeier Mrmaxmeier marked this pull request as ready for review July 16, 2024 08:55
Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good. Too bad its not possible to quickly check whether source is available. However I believe we only check that once when files are uploaded to Sentry or downloaded in Symbolicator.

I also think that native only matches on full absolute paths. When in doubt, its possible to double-check by looking at how a SourceBundle is being constructed from an Object. In that case it might also be interesting to update that logic to prefer the embedded source vs loading it from disk.

symbolic-debuginfo/src/base.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/elf.rs Outdated Show resolved Hide resolved
) -> Result<Option<SourceFileDescriptor<'_>>, DwarfError> {
// First: see if there's an exact match for the provided path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe its okay to use a OnceLock here to collect these into a HashMap for faster lookups.
Usually we want to optimize for lookups here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay 👍
I'd like to avoid collecting all source code in a big map of owned strings but I'm not sure if that's doable given that the DwarfFileIterator<'_> returned by files(&self) give out lifetimes shorter than 'data IIUC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your take on this implementation? https://github.com/getsentry/symbolic/pull/849/files#diff-ffd2feab131b47be099bceda4b175052d29c014642276be0168bd3d782a569f9L1351

I'm using a map now, but still iterate over the DWARF units on lookup to avoid dealing with owned strings and lifetime issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats a reasonable compromise there given that fixing all the lifetimes is probably way too complicated for now.

@Swatinem Swatinem merged commit 7dc28dd into getsentry:master Jul 16, 2024
11 checks passed
@Mrmaxmeier Mrmaxmeier deleted the embedded-source branch July 16, 2024 13:25
loewenheim added a commit to getsentry/symbolicator that referenced this pull request Jul 24, 2024
Because of getsentry/symbolic#849, this
means Elf/Wasm debug files can now contain source information.
We update `FileType::sources` to reflect this.
loewenheim added a commit to getsentry/symbolicator that referenced this pull request Jul 24, 2024
Because of getsentry/symbolic#849,
Elf/Wasm debug files can now contain source information.
We update `FileType::sources` to reflect this.
loewenheim added a commit that referenced this pull request Sep 30, 2024
@loewenheim
Copy link
Contributor

@Mrmaxmeier I'm sorry, we'll have to revert this. The check for whether a file has sources is inordinately expensive in terms of memory and is causing issues in our infrastructure.

@Mrmaxmeier
Copy link
Contributor Author

Welp, sorry about that 🫠

@loewenheim
Copy link
Contributor

Not at all, thank you for the contribution. Sorry we can't support it at the time :/

loewenheim added a commit that referenced this pull request Oct 1, 2024
)

This reverts the "feature part" of commit 7dc28dd (#849). The dep updates can stay.

Unfortunately opening a debug session to check for the presence of sources is prohibitively expensive in terms of memory (going from ~450MB to ~3.7GB in a Python process that calls Object.features on an example 1.3GB file). In order to support this we'll have to find a more parsimonious way to check if a file contains sources.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support DWARFv5 source code embedding extension
3 participants