Parsing DWARF in .o files #564

Open
sevaa opened this issue Jul 14, 2024 · 0 comments

.o files are superficially ELF (with e_type set to ET_REL), but unlike executables and shared libraries, they are allowed to contain multiple sections with the same name. With that in mind, pyelftools' usual logic of pulling DWARF by finding the section with a given name breaks down.

Also, trying to load them with relocations enabled gives errors aplenty, again because there can be multiple relocation sections with a given name, in arbitrary order; the first .rel.debug_info in a file doesn't have to correspond to the first .debug_info. In object files, sh_info of a rel/rela section is expected to contain the index of the section it applies to. In linked binaries, this doesn't hold.
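A minimal sketch of matching relocation sections to their targets by sh_info rather than by name. The function name `relocations_by_target` is made up for illustration; it only assumes section objects that support `sec['sh_type']` and `sec['sh_info']`, the way pyelftools sections do, and that they arrive in section-header order.

```python
from collections import defaultdict

# sh_type values as pyelftools reports them (strings, not raw integers)
REL_TYPES = {"SHT_REL", "SHT_RELA"}


def relocations_by_target(sections):
    """Map target section index -> indices of the relocation sections
    that apply to it, using sh_info instead of section names.

    `sections` must be iterated in section-header order, starting with
    section 0, so that the enumeration index equals the section index.
    """
    targets = defaultdict(list)
    for index, sec in enumerate(sections):
        if sec["sh_type"] in REL_TYPES:
            targets[sec["sh_info"]].append(index)
    return dict(targets)


# Possible usage with pyelftools (not executed here):
#   from elftools.elf.elffile import ELFFile
#   with open("pletoh.o", "rb") as f:
#       print(relocations_by_target(ELFFile(f).iter_sections()))
```

This is only meaningful for ET_REL files; as noted above, sh_info can't be relied on this way in linked binaries.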


I'm looking at pletoh.o within stm32wb_zigbee_wb_lib.a as downloaded from here. That file contains:

  • 16 .debug_info sections
  • 16 .debug_line sections
  • 15 .debug_loc sections
  • 15 .debug_aranges sections
  • 1 .debug_abbrev section
  • 15 .debug_pubnames sections
  • 16 .debug_frame sections
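For reference, a tally like the one above can be produced with a few lines of Python. This is a sketch, not what I actually ran; `debug_section_counts` is a hypothetical helper that assumes section objects with a `.name` attribute, as returned by pyelftools' `ELFFile.iter_sections()`.

```python
from collections import Counter


def debug_section_counts(sections):
    """Count how many times each .debug_* section name occurs."""
    return Counter(
        sec.name for sec in sections if sec.name.startswith(".debug_")
    )


def duplicated(counts):
    """Keep only the names that occur more than once."""
    return {name: n for name, n in counts.items() if n > 1}


# Possible usage with pyelftools (not executed here):
#   from elftools.elf.elffile import ELFFile
#   with open("pletoh.o", "rb") as f:
#       counts = debug_section_counts(ELFFile(f).iter_sections())
#       for name, n in sorted(counts.items()):
#           print(n, name)
```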

Regarding the way they are interlinked:

  • links from info to abbrev work as always, since there is only one of the latter and it's by code, not by offset
  • refs from info to line are uniformly zero. The way the sections are arranged, each info section is preceded by a line one; one assumes that the zero offset references the lineprog in the corresponding line section
  • refs from info to loc also look like they are relative to the corresponding section - the values of DW_AT_location in each info section seem to start from 0. There is no obvious way to map info sections to loc sections though - the loc sections are all clustered near the end of the file. The order is the only thing to go by, and even that has to take the gaps into account. llvm-dwarfdump seems to resolve it alright, but I didn't go through the trouble of a manual check to see that it dumps the correct loclists.
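The info-to-line pairing guessed at above (each info section goes with the line section that precedes it) could be sketched like this. It encodes that ordering assumption and nothing more; `pair_info_with_line` is a hypothetical name, and the input is just the list of section names in section-header order. It does not attempt the info-to-loc mapping, which has no such obvious rule.

```python
def pair_info_with_line(names):
    """Pair every .debug_info section with the closest preceding
    .debug_line section that has not been claimed yet.

    `names` is the list of section names in section-header order.
    Returns {info_section_index: line_section_index or None}.
    """
    pairs = {}
    last_line = None
    for index, name in enumerate(names):
        if name == ".debug_line":
            last_line = index
        elif name == ".debug_info":
            pairs[index] = last_line
            last_line = None  # assumption: a line section serves one info section
    return pairs
```

An info section with no unclaimed line section before it maps to None, which would be the "gap" case for whoever consumes the result.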

When it comes to dumping, readelf observes the section boundaries. So, for example, when dumping info, it goes:

Contents of the .debug_info section:
Compilation Unit @ offset 0:
...the rest of the section dump, DIEs, attributes, etc.
Contents of the .debug_info section:
Compilation Unit @ offset 0:
...and then the stuff from the second section

for as many sections as there are. The section header indicates neither the offset nor the index of the section - its order in the binary is the only thing to go by.

This can be theoretically accommodated in the API by concatenating the info sections for parsing, but keeping the section index somewhere in the CU data structure.
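A rough sketch of that idea, independent of pyelftools internals: concatenate the section contents, remember where each one starts, and translate an offset in the combined buffer back to a section index plus a section-local offset. Both function names are invented for illustration; this is not an existing pyelftools API.

```python
from bisect import bisect_right


def concatenate_sections(chunks):
    """Concatenate section contents, remembering where each one starts.

    `chunks` is a list of (section_index, data_bytes) in file order.
    Returns (blob, starts, indices): starts[i] is the offset in blob
    where the section with index indices[i] begins.
    """
    blob = bytearray()
    starts, indices = [], []
    for section_index, data in chunks:
        starts.append(len(blob))
        indices.append(section_index)
        blob += data
    return bytes(blob), starts, indices


def locate(offset, starts, indices):
    """Translate an offset in the concatenated blob into
    (section_index, offset_within_that_section).

    The caller is expected to pass 0 <= offset < len(blob).
    """
    pos = bisect_right(starts, offset) - 1
    if pos < 0:
        raise ValueError("offset precedes the first section")
    return indices[pos], offset - starts[pos]


# Possible usage with pyelftools (not executed here):
#   chunks = [(i, sec.data()) for i, sec in enumerate(elf.iter_sections())
#             if sec.name == ".debug_info"]
```

With something like this, a CU parsed at a given offset in the combined buffer could carry the pair returned by `locate`, so a dump could still report the section-relative offset the way readelf does.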


All that said, I don't know how prominent this file structure is in the grand scheme of things. Does GCC or LLVM emit objects like that? This is all the output of IAR's compiler for ARM.
