"esel" tool not available #174

Open
madscientist159 opened this issue Apr 22, 2019 · 6 comments

Comments

@madscientist159

For the past couple of years, debugging hostboot faults has been unnecessarily hard for OEMs because the errl tool is not available. This omission leaves OEMs two choices:
1.) Revert to "shotgun debugging" (guess, modify code, insert debug printf()s, rebuild, test, repeat) -- very slow and expensive
2.) Put IBM engineers in the critical path for debugging crashes -- again, relatively slow

We need some way of analysing HBEL dumps to get origin source line numbers.

@dcrowell77
Collaborator

The most important parts of the error log (failing module, return code) are output to the console. Note that these are deliberately not line numbers, which makes them mostly build-independent. So for the vast majority of failures, that gets you to the exact point of the failure without any extra tooling, just the SOL console.

The big exception here is crashes (i.e. segfaults), and those are problematic even with the full esels. The printk output is part of the log, and since it is plain ASCII it should be easy to pick out even in a raw (unparsed) log from the BMC. With the printk and the build artifacts you can usually walk the backtrace of the failure. Even internally we don't have any data beyond that for Hostboot crashes.
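
As an illustration of pulling the plain-ASCII printk text out of a raw log, here is a minimal Python sketch that works much like the Unix `strings` utility. It assumes only that the dump is a binary blob containing runs of printable ASCII; the function name `ascii_runs` and the minimum run length are hypothetical choices, not part of any hostboot or errl tooling.

```python
import re
from typing import List


def ascii_runs(data: bytes, min_len: int = 8) -> List[str]:
    """Return every run of printable ASCII at least min_len bytes long.

    Assumption: printk text is stored as plain ASCII inside the raw dump,
    so it shows up as long printable runs between binary fields.
    """
    # 0x20-0x7e covers space through tilde (the printable ASCII range).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Synthetic bytes for demonstration only, not a real eSEL record.
    sample = b"\x00\x01Backtrace for task\xff\x02ab\x00short\x00"
    for line in ascii_runs(sample):
        print(line)  # prints "Backtrace for task"
```

Short runs are dropped on purpose, because binary fields often contain a few printable bytes by coincidence.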

However, your point is valid that having the error log parser would be helpful. There is a project to externalize it: https://github.com/open-power/errl . Unfortunately, the person behind this work left us a while back, so I think the momentum may have slowed a bit... I'll try to figure out who has the ball now to get this fully integrated into op-build.

@madscientist159
Author

Understood. As you mentioned, this is mainly useful in the context of crashes, which I agree with: we only really needed this tool when part of hostboot was crashing. I've started some initial documentation on how to parse the records without errl here: https://wiki.raptorcs.com/wiki/Hostboot_Debug_Howto but as you can see it's a labor-intensive process, and we're throwing away a lot of data that may or may not be incidentally helpful.

@dcrowell77
Collaborator

The esel/errl parser doesn't provide a huge amount of value for crashes. You'll get things in a slightly more readable format, but the only useful content is pretty much the printk with the backtrace that you have to manually decode. That is what we do internally as well.
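
The manual decode step can be sketched as a lookup from each backtrace address to the nearest preceding symbol. This is a minimal Python sketch under stated assumptions: the `(start, size, name)` symbol table layout and the `resolve` function are hypothetical, and in practice the table would have to be built from the build artifacts of the exact image that crashed (for example from a symbol listing produced by a tool such as `nm`).

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical layout: (start address, size in bytes, symbol name).
Symbol = Tuple[int, int, str]


def resolve(addr: int, symbols: List[Symbol]) -> Optional[str]:
    """Map an address to "name+0xoffset", or None if no symbol covers it.

    symbols must be sorted by start address.
    """
    starts = [s[0] for s in symbols]
    # Index of the last symbol whose start is <= addr.
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = symbols[i]
    if addr >= start + size:
        return None  # addr falls in a gap after this symbol
    return f"{name}+0x{addr - start:x}"


if __name__ == "__main__":
    # Made-up addresses and names, for illustration only.
    table = [(0x1000, 0x40, "foo"), (0x1040, 0x100, "bar")]
    for frame in (0x1044, 0x1000, 0x2000):
        print(hex(frame), resolve(frame, table))
```

The result is a function name plus offset rather than a source line; getting a line number would additionally need debug information for the same build.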

@dcrowell77
Collaborator

@sampmisr is now driving the errl work.

@dcrowell77 dcrowell77 assigned dcrowell77 and unassigned dcrowell77 Apr 23, 2019
@madscientist159
Author

> You'll get things in a slightly more readable format, but the only useful content is pretty much the printk with the backtrace that you have to manually decode. That is what we do internally as well.

Good to know. We might work on tooling to make this process easier.

@artemsen
Contributor

We had the same problem, so we wrote our own errl-like utility to decode HBEL:
https://github.com/YADRO-KNS/openpower-esel-parser
