Vanadis exits with "Assertion `isNaN_boxed( src_1 )' failed." #2317
Comments
I have looked at the ROB (see below) and the instruction trace at the point where this assertion fails. The floating-point instruction that triggers the failure is near the back of the ROB, at address 0x5f8e0. What confuses me is that, in the executable, this instruction sits just after the final instruction (a return) of another function. That is, the return instruction (JR at 0x5f8da) is in function X, and 0x5f8e0 is in function Y.

Because of this, I think the instruction at 0x5f8e0 is a throw-away instruction: once the return is eventually executed, the pipeline should recognize that the instruction at 0x5f8e0 should never have been executed. Strictly speaking, I don't see any problem with this situation, from the perspective of functional correctness, up to this point.

The problem I see is that the assertion fails based on the contents of the floating-point source registers used by this instruction (this assertion can be found here). Since this is a throw-away instruction in a function that the actual control flow will not reach, I don't see how there can be any guarantee about what its source registers contain. Is there any reason to believe the source registers cannot contain random garbage? If not, I don't see how we can assert anything about their contents. Can someone please correct me if I'm thinking about this situation incorrectly?
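To make the register-contents concern concrete, here is a minimal sketch of a NaN-boxing check. It assumes the failing `isNaN_boxed( src_1 )` assertion tests the RISC-V rule that a single-precision value held in a 64-bit FP register has its upper 32 bits set to all ones. The helper names (`is_nan_boxed`, `unbox_f32`, `box_f32`) are hypothetical and are not taken from the Vanadis sources. The RISC-V specification says an operand that is not properly boxed is treated as the canonical NaN, so a check of this kind can substitute a value rather than abort.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the Vanadis implementation.
constexpr uint64_t kBoxMask        = 0xFFFFFFFF00000000ULL; // bits 63:32
constexpr uint32_t kCanonicalNaN32 = 0x7FC00000u;           // canonical single-precision NaN

// A 32-bit float in a 64-bit FP register is NaN-boxed when bits 63:32 are all ones.
constexpr bool is_nan_boxed(uint64_t reg) {
    return (reg & kBoxMask) == kBoxMask;
}

// Box a raw single-precision bit pattern into a 64-bit register value.
constexpr uint64_t box_f32(uint32_t bits) {
    return kBoxMask | bits;
}

// Read a single-precision operand: if the register is not properly boxed,
// use the canonical NaN instead of asserting.
constexpr uint32_t unbox_f32(uint64_t reg) {
    return is_nan_boxed(reg) ? static_cast<uint32_t>(reg) : kCanonicalNaN32;
}

// Small usage check: a boxed 1.0f round-trips, a raw double bit pattern does not.
inline void nan_box_example() {
    const uint64_t boxed_one = box_f32(0x3F800000u);      // 1.0f, boxed
    const uint64_t raw_double = 0x400921FB54442D18ULL;    // bits of a double, not boxed
    assert(unbox_f32(boxed_one) == 0x3F800000u);
    assert(unbox_f32(raw_double) == kCanonicalNaN32);
}
```

With this approach a wrong-path instruction that reads a register holding a double (or stale data) produces a NaN result that is discarded when the instruction is squashed, instead of ending the simulation.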
Can this be dealt with using the trap flag instead of assertions? That way, if the offending instruction is eventually thrown away by a pipeline clear before it reaches the retire stage, the trap error would be forgotten, enabling execution to continue from the new path.
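The trap-flag suggestion can be sketched roughly as follows. This is a toy model, not Vanadis code: `RobEntry`, `Rob`, `trap_pending`, and `squash_younger_than` are hypothetical names invented for illustration. The point is only that a fault detected at execute time is recorded on the ROB entry and acted on at retire, so a squashed wrong-path instruction never raises it.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical ROB entry: a fault is recorded at execute and examined only at retire.
struct RobEntry {
    uint64_t pc = 0;
    bool executed = false;
    bool trap_pending = false;
};

class Rob {
public:
    enum class RetireResult { Empty, NotReady, Retired, Trapped };

    // Append a newly issued instruction at the back (youngest position).
    void push(uint64_t pc) {
        RobEntry e;
        e.pc = pc;
        entries_.push_back(e);
    }

    // Mark an instruction as executed; remember a fault instead of aborting.
    bool execute(uint64_t pc, bool fault) {
        for (auto& e : entries_) {
            if (e.pc == pc) {
                e.executed = true;
                e.trap_pending = fault;
                return true;
            }
        }
        return false;
    }

    // Pipeline clear: drop every entry younger than the one at `pc`.
    // Any trap flags on the dropped entries are forgotten with them.
    void squash_younger_than(uint64_t pc) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->pc == pc) {
                entries_.erase(it + 1, entries_.end());
                return;
            }
        }
    }

    // Retire the oldest entry; a pending trap only takes effect here.
    RetireResult retire_head() {
        if (entries_.empty()) return RetireResult::Empty;
        if (!entries_.front().executed) return RetireResult::NotReady;
        const bool trapped = entries_.front().trap_pending;
        entries_.pop_front();
        return trapped ? RetireResult::Trapped : RetireResult::Retired;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::deque<RobEntry> entries_;
};
```

In the scenario from this issue, the JR at 0x5f8da would be pushed before the instruction at 0x5f8e0. Even if 0x5f8e0 executes first and records a fault, resolving the JR squashes it, and only the JR retires.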
New Issue for sst-elements
1 - Detailed description of problem or enhancement
Vanadis exits with FATAL.
2 - Describe how to reproduce
Note that I've run this program natively on a RISC-V machine, and it ran successfully to completion.
Download the attached dgemm executable (dgemm.zip). From src/sst/elements/vanadis/tests/, run
(I zipped the executable because GitHub wouldn't let me upload the file with no extension or with an "exe" extension.)
3 - What Operating system(s) and versions
Rocky Linux 8.9
4 - What version of external libraries (Boost, MPI)
5 - Provide sha1 of all relevant sst repositories (sst-core, sst-elements, etc)
sst-core e952a81bc
sst-elements 7e67f8f
6 - Fill out Labels, Milestones, and Assignee fields as best possible
It doesn't appear to me that GitHub will allow me to edit these fields. But maybe I am missing it.