target/riscv: Ensure to handle all triggered a halt events #1171
base: riscv
7652128 to 21d836a
		else
			break;
	}
}
This loop seems weird. What is its purpose? DCSR is writable, so the debugger can occasionally be tricked into a wrong conclusion.
Halt groups are an optional feature, and I'm quite confused that we don't check for it.
Could you please provide a test scenario to reproduce your issue? Is it possible to use spike to model it, or do you need specific HW?
This bug was discovered when I was testing semihosting on our hardware with smp enabled. It fails when the following happens:
All harts currently detected as halted were halted by a halt group, while the recorded state of the other harts is still running. In fact, one of those harts has already halted, and it is what caused the other harts to halt through the halt group.
riscv-openocd/src/target/riscv/riscv.c
Lines 3792 to 3796 in f51900b
} else if (halted && running) {
	LOG_TARGET_DEBUG(target, "halt all; halted=%d",
		halted);
	riscv_halt(target);
} else {
If there is such a halted hart, but its recorded state is still running, it will not be processed by riscv_semihosting.
riscv-openocd/src/target/riscv/riscv.c
Lines 3605 to 3623 in f51900b
if (halt_reason == RISCV_HALT_EBREAK) {
	int retval;
	/* Detect if this EBREAK is a semihosting request. If so, handle it. */
	switch (riscv_semihosting(target, &retval)) {
	case SEMIHOSTING_NONE:
		break;
	case SEMIHOSTING_WAITING:
		/* This hart should remain halted. */
		*next_action = RPH_REMAIN_HALTED;
		break;
	case SEMIHOSTING_HANDLED:
		/* This hart should be resumed, along with any other
		 * harts that halted due to haltgroups. */
		*next_action = RPH_RESUME;
		return ERROR_OK;
	case SEMIHOSTING_ERROR:
		return retval;
	}
}
@aap-sc I think this is a bug, would you provide some suggestions?
@lz-bro I'm still trying to understand your reasoning and what exactly the issue is (the situation is still not quite obvious to me). It will take a couple of days. I'll ask additional questions if necessary.
I believe I have figured out the code thanks to the detailed description that @zqb-all has provided:
If the harts are just in the process of halting due to being members of the halt group, we should wait until they finish halting, so that the true halt reason can be discovered - for instance semihosting request - and then handled correctly.
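The check described above can be sketched in isolation as follows. This is a minimal illustration and not the code from this pull request: the `sketch_` names and the `struct sketch_hart` type are hypothetical, and the field layout (dcsr.cause in bits 8:6, value 6 meaning "halt group") is an assumption taken from my reading of the RISC-V Debug Specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (RISC-V Debug Specification): dcsr.cause is bits 8:6,
 * and the value 6 means the hart halted because of its halt group. */
#define SKETCH_DCSR_CAUSE_SHIFT 6
#define SKETCH_DCSR_CAUSE_MASK  (UINT32_C(0x7) << SKETCH_DCSR_CAUSE_SHIFT)
#define SKETCH_DCSR_CAUSE_GROUP 6u

/* Hypothetical, simplified view of one hart in an SMP group. */
struct sketch_hart {
	bool halted;    /* what the debugger observed when polling */
	uint32_t dcsr;  /* dcsr value read from the hart (valid if halted) */
};

/* Extract the cause field from a dcsr value. */
static unsigned sketch_dcsr_cause(uint32_t dcsr)
{
	return (unsigned)((dcsr & SKETCH_DCSR_CAUSE_MASK) >> SKETCH_DCSR_CAUSE_SHIFT);
}

/* Returns true when at least one hart is halted and every halted hart
 * reports "halt group" as its cause. In that situation the hart that
 * triggered the group halt has not been observed as halted yet. */
static bool sketch_only_group_halts(const struct sketch_hart *harts, size_t count)
{
	assert(harts != NULL || count == 0);
	size_t halted = 0;
	for (size_t i = 0; i < count; i++) {
		if (!harts[i].halted)
			continue;
		halted++;
		/* A halted hart with any other cause ends the check early. */
		if (sketch_dcsr_cause(harts[i].dcsr) != SKETCH_DCSR_CAUSE_GROUP)
			return false;
	}
	return halted > 0;
}
```

When this predicate holds, the poller would wait and look again rather than act on the partial picture.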
@lz-bro I am afraid I have not understood what case this merge request addresses, not even after reading the commit description and the discussion so far. Could you please provide a very clear description in the commit message? Doing so will help:
Thank you.
Let me try to explain this issue. When openocd calls
A. If no hardware state change occurs, the sequence is:
B. If core0 hits a software breakpoint on the hardware, one possible sequence is:
Let's assume again that core0/core1 are both running and consider case C.
C. If core0 hits a semihosting ebreak on the hardware, one possible sequence is:
Let's assume again that core0/core1 are both running and consider case D.
D. If core0 hits a semihosting ebreak on the hardware, but the timing is earlier than in case C, one possible sequence is:
Thank you.
Things are a bit complicated.
@JanMatCodasip @aap-sc Does my description of the issue help you understand what the issue is?
@zqb-all Thank you for describing the situation in more detail. It will take me some time to get back to it.
Hi @zqb-all, thank you for the clear and accurate description of the situation that you're trying to address. It helped me to understand the code change. I am also sorry for replying late.
Overall this fix looks good and I have posted some comments.
@@ -3790,6 +3791,22 @@ int riscv_openocd_poll(struct target *target)
		LOG_TARGET_DEBUG(target, "resume all");
		riscv_resume(target, true, 0, 0, 0, false);
	} else if (halted && running) {
		foreach_smp_target(entry, targets)
foreach_smp_target(entry, targets)
LOG_TARGET_DEBUG(target, "SMP group is in inconsistent state: %u halted, %u running", halted, running);
/* The SMP group is in an inconsistent state - some harts in the group have halted
 * whereas others are running. The reasons for that (and corresponding
 * OpenOCD actions) could be:
 * 1) The targets are in the process of halting due to halt groups
 *    but not all of them halted --> wait a moment and then poll again so that
 *    the halt reason of every hart can be accurately determined (e.g. semihosting).
 * 2) The targets do not support halt groups --> OpenOCD must halt
 *    the remaining harts by a standard halt request.
 * 3) The hart states got out of sync for some other unknown reason (problem?). -->
 *    Same as previous - try to halt the harts by a standard halt request
 *    to get them back in sync. */
/* Detect if the harts are just in the process of halting due to a halt group */
foreach_smp_target(entry, targets)
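The three cases in the suggested comment reduce to a small decision rule. The sketch below only illustrates that rule and is not the function in riscv.c: the enum, the function name, and the `group_halts` parameter (the number of halted harts whose halt cause is the halt group) are hypothetical.

```c
#include <assert.h>

/* Hypothetical outcome of examining an SMP group during one poll. */
enum sketch_action {
	SKETCH_NOTHING_TO_DO, /* all harts halted or all running: group is consistent */
	SKETCH_POLL_AGAIN,    /* case 1: harts are still halting due to the halt group */
	SKETCH_HALT_ALL       /* cases 2 and 3: bring the group in sync by a halt request */
};

/* halted / running: number of harts observed in each state.
 * group_halts: how many of the halted harts report "halt group" as cause. */
static enum sketch_action sketch_classify(unsigned halted, unsigned running,
		unsigned group_halts)
{
	assert(group_halts <= halted);

	/* Only a mix of halted and running harts is inconsistent. */
	if (halted == 0 || running == 0)
		return SKETCH_NOTHING_TO_DO;

	/* Every halted hart was halted by the group, so the hart that
	 * started the halt has not been seen yet. Give it more time. */
	if (group_halts == halted)
		return SKETCH_POLL_AGAIN;

	/* At least one hart halted for its own reason while others still
	 * run: no halt group support, or the states are out of sync. */
	return SKETCH_HALT_ALL;
}
```

For example, one halted hart with cause "group" next to one running hart yields `SKETCH_POLL_AGAIN`, whereas a hart halted by an ebreak next to a running hart yields `SKETCH_HALT_ALL`.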
Addressed
if (get_field(dcsr, CSR_DCSR_CAUSE) == CSR_DCSR_CAUSE_GROUP)
	cause_groups++;
else
	break;
break;
/* This hart has halted due to something else than a halt group.
 * Don't continue checking the rest - exit early. */
break;
Addressed
src/target/riscv/riscv.c
Outdated
if (halted == cause_groups)
	return ERROR_OK;
if (halted == cause_groups)
	return ERROR_OK;
if (halted == cause_groups) {
	LOG_TARGET_DEBUG(target, "The harts appear to be just in the process of halting due to halt group. Giving them more time - will poll their state later.");
	return ERROR_OK;
}
@@ -3790,6 +3791,22 @@ int riscv_openocd_poll(struct target *target)
		LOG_TARGET_DEBUG(target, "resume all");
		riscv_resume(target, true, 0, 0, 0, false);
	} else if (halted && running) {
		foreach_smp_target(entry, targets)
I wonder if this loop can be skipped if haltgroup_supported is equal to false.
LOG_TARGET_DEBUG(target, "halt all; halted=%d",
	halted);
LOG_TARGET_DEBUG(target, "halt all; halted=%d",
	halted);
/* Halting the whole SMP group to bring it in sync. */
LOG_TARGET_DEBUG(target, "halt all; halted=%d",
	halted);
Addressed
src/target/riscv/riscv.c
Outdated
if (halted == cause_groups)
	return ERROR_OK;
(1) I would recommend keeping a counter of how many times this situation has occurred. If it reaches a certain limit, OpenOCD should:
- stop waiting for the harts to halt,
- print an error to the user, and
- try to halt all the harts directly (riscv_halt()).
(2) Instead of returning from riscv_openocd_poll() and waiting for the next poll interval, it would perhaps be better to create a loop inside this function. This would allow to immediately re-poll the hart state and react quickly.
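Both recommendations (a bounded counter and an in-function re-poll loop) can be combined as in the sketch below. It is an illustration under stated assumptions and not OpenOCD code: the limit `SKETCH_MAX_GROUP_HALT_POLLS`, the callback type, and all `sketch_` names are hypothetical, and the callback stands in for re-reading the hart states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit on immediate re-polls before giving up. */
#define SKETCH_MAX_GROUP_HALT_POLLS 5

enum sketch_result {
	SKETCH_IN_SYNC,     /* harts finished halting within the limit */
	SKETCH_FORCED_HALT  /* limit reached: caller should report an error
	                     * and request an explicit halt of all harts */
};

/* Hypothetical callback: returns true while the harts still look like
 * they are in the middle of a halt-group halt. */
typedef bool (*sketch_still_halting_fn)(void *ctx);

/* Re-poll immediately, up to the limit, instead of returning and waiting
 * for the next poll interval. Stores the number of polls performed in
 * *polls_used when that pointer is not NULL. */
static enum sketch_result sketch_wait_for_group_halt(
		sketch_still_halting_fn still_halting, void *ctx,
		unsigned *polls_used)
{
	assert(still_halting != NULL);
	unsigned polls = 0;
	enum sketch_result result = SKETCH_FORCED_HALT;

	while (polls < SKETCH_MAX_GROUP_HALT_POLLS) {
		polls++;
		if (!still_halting(ctx)) {
			result = SKETCH_IN_SYNC;
			break;
		}
	}
	if (polls_used != NULL)
		*polls_used = polls;
	return result;
}

/* Example callback for experiments: reports "still halting" for a fixed
 * number of calls, stored in the unsigned pointed to by ctx. */
static bool sketch_countdown(void *ctx)
{
	unsigned *remaining = ctx;
	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

A real implementation would probably also insert a short delay between polls; that is left out here to keep the sketch self-contained.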
create a loop inside this function. This would allow to immediately re-poll the hart state and react quickly.
Sounds good, this helps smp harts finish state adjustment faster.
252e9cf to 8193431
If the harts are just in the process of halting due to a halt group, poll again so that the halt reason of every hart can be accurately determined (e.g. semihosting).
8193431 to 0affedb
If all current halted states are due to a halt group, then a new "triggered a halt" event has occurred.