-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multi-alarm overlapping bug #466
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting bug and perhaps explains some of the edge case bugs we've seen with the OpenThread library.
I'm curious to see what the root cause ends up being.
This is the kind of test we should be running frequently on Treadmill. Highly important to get right, easy enough to have some expected results (although not exact results every time, so we'll have to pattern match in some way), and historically easy to break accidentally. |
Good catch, and the test seems fine to me. However, this does raise a more general issue to me, and one that relates to our releases, automated tests, and the timer subsystem in particular: how to assert whether the test ran correctly, or not. In particular, for timer tests we often just produce some raw, unprocessed output. When it comes to running these tests as part of a release (esp. with our release cadence), we then need to re-learn what it means for these tests to pass or fail. Similarly, with automated tests, esp. with non-deterministic output, it can be hard to encode a check function that determines the test result. We may want to take this opportunity to figure out what we can add to the user space tests and/or a test framework to make this easier. |
FWIW, @charles37 is working on an automated test for this, to be integrated into Treadmill. |
Fixes #466 When inserting an alarm into the virtual alarm queue, comparing two alarms needs to account for alarms that may wrap beyond the clock overflow. Simply comparing computed expirations against the new alarm's reference isn't sufficient, as the current alarm may overflow, resulting in an expiration that is earlies by absolute value, but should still expire later.
A significant update that adds another test (for clock overflow) and (I believe) fixes the underlying issue. See new PR description. |
An intentionally simple test for multiple alarms that only sets exactly two repeating alarms that should overlap frequently (one is twice the other's frequency).
Co-authored-by: Pat Pannuto <[email protected]>
Fixes #466 When inserting an alarm into the virtual alarm queue, comparing two alarms needs to account for alarms that may wrap beyond the clock overflow. Simply comparing computed expirations against the new alarm's reference isn't sufficient, as the current alarm may overflow, resulting in an expiration that is earlies by absolute value, but should still expire later.
@brghena @ppannuto a nudge that this is actually quite important to fix and worth reviewing/merging ASAP. @tyler-potyondy potentially relevant to the issues you've been having with OpenThread |
I will look over this tomorrow. Thanks for the nudge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for the timer placement seems to check out and is an improvement over the somewhat opaque within range function. I have one small comment for improving the readability of the logic, but overall the changes look good to me.
libtock/services/alarm.c
Outdated
// This alarm happens after the new alarm if: | ||
// - neither expirations overflow and this expiration value is larger than the new expiration | ||
// - both overflow and this expiration value is larger than the new expiration | ||
// - or, this alarm overflows but the new one doesn't | ||
// | ||
// If the new alarm overflows and this alarm doesn't, this alarm | ||
// happens _before_ the new alarm. | ||
if (!(!cur_overflows && new_overflows) && ((cur_overflows && !new_overflows) || cur_expiration > new_expiration)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double negations and nested operations make this somewhat tricky to parse. Perhaps altering the conditional's logic to match the 4 statements enumerated in the comment more closely would improve readability. Something to the effect of:
( case1 || case2 || case3 || case4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly agree. I really can't parse line 110.
Based on my reading of the comment, the logic would be if ((cur_overflows && !new_overflows) || (cur_expiration > new_expiration))
. I don't understand why the first part of the condition is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't quite help myself from being a tiny bit clever still, but I believe the new version of both the comment and boolean logic are clear now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good catch and the fix looks good to me. I have a bunch of small things to clean up though.
libtock/services/alarm.c
Outdated
// expires. | ||
bool cur_overflows = (*cur)->reference > UINT32_MAX - (*cur)->dt; | ||
|
||
// This alarm happens after the new alarm if: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "this alarm" and what is "the new alarm"? I think it's cur
and alarm
respectively? But I might have that backwards. In my opinion, this entire comment block should use the actual names for clarity.
libtock/services/alarm.c
Outdated
// This alarm happens after the new alarm if: | ||
// - neither expirations overflow and this expiration value is larger than the new expiration | ||
// - both overflow and this expiration value is larger than the new expiration | ||
// - or, this alarm overflows but the new one doesn't | ||
// | ||
// If the new alarm overflows and this alarm doesn't, this alarm | ||
// happens _before_ the new alarm. | ||
if (!(!cur_overflows && new_overflows) && ((cur_overflows && !new_overflows) || cur_expiration > new_expiration)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly agree. I really can't parse line 110.
Based on my reading of the comment, the logic would be if ((cur_overflows && !new_overflows) || (cur_expiration > new_expiration))
. I don't understand why the first part of the condition is there.
# Test Multiple Alarms (With Overflow) | ||
|
||
This tests the virtual alarms available to userspace. It sets two | ||
alarms, first one that overflows the alarm, such that it's expiration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alarms, first one that overflows the alarm, such that it's expiration | |
alarms, first one that overflows the alarm, such that its expiration |
|
||
This tests the virtual alarms available to userspace. It sets two | ||
alarms, first one that overflows the alarm, such that it's expiration | ||
is small in absolute value (but should shouldn't fire until after the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should or shouldn't? Pick one please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of everything except timer logic so far sorry
Co-authored-by: Pat Pannuto <[email protected]> Co-authored-by: Branden Ghena <[email protected]>
@ppannuto @tyler-potyondy @brghena all comments addressed |
702b4c1
This PR fixes a subtle bug in the userspace virtual alarm library. If alarms are created and expire pretty close to each other, the
within_range
check can incorrectly return the relative relationship between the two, place the earlier alarm after the later alarm, which is incorrect, and can end up totally starving the earlier alarm if the later alarm is repeating.This bug was introduced when trying to correctly account for potentially overflowing alarms, and the "obvious" fix (just compare absolute expiration values) doesn't work.
This PR adds two tests, one that demonstrates the starved alarm issue and one that exercises the overflow issue, and fixes the bug in such a way that both pass. In particular, the fix is simply to explicitly check if alarms overflow when insert them to the virtual alarm queue and use that for the comparison between alarms.
Test 1: Multi-Alarm Simple
This is a simple test for multiple alarms that only sets exactly two repeating alarms that should overlap frequently (one is twice the other's frequency). It sets two repeating alarms, at 500ms and 1s respectively, each prints the alarm index (1 or 2), current tick and alarm expiration to the console. When successful, both alarms should fire and alarm 1 should fire twice as often as alarm 2.
Today (on the NRF52840 at least, though I don't expect other chips to be different), this test fails. It should output (the first column in each line is the timer index):
but, due to a bug in the alarm library, actually outputs:
(note timer
1
never appears after the first instance)Test 2: Multi-Alarm Overflow
This is a for multiple alarms where one overflows the clock (
UINT32_MAX
ticks), resulting in a low absolute value expiration, while the other is "normal" (1 second). When successful, both alarms should fire, the second after about 1 second, and the first after the clock overflows (around 7 minutes on 32kHz clocks).When the logic for inserting clocks doesn't account correctly for overflows, the second alarm fires after the first one, both after the clock overflows.