Block until mutex can be locked #318

bugadani · 2023-10-29T16:46:31Z

In esp-idf, the corresponding function is implemented as:

static int32_t IRAM_ATTR mutex_lock_wrapper(void *mutex)
{
    return (int32_t)xSemaphoreTakeRecursive(mutex, portMAX_DELAY);
}

This function suspends the task until the mutex can be locked. This PR attempts to replicate this behaviour.

It is common to ignore the return value of such mutex locks, as they are assumed to always succeed. In these cases, the old code may have let multiple tasks enter the protected region simultaneously.
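To illustrate the difference between a single lock attempt and a blocking lock, here is a minimal sketch using plain `std` threads. `RecursiveMutex`, the task ids and `max_tasks_in_region` are hypothetical stand-ins, not esp-wifi's actual types; `thread::yield_now` plays the role of yielding to another task.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical recursive mutex, for illustration only (not esp-wifi's type).
pub struct RecursiveMutex {
    owner: AtomicUsize, // 0 = unlocked, otherwise the (non-zero) id of the owner
    count: AtomicUsize, // recursion depth, only modified by the current owner
}

impl RecursiveMutex {
    pub const fn new() -> Self {
        Self { owner: AtomicUsize::new(0), count: AtomicUsize::new(0) }
    }

    /// Single attempt. Returns false if another task holds the mutex, so a
    /// caller that ignores the result may enter the region unprotected.
    pub fn try_lock(&self, task: usize) -> bool {
        if self.owner.load(Ordering::Acquire) == task {
            self.count.fetch_add(1, Ordering::Relaxed);
            return true;
        }
        if self
            .owner
            .compare_exchange(0, task, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            self.count.store(1, Ordering::Relaxed);
            return true;
        }
        false
    }

    /// Blocking lock: keep yielding until the attempt succeeds.
    pub fn lock(&self, task: usize) {
        while !self.try_lock(task) {
            thread::yield_now();
        }
    }

    pub fn unlock(&self, task: usize) {
        assert_eq!(self.owner.load(Ordering::Relaxed), task);
        if self.count.fetch_sub(1, Ordering::Relaxed) == 1 {
            self.owner.store(0, Ordering::Release);
        }
    }
}

/// Runs `tasks` threads through a guarded region and returns the highest
/// number of threads that were observed inside the region at the same time.
pub fn max_tasks_in_region(tasks: usize, rounds: usize) -> usize {
    let mutex = Arc::new(RecursiveMutex::new());
    let inside = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (1..=tasks)
        .map(|id| {
            let (mutex, inside, max_seen) = (mutex.clone(), inside.clone(), max_seen.clone());
            thread::spawn(move || {
                for _ in 0..rounds {
                    mutex.lock(id);
                    mutex.lock(id); // recursive acquisition by the same task
                    let now = inside.fetch_add(1, Ordering::SeqCst) + 1;
                    max_seen.fetch_max(now, Ordering::SeqCst);
                    inside.fetch_sub(1, Ordering::SeqCst);
                    mutex.unlock(id);
                    mutex.unlock(id);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    max_seen.load(Ordering::SeqCst)
}
```

With the blocking `lock`, no more than one task should ever be counted inside the region. Swapping it for `try_lock` with the result ignored would remove that guarantee.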

The second commit moves some code out of a critical section to prevent infinitely looping when the mutex can't be locked immediately.
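As a rough sketch of what "moving code out of a critical section" can look like: only detach the value inside the critical section, and let it be used and dropped afterwards. Everything below is a hypothetical model (`critical_section`, `PacketBuffer`, `receive_inside`, `receive_outside` are illustration names, and the assumption that dropping a buffer may need to take a driver mutex is mine), not the actual esp-wifi code.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Models "interrupts are masked" for the purpose of this sketch.
static IN_CRITICAL_SECTION: AtomicBool = AtomicBool::new(false);
// Counts drops that happened while the flag above was set.
static DROPS_IN_CRITICAL_SECTION: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical critical section: sets the flag for the duration of `f`.
pub fn critical_section<R>(f: impl FnOnce() -> R) -> R {
    IN_CRITICAL_SECTION.store(true, Ordering::SeqCst);
    let result = f();
    IN_CRITICAL_SECTION.store(false, Ordering::SeqCst);
    result
}

/// Hypothetical stand-in for a received packet buffer.
pub struct PacketBuffer {
    pub data: Vec<u8>,
}

impl Drop for PacketBuffer {
    fn drop(&mut self) {
        // Assumption: in a real driver, returning the buffer may need a mutex
        // and may therefore have to yield. Here we only record where it ran.
        if IN_CRITICAL_SECTION.load(Ordering::SeqCst) {
            DROPS_IN_CRITICAL_SECTION.fetch_add(1, Ordering::SeqCst);
        }
    }
}

/// Problematic shape: the packet is dropped while still inside the section.
pub fn receive_inside(queue: &mut VecDeque<PacketBuffer>, out: &mut Vec<u8>) {
    critical_section(|| {
        if let Some(packet) = queue.pop_front() {
            out.extend_from_slice(&packet.data);
        } // `packet` is dropped here, inside the critical section
    });
}

/// Preferred shape: only the queue access is protected.
pub fn receive_outside(queue: &mut VecDeque<PacketBuffer>, out: &mut Vec<u8>) {
    let packet = critical_section(|| queue.pop_front());
    if let Some(packet) = packet {
        out.extend_from_slice(&packet.data);
    } // `packet` is dropped here, after the critical section has ended
}

pub fn drops_in_critical_section() -> usize {
    DROPS_IN_CRITICAL_SECTION.load(Ordering::SeqCst)
}
```

The design choice is simply to keep the protected span as short as possible, so that anything that might block or yield runs with interrupts enabled.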

bugadani · 2023-10-29T17:15:42Z

This is 100% my fault (or maybe 50-50 since @MabezDev didn't notice my mistake :) ) as the change I introduced in #276 is absolutely, completely broken. It's also very probably the cause of all my issues in #315 - and the intermittent failure experienced during testing, maybe?

BUT! I am not sure the original code was working as intended either, or whether my previous transformation was wrong in the sense that it did something logically different. I still see no code path in the old code where lock_mutex would have looped. Unless there's some subtlety I'm skipping over, the branch (false, false) should be unreachable. Why am I wrong?

bugadani · 2023-10-30T07:03:59Z

Haha, I managed to run into this blocking forever 😭 But it made me realize that the software interrupt that would switch the tasks is masked for some reason. Code is now looping in lock_mutex, yield_task fires the interrupt request that never gets handled, which obviously doesn't let the code make any progress.
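The stall can be reproduced in a toy, single-threaded model. This is only a sketch under stated assumptions: `Cpu` and its fields are made up for illustration, the real `lock_mutex` and `yield_task` look different, and the loop is bounded by `max_spins` so the example terminates.

```rust
/// Toy model of a core with one software interrupt used for task switching.
pub struct Cpu {
    /// True while the task-switch interrupt is masked (e.g. in a critical section).
    pub interrupts_masked: bool,
    /// True while an interrupt request is waiting to be handled.
    pub switch_pending: bool,
    /// True while another task owns the mutex we want.
    pub mutex_held_by_other: bool,
}

impl Cpu {
    pub fn new(interrupts_masked: bool) -> Self {
        Self { interrupts_masked, switch_pending: false, mutex_held_by_other: true }
    }

    /// Raises the software interrupt. It is only serviced while unmasked.
    pub fn yield_task(&mut self) {
        self.switch_pending = true;
        if !self.interrupts_masked {
            self.switch_pending = false;
            // The other task gets to run and releases the mutex.
            self.mutex_held_by_other = false;
        }
    }

    /// Spins until the mutex is free. Returns the number of yields that were
    /// needed, or None if no progress was made within `max_spins` iterations.
    pub fn lock_mutex(&mut self, max_spins: usize) -> Option<usize> {
        for spin in 0..max_spins {
            if !self.mutex_held_by_other {
                return Some(spin);
            }
            self.yield_task();
        }
        None
    }
}
```

In this model, `Cpu::new(false).lock_mutex(1000)` succeeds after a single yield, while `Cpu::new(true).lock_mutex(1000)` gives up with the request still pending, which mirrors the behaviour described above.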

bjoernQ · 2023-10-30T07:13:49Z

Seems you are right: while the loop definitely shows the intent to wait there until the mutex is available, in reality the old code never got there.

The thing about yield is a bit weird since it used to work - I think there were some changes to interrupt-on / off or something which might cause that now. We have more places where we call yield 🤔

I will compare the behavior to some older revision to see if the Software-Interrupt gets handled there correctly. I assume you are using S3, right?

bugadani · 2023-10-30T07:20:18Z

I'm using the S3, yes. I don't think the yield mechanism is incorrect; nothing would work if that was the case. I traced task switching yesterday and it was cycling between the three tasks correctly.

What I'm thinking is that maybe we're not unmasking some interrupts correctly in one of the functions we provide to the driver. I remember messing with those, I just don't remember if my changes made it in or not (i.e. if this is something I've caused). I'm planning to spend my time on this today, though, so you don't have to if you have other stuff.

It's very annoying to debug because, as with most concurrency issues, it's only occurring rarely.

bjoernQ · 2023-10-30T08:26:20Z

Yeah, it probably doesn't make sense to look into this in parallel. Definitely won't be fun.

bugadani · 2023-10-30T09:23:37Z

This is the call trace at the place of the infinite loop:

I hope we don't have more of these but I don't know. If a freeze happens again, we can connect a debugger, read the stack trace and update accordingly.

esp-wifi/src/wifi/mod.rs

MabezDev

yield_task being blocked by a CS is quite the footgun; we'll have to watch out for that :D. I agree that we should avoid CS with our "freertos" primitives where possible to avoid these situations in the future.

LGTM

Block until mutex is locked

9567b5d

bugadani changed the title ~~Block until mutex is locked~~ Block until mutex can be locked Oct 29, 2023

bugadani force-pushed the mutex branch 2 times, most recently from 03ebc5d to 6a76d6f Compare October 30, 2023 09:46

bugadani commented Oct 30, 2023

esp-wifi/src/wifi/mod.rs (resolved)

MabezDev reviewed Oct 30, 2023

esp-wifi/src/wifi/mod.rs (outdated, resolved)

bugadani force-pushed the mutex branch 2 times, most recently from 145bad5 to b749a5f Compare October 30, 2023 13:07

Don't drop EspWifiPacketBuffer in a critical section

da8679f

bugadani force-pushed the mutex branch from b749a5f to da8679f Compare October 30, 2023 13:07

MabezDev approved these changes Oct 30, 2023

MabezDev merged commit d6bc265 into esp-rs:main Oct 30, 2023
7 checks passed

bugadani deleted the mutex branch October 30, 2023 14:37

bugadani mentioned this pull request Oct 31, 2023

Losing connection while downloading large file may result in unrecoverable state #315

Closed