Block until mutex can be locked #318
This is 100% my fault (or maybe 50-50, since @MabezDev didn't notice my mistake :) ), as the change I introduced in #276 is absolutely, completely broken. It's also very probably the cause of all my issues in #315, and maybe of the intermittent failure experienced during testing. BUT! I am not sure the original code was working as intended either, nor that my previous transformation was wrong in the sense that I did something logically different. I still see no code path in the old code where |
Haha, I managed to run into this blocking forever 😭 But it made me realize that the software interrupt that would switch the tasks is masked for some reason. Code is now looping in |
Seems you are right: while the loop definitely shows the intent to wait there until the mutex is available, in reality the old code never got there. The thing about yield is a bit weird since it used to work. I think there were some changes to interrupt-on/off or something, which might cause that now. We have more places where we call yield 🤔 I will compare the behavior to some older revision to see if the software interrupt gets handled there correctly. I assume you are using S3, right?
I'm using the S3, yes. I don't think the yield mechanism is incorrect; nothing would work if that were the case. I traced task switching yesterday and it was cycling between the three tasks correctly. What I'm thinking is that maybe we're not unmasking some interrupts correctly in one of the functions we provide to the driver. I remember messing with those, I just don't remember if my changes made it in or not (i.e. if this is something I've caused). I'm planning to spend my time on this today, though, so you don't have to if you have other stuff. It's very annoying to debug because, as with most concurrency issues, it only occurs rarely.
Yeah, it probably doesn't make sense to look into this in parallel. Definitely won't be fun.
yield_task being blocked by a CS is quite the foot gun; we'll have to watch out for that :D. I agree that we should avoid CS with our "freertos" primitives where possible to avoid these situations in the future.
LGTM
In esp-idf, the corresponding function is implemented as:
This function suspends the task until the mutex can be locked. This PR attempts to replicate this behaviour.
It is common to ignore the return value of such mutex locks, as they are assumed to always succeed. In these cases the old code may have allowed the protected regions to be entered from multiple places simultaneously.
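To make the difference concrete, here is a minimal sketch in host-side Rust, assuming a plain atomic flag stands in for the real mutex. `SketchMutex`, `try_lock`, `lock_blocking` and `unlock` are hypothetical names for illustration only, not the functions touched by this PR, and `std::thread::yield_now` merely plays the role of the task yield.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical stand-in for the driver-facing mutex (not the real type).
pub struct SketchMutex {
    locked: AtomicBool,
}

impl SketchMutex {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// Single attempt: returns false if the mutex is already held.
    /// A caller that ignores this result proceeds without the lock.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Blocking variant: keeps yielding until the attempt succeeds,
    /// so the returned value can safely be assumed to be true.
    pub fn lock_blocking(&self) -> bool {
        while !self.try_lock() {
            thread::yield_now();
        }
        true
    }

    /// Release the mutex so that another caller can take it.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

With the single-attempt variant, a second caller gets `false` back while the mutex is held; if that result is dropped, both callers end up in the protected region. The blocking variant only returns once the lock is actually held.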
The second commit moves some code out of a critical section to prevent infinitely looping when the mutex can't be locked immediately.
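The reason for that move can be illustrated with a toy single-threaded model, assuming that the task switch is driven by an interrupt which is masked inside a critical section (as described in the conversation above). Everything here (`Model`, `masked`, `lock_inside_cs`, `lock_outside_cs`) is hypothetical and only sketches the idea; it is not the code from this PR.

```rust
/// Hypothetical two-task model: task 0 wants the mutex, task 1 holds it.
pub struct Model {
    /// True while inside a critical section (task switches are masked).
    masked: bool,
    /// Current owner of the mutex, if any.
    owner: Option<usize>,
}

impl Model {
    pub fn new() -> Self {
        Self { masked: false, owner: Some(1) }
    }

    /// Ask for a task switch. While masked, the switch cannot happen,
    /// so the owning task never runs and never releases the mutex.
    fn yield_task(&mut self) {
        if self.masked {
            return;
        }
        // The other task gets to run and releases the mutex it holds.
        if self.owner == Some(1) {
            self.owner = None;
        }
    }

    /// One locking attempt on behalf of task 0.
    fn try_lock(&mut self) -> bool {
        if self.owner.is_none() {
            self.owner = Some(0);
            true
        } else {
            false
        }
    }

    /// Retry loop entirely inside the critical section: yielding is a no-op.
    pub fn lock_inside_cs(&mut self, max_attempts: u32) -> bool {
        self.masked = true;
        let mut locked = false;
        for _ in 0..max_attempts {
            if self.try_lock() {
                locked = true;
                break;
            }
            self.yield_task();
        }
        self.masked = false;
        locked
    }

    /// Only the attempt itself is protected; the yield happens outside.
    pub fn lock_outside_cs(&mut self, max_attempts: u32) -> bool {
        for _ in 0..max_attempts {
            self.masked = true;
            let locked = self.try_lock();
            self.masked = false;
            if locked {
                return true;
            }
            self.yield_task();
        }
        false
    }
}
```

In this model `lock_inside_cs` exhausts its attempts without ever acquiring the mutex, which corresponds to the infinite loop in the real code, while `lock_outside_cs` succeeds as soon as the owner has had a chance to run. The attempt bound exists only so the sketch terminates.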