depthai-core deadlocked in semaphores with OAK-D-Pro-PoE after 100+ connections #1105
Comments
I have isolated and fixed a group of XLink bugs in its Windows implementation of semaphores, pthread conditions, and clocks. After applying multiple fixes, OAK PoE failures declined by an order of magnitude 🌠 A test run with the fixes made 2897 connections in 6.5 hours before failure. At the point of failure, VSCode itself failed, so I did not have access to the debugger. I am unclear whether VSCode failed and killed the test process, or the test process died and affected VSCode. Still, the test wrote a CSV log, and it shows 2897 successful test runs in 6.5 hours. The OAK-D-Pro-PoE in the test has the recent bootloader 0.0.28 from https://github.com/luxonis/depthai-core/releases/tag/v2.26.0. Applying this firmware alone did not have any measurable effect on connection reliability. The order-of-magnitude improvement was due to the XLink bug fixes. There may still be an OAK firmware/bootloader problem: after the test failure, the OAK-D-Pro-PoE did not pass spot testing, even after 2 hours with no client communicating with it.
I repeat that "may not" because the |
That's great to hear @diablodale! Would you be willing to open a PR to |
No PR. Same answer as in March
Fixes are passing my reliability tests. The last test ran 4126 iterations of continuous connect, get data, disconnect, repeat with an OAK-D-Pro-PoE. Zero delays, errors, faults, or freezes. All data streams valid. After this 4k run, the sensor also kept working correctly in a few casual manual tests.
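For readers who want to build a similar reliability test, here is a minimal sketch of a connect, get data, disconnect, repeat loop that writes one CSV row per iteration. It is not the test used in this thread. `CycleResult`, `Cycle`, and `runSoak` are hypothetical names, and the callable stands in for whatever code opens the device, reads the streams, and closes it again (no depthai-core API is assumed here).

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Outcome of one connect / get data / disconnect cycle (hypothetical type).
struct CycleResult {
    bool ok;
    std::string detail;  // error text, empty on success
};

// Hypothetical stand-in for the code that talks to the device.
using Cycle = std::function<CycleResult()>;

// Runs up to maxIterations cycles, stops at the first failure,
// and returns the number of cycles that succeeded.
int runSoak(const Cycle& cycle, int maxIterations, std::ostream& csv) {
    using Clock = std::chrono::steady_clock;
    csv << "iteration,ok,elapsed_ms,detail\n";
    int passed = 0;
    for (int i = 1; i <= maxIterations; ++i) {
        const auto start = Clock::now();
        CycleResult result{false, ""};
        try {
            result = cycle();
        } catch (const std::exception& e) {
            result = CycleResult{false, e.what()};
        }
        const auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                                   Clock::now() - start).count();
        csv << i << ',' << (result.ok ? 1 : 0) << ',' << elapsedMs << ','
            << result.detail << '\n';
        csv.flush();  // keep the log usable even if the process dies later
        if (!result.ok) break;
        ++passed;
    }
    return passed;
}
```

Flushing after every row is what lets the log survive a crash of the test process or the IDE. Note that this loop only records cycles that return or throw. A cycle that deadlocks never returns, so catching a hang needs an external watchdog or a timeout inside the cycle itself.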
This issue should give your team enough info to look and fix your code. |
See #415
When that repro case is run with depthai-core v2.27.0 for hundreds of connects, the app will eventually hang and not respond.
This is an improvement over #415, where the test failed after only 6 connects.
One test deadlocked after 135 connects. Another test took 402 connects to deadlock.
The OAK-D-Pro-PoE responds to pings. The device itself could be ok.
The problem appears to be a deadlock in XLink semaphores. Running the test in a debugger, I can see 6+ threads, all waiting indefinitely on XLink semaphores in sem_wait(). Something is not signaling them.
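To illustrate why a missed signal shows up as a permanent hang, here is a minimal sketch of a counting semaphore with both an unbounded and a bounded wait. It is built on the C++ standard library and is not XLink's implementation. `TimedSemaphore` and its method names are hypothetical. The unbounded `wait()` behaves like sem_wait(): if no thread ever posts, it never returns. The bounded `waitFor()` turns the same situation into a `false` return that the caller can log or recover from.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore for illustration only.
class TimedSemaphore {
public:
    explicit TimedSemaphore(int initial = 0) : count_(initial) {}

    // Adds one permit and wakes one waiter.
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Unbounded wait: blocks forever if post() is never called.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Bounded wait: returns false if no permit arrives within the timeout.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!cv_.wait_for(lock, timeout, [this] { return count_ > 0; })) {
            return false;  // a signal was lost or is late
        }
        --count_;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};
```

A bounded wait does not fix whatever fails to signal, but it makes the failure observable: a thread that times out can report which semaphore it was waiting on instead of hanging silently.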