Process-stealing dead lock #52

Open
leostera opened this issue Jan 30, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@leostera
Collaborator

When running on a large number of cores, the current process-stealing implementation starts deadlocking schedulers and exposes a few other bugs:

  • a process gets queued in several schedulers at once, which is likely a bug in Proc_queue or Proc_set. Once it has terminated in one scheduler, the next scheduler that tries to run it fails, because finalized processes should never be put on a queue.

  • when timers are moved between schedulers, a timer sometimes fires on a scheduler before it has been moved out of it. Moving timers to the IO scheduler helps, and can improve timer reliability since the polling workload has a strict deadline, but it also means reworking the timeouts for receives and syscalls.
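One way to picture the "move timers to the IO scheduler" option: if a single scheduler owns every timer, there is no window in which a timer is half-migrated and can fire on the wrong scheduler. This is a minimal, hypothetical sketch, not Riot's code; `timer_table`, `add_timer`, `cancel`, `poll_timeout` and `fire_expired` are names invented for illustration. It assumes the IO loop passes `poll_timeout` to its poll call as the deadline, calls `fire_expired` after each poll, and that other schedulers hand timer requests to the IO scheduler instead of touching the table directly.

```ocaml
(* Hypothetical sketch: every timer lives in one table owned by the IO
   scheduler. Not thread-safe on purpose: only the owning thread mutates it. *)
module Timer_key = struct
  type t = float * int (* absolute deadline in seconds, unique id for ties *)
  let compare = compare
end

module Timers = Map.Make (Timer_key)

type timer_table = {
  mutable timers : (unit -> unit) Timers.t;
  mutable next_id : int;
}

let create () = { timers = Timers.empty; next_id = 0 }

(* Register [callback] to run at absolute time [deadline]; the returned
   key can be used to cancel it. *)
let add_timer t ~deadline callback =
  let id = t.next_id in
  t.next_id <- id + 1;
  t.timers <- Timers.add (deadline, id) callback t.timers;
  (deadline, id)

(* Removing an already-fired or unknown key is a no-op. *)
let cancel t key = t.timers <- Timers.remove key t.timers

(* Timeout for the IO poll: time left until the earliest deadline,
   or [None] to block until IO arrives. *)
let poll_timeout t ~now =
  match Timers.min_binding_opt t.timers with
  | None -> None
  | Some ((deadline, _), _) -> Some (Float.max 0.0 (deadline -. now))

(* Remove every timer whose deadline is <= [now], run the callbacks in
   deadline order, and return how many fired. *)
let fire_expired t ~now =
  let expired, rest =
    Timers.partition (fun (deadline, _) _ -> deadline <= now) t.timers
  in
  t.timers <- rest;
  Timers.iter (fun _ callback -> callback ()) expired;
  Timers.cardinal expired
```

Because only one thread ever mutates the table, it needs no locking; the cost, as noted above, is that receive and syscall timeouts would have to be routed through this scheduler.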

I've been unable to fix this with additional safeguards (like more restrictive locking of the process queue), but I have identified that Proc_set is not working as intended (likely due to the use of Atomics instead of a lock).
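If the root cause is that membership check and insert happen as separate atomic operations, two schedulers can both see a process as absent and both queue it. A minimal sketch of the lock-based alternative, assuming OCaml 5's stdlib `Mutex` and integer pids; `Proc_set_locked`, `try_mark_queued`, `mark_dequeued` and `mark_finalized` are hypothetical names, not Riot's actual Proc_set API. The check and the insert happen in one critical section, and finalized pids are remembered so they can never be queued again.

```ocaml
(* Hypothetical sketch of a lock-protected process set (not Riot's Proc_set). *)
module Proc_set_locked = struct
  type entry = Queued | Finalized

  type t = { lock : Mutex.t; entries : (int, entry) Hashtbl.t }

  let create () = { lock = Mutex.create (); entries = Hashtbl.create 64 }

  (* Run [f] while holding the lock, releasing it even if [f] raises. *)
  let with_lock t f =
    Mutex.lock t.lock;
    Fun.protect ~finally:(fun () -> Mutex.unlock t.lock) f

  (* Check-and-insert in one critical section: exactly one caller wins,
     and a finalized pid is always rejected. *)
  let try_mark_queued t pid =
    with_lock t (fun () ->
        match Hashtbl.find_opt t.entries pid with
        | Some (Queued | Finalized) -> false
        | None ->
            Hashtbl.replace t.entries pid Queued;
            true)

  (* Called when a scheduler pops [pid] to run it, so it can be queued
     again later; a finalized pid stays finalized. *)
  let mark_dequeued t pid =
    with_lock t (fun () ->
        match Hashtbl.find_opt t.entries pid with
        | Some Queued -> Hashtbl.remove t.entries pid
        | Some Finalized | None -> ())

  (* Once finalized, [pid] can never be queued again. *)
  let mark_finalized t pid =
    with_lock t (fun () -> Hashtbl.replace t.entries pid Finalized)
end
```

A scheduler would call `try_mark_queued` before pushing a process onto any queue and `mark_dequeued` when it pops one to run, so at most one queue holds a given pid at a time. A compare-and-set on a per-process atomic state could give the same guarantee without a global lock, but the lock version is easier to reason about while the bug is being tracked down.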

In the meantime, process-stealing has been disabled on main until we figure out next steps here.

This is a good time to step back and perhaps rewrite the scheduler into more modular pieces that are easier to reason about and test.

@leostera leostera added the bug Something isn't working label Jan 30, 2024
@leostera leostera added this to the riot/phase-1 milestone Mar 2, 2024
@leostera leostera moved this to Done in Riot Roadmap Mar 18, 2024
@leostera leostera removed the status in Riot Roadmap Mar 18, 2024
@leostera leostera removed this from the riot/phase-1 milestone Apr 11, 2024
Projects
None yet
Development

No branches or pull requests

1 participant