Partitions can deadlock #121

Open
sevenautumns opened this issue Jun 20, 2024 · 1 comment
Labels
ARINC 653 Part 4 bug Something isn't working

Comments

@sevenautumns
Collaborator

Since f50997f partitions can deadlock.
This happens when the aperiodic process tries to send something to the hypervisor via the "syscall" UnixDatagram, and the hypervisor freezes the partition at that very moment. When the next partition time window comes around, the periodic process is scheduled (unfrozen) first, but it never computes. We think this is because the aperiodic process was frozen during a critical section of the send, which locks the entire process. Since f50997f, processes within a partition are actually threads of a single process, which would explain why a freeze during a critical section in the aperiodic process can lock the periodic process out of executing. The result is a deadlock: the aperiodic process is only scheduled after the periodic process finishes its work in this partition time window, and since the periodic process cannot execute, that never happens.

This class of errors/deadlocks can be avoided by moving intra-partition scheduling into each partition.

sevenautumns added the "bug" and "ARINC 653 Part 4" labels on Jun 20, 2024
@florianhartung
Collaborator

florianhartung commented Aug 8, 2024

Here is one possible solution:
We can spawn a new thread for each partition on partition start. Let's call this thread the Manager Thread (MT).
The MT then performs all critical operations on behalf of the partition's processes.
The MT should also always run in the background whenever a process from its partition is running.

When a process (which runs in another thread, but in the same address space as the MT) encounters an operation that could cause a deadlock if the process were frozen partway through it, the process instead asks the MT to execute that operation. It does this by sending a closure to the MT through an mpsc::channel, along with a second channel that is used for receiving the return value.

The logic inside the MT would be pretty straightforward, as it can just do a blocking receive call on the channel.
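
To make the message flow concrete, here is a minimal sketch of that idea using only `std`. The names `Manager`, `Job`, `spawn`, and `call` are hypothetical and not taken from the project. The sketch only shows the closure-plus-reply-channel pattern. It does not model how the hypervisor freezes threads, so it says nothing about whether the MT itself is safe from being frozen at a bad moment.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical job type: a boxed closure the manager thread runs
// on behalf of a process.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical handle that processes use to hand operations to the MT.
#[derive(Clone)]
struct Manager {
    jobs: mpsc::Sender<Job>,
}

impl Manager {
    /// Spawn the manager thread and return a handle to it.
    fn spawn() -> (Manager, thread::JoinHandle<()>) {
        let (jobs, inbox) = mpsc::channel::<Job>();
        let handle = thread::spawn(move || {
            // Blocking receive: the loop ends once every sender is dropped.
            for job in inbox {
                job();
            }
        });
        (Manager { jobs }, handle)
    }

    /// Run `op` on the manager thread and block until its result arrives.
    fn call<R, F>(&self, op: F) -> R
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        // Second channel, used only for the return value of this call.
        let (reply_tx, reply_rx) = mpsc::channel::<R>();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller is no longer waiting.
            let _ = reply_tx.send(op());
        });
        self.jobs.send(job).expect("manager thread has exited");
        reply_rx.recv().expect("manager thread dropped the reply channel")
    }
}

fn main() {
    let (manager, handle) = Manager::spawn();

    // A "process" thread delegates its critical operation to the MT.
    let process = {
        let manager = manager.clone();
        thread::spawn(move || manager.call(|| 40 + 2))
    };
    let result = process.join().unwrap();
    println!("result = {result}");

    // Dropping the last sender lets the manager thread's loop finish.
    drop(manager);
    handle.join().unwrap();
}
```

In the real partition, the closure passed to `call` would wrap the send on the "syscall" UnixDatagram instead of the placeholder arithmetic used here.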
