runtime: distinguish two kind of mutexes #13716
(I intend to enable |
- 'runtime' mutexes are for blocking critical sections which do not access the runtime.
- 'mutator' mutexes are for non-blocking critical sections (blocking on a mutex releases the runtime lock) which may access the runtime system.

This refactoring comes from the discussions in ocaml#13227. It tries to avoid a class of bugs where the same mutex is used in both blocking and non-blocking fashion, resulting in subtle deadlock situations.
Note: I am afraid there is some overlap and potential conflicts between this PR and #13416, as the PR exploits the definition of |
Speaking as a non-expert myself, I often have to remind myself that blocking/non-blocking in this context refers to the domain lock and not the mutex itself (blocking on the mutex until it's released). What I mean is that |
I see your point, but (I suggested "cooperative" (and we could use "non_cooperative") before but @gadmm was not convinced.) Maybe (It may be that some lightweight static annotations could help detect issues where mutator functions are called unsafely; there is already a dynamic debug-time system with |
Just a quick comment without reading the code in-depth, the "non blocking" variants of I do not see what is the use for the current |
Summary
This PR distinguishes two kinds of mutexes:
- 'runtime' mutexes are for blocking critical sections which do not access the runtime.
- 'mutator' mutexes are for non-blocking critical sections (blocking on a mutex releases the runtime lock) which may access the runtime system.

This refactoring comes from the discussions in #13227. It tries to avoid a class of bugs where the same mutex is used in both blocking and non-blocking fashion, resulting in subtle deadlock situations.
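
To illustrate the idea of separating the two kinds of mutexes, here is a minimal sketch in C using plain pthreads. All names in it (runtime_mutex, mutator_mutex, the lock/unlock helpers and the two example functions) are hypothetical and are not the identifiers used in this PR. The only point is that giving each kind its own wrapper type lets the compiler diagnose a mutex being passed to the wrong family of functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper types: same representation, distinct C types. */
typedef struct { pthread_mutex_t m; } runtime_mutex;
typedef struct { pthread_mutex_t m; } mutator_mutex;

#define RUNTIME_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER }
#define MUTATOR_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER }

/* 'runtime' mutex: plain blocking lock, for short critical sections
   that never call back into the runtime. */
static void runtime_mutex_lock(runtime_mutex *mu) {
  int rc = pthread_mutex_lock(&mu->m);
  assert(rc == 0); (void)rc;
}

static void runtime_mutex_unlock(runtime_mutex *mu) {
  int rc = pthread_mutex_unlock(&mu->m);
  assert(rc == 0); (void)rc;
}

/* 'mutator' mutex: in a real runtime this would release the runtime lock
   before blocking. This sketch only models the type separation, so it
   simply takes the underlying mutex. */
static void mutator_mutex_lock(mutator_mutex *mu) {
  int rc = pthread_mutex_lock(&mu->m);
  assert(rc == 0); (void)rc;
}

static void mutator_mutex_unlock(mutator_mutex *mu) {
  int rc = pthread_mutex_unlock(&mu->m);
  assert(rc == 0); (void)rc;
}

static runtime_mutex stats_lock = RUNTIME_MUTEX_INIT;
static mutator_mutex table_lock = MUTATOR_MUTEX_INIT;
static int stats_counter = 0;
static int table_size = 0;

/* Short critical section, no runtime access: uses the 'runtime' kind. */
static int bump_stats(void) {
  runtime_mutex_lock(&stats_lock);
  int v = ++stats_counter;
  runtime_mutex_unlock(&stats_lock);
  return v;
}

/* Critical section that could be long: uses the 'mutator' kind. */
static int grow_table(int n) {
  mutator_mutex_lock(&table_lock);
  table_size += n;
  int v = table_size;
  mutator_mutex_unlock(&table_lock);
  return v;
}
```

With this layout, a call such as mutator_mutex_lock(&stats_lock) passes a pointer of an incompatible type, which C compilers diagnose (as an error under -Werror, for example), so the two disciplines cannot be mixed on one mutex by accident.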
More details
The runtime has a caml_plat_lock_blocking function that takes a mutex in the obvious way. This function should be used very carefully, because it blocks a domain without transferring control to its backup thread or otherwise listening to STW interruptions, and it can easily cause deadlocks if the critical section itself contains an STW poll point. In #13063, @gadmm introduced a different mutex-taking function, caml_plat_lock_non_blocking, that releases the domain lock when it needs to block, and should be used in any critical section that could be long or needs to use the runtime system.

We are still learning what a correct usage discipline for these two functions is (on Monday I temporarily introduced a bug in trunk, detected and reported on Tuesday morning by @jmid in #13713 and fixed on Tuesday evening by @gadmm in #13714). In #13714 we realized that it is incorrect to mix uses of lock_blocking and lock_non_blocking on the same mutex, except in very specific use-cases that are not currently used in the runtime. The current PR proposes to separate the two APIs so that there is no risk of making this mistake again in the future. The goal is to have a system that is simpler to reason about and to use correctly for non-experts such as myself.
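
As a rough illustration of the difference between the two locking functions described above, here is a toy model in C with plain pthreads. It is a sketch under stated assumptions, not the runtime's code: domain_lock is a single ordinary mutex standing in for the domain lock, and toy_lock_blocking, toy_lock_non_blocking, holder and toy_demo are hypothetical names. The non-blocking variant first tries the mutex and, only if it would have to wait, releases the stand-in domain lock before blocking and reacquires it afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Toy stand-in for the domain lock (assumption: one global mutex). */
static pthread_mutex_t domain_lock = PTHREAD_MUTEX_INITIALIZER;
static int slow_path_count = 0;  /* times the domain lock was released */

/* Blocking variant: waits for mu while keeping the domain lock. */
static void toy_lock_blocking(pthread_mutex_t *mu) {
  int rc = pthread_mutex_lock(mu);
  assert(rc == 0); (void)rc;
}

/* Non-blocking variant: the caller must hold domain_lock. */
static void toy_lock_non_blocking(pthread_mutex_t *mu) {
  if (pthread_mutex_trylock(mu) == 0) return;  /* fast path, no waiting */
  slow_path_count++;                     /* still protected by domain_lock */
  pthread_mutex_unlock(&domain_lock);    /* let other threads make progress */
  toy_lock_blocking(mu);                 /* now it is safe to wait */
  pthread_mutex_lock(&domain_lock);      /* take the domain lock back */
}

static pthread_mutex_t shared = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;

/* Helper thread: holds 'shared' and needs domain_lock before releasing it. */
static void *holder(void *arg) {
  (void)arg;
  pthread_mutex_lock(&shared);
  pthread_mutex_lock(&ready_lock);
  ready = 1;
  pthread_cond_signal(&ready_cond);
  pthread_mutex_unlock(&ready_lock);
  pthread_mutex_lock(&domain_lock);
  pthread_mutex_unlock(&domain_lock);
  pthread_mutex_unlock(&shared);
  return NULL;
}

/* Returns how many times the slow path ran, or -1 if no thread was created. */
static int toy_demo(void) {
  pthread_t t;
  pthread_mutex_lock(&domain_lock);
  if (pthread_create(&t, NULL, holder, NULL) != 0) {
    pthread_mutex_unlock(&domain_lock);
    return -1;
  }
  /* Wait until the helper definitely holds 'shared'. */
  pthread_mutex_lock(&ready_lock);
  while (!ready) pthread_cond_wait(&ready_cond, &ready_lock);
  pthread_mutex_unlock(&ready_lock);

  toy_lock_non_blocking(&shared);  /* must take the slow path here */
  pthread_mutex_unlock(&shared);
  pthread_mutex_unlock(&domain_lock);
  pthread_join(t, NULL);
  return slow_path_count;
}
```

In this toy model, replacing the call in toy_demo with toy_lock_blocking(&shared) would deadlock: the main thread would wait for shared while holding domain_lock, and the helper would wait for domain_lock while holding shared.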