-
Notifications
You must be signed in to change notification settings - Fork 31
comgr lacks MT safety #27
Comments
As an update on this, the first step will be to define the MT safety of all non-trivially-MT-safe APIs in Comgr to be MT-unsafe. Then future work to give APIs the most meaningful/useful safety guarantees can be backwards-compatible. For now, assume all Comgr APIs are de-facto MT-unsafe. |
@scott-linder I heard that there are some updates wrt this ticket, Can you please shed some light on this? |
CC @lamb-j |
CC @averinevg |
We recently added two patches to Comgr related to this: Currently we're enforcing thread safety by enclosing the LLVM calls that have unexpected side effects with mutex locks. As more updates are made to upstream LLVM related to these unexpected side effects, we hope to remove these locks to allow more parallelism in a multi-threaded Comgr context. However there have been a couple of reports of potential bugs with the new Comgr MT approach. I haven't had a chance to fully investigate these yet. Could you retest MIOpen with a newer Comgr including the above patches? |
Just in case, have you taken into account that compilation is often a series of COMgr actions? For example, if you build two HIP sources in parallel, then there are two series of actions:
Without proper serialization these two series or actions will likely run in parallel, which may lead to unexpected side effects. In MIOpen we can use mutex at higher level (and we did that), which guarantees proper serialization of COMgr calls. |
@atamazov Good question. If I understand correctly, you're asking about a possible situation like this: Thread 1 --> AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC Will the interleaving of the two thread's actions have unexpected side effects? Short answer: They shouldn't. Long answer: The locks are added at the action level. Comgr doesn't maintain any sort of global state for completed actions, and the LLVM global state is cleared for each action (see COMGR::clearLLVMOptions()). Therefore each action should be completely independent, and it shouldn't make any difference how they're interleaved. |
@lamb-j Good to know that, Jacob. Thanks. |
@atamazov Closing this for now. Feel free to re-open if you have any other multi-threading issues with Comgr! Hopefully we can move to actual parallel compilation with Comgr in the future (that is, remove the Comgr locks around actions without causing unexpected side effects). |
Basically, this is about actual parallel compilation, because we need it for performance. I recommend re-opening this ticket (I can't). |
Good point. We can track the performance/parallel issue here: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/57 |
It seems like comgr is not working properly in multithreaded contexts. This prevents it from full-blown usage in MIOpen.
How to reproduce
Prerequisites: A linux machine with gfx900/906/908 GPU card installed (Radeon VII preferred), vanilla ROCm 3.3.
cmake -DBUILD_DEV=On...
(see here for detailed instruction), then makeMIOpenDriver -j
. Do not install.MIOPEN_LOG_LEVEL=5
and make sure that the following lines present on the console:These upper two lines tell us that two kernels (MIOpenCDFGen4 and MIOpenConvUni) obtained from two Solutions (ConvOclDirectFwdGen and ConvOclDirectFwd) were built and run by the library. The third line says that ConvOclDirectFwdGen (the fastest) was selected. The library builds the kernels in parallel; you may wish to export
MIOPEN_ENABLE_LOGGING_MPMT=1
andMIOPEN_LOG_LEVEL=6
and see these details in the console log.-DMIOPEN_USE_COMGR=On
to cmake command line (see here for details) and rebuild library and driver,🔴 There are two problems:
You can disable MT builds in MIOpen by the following env setting:
and see that driver works normally in this case.
The text was updated successfully, but these errors were encountered: