comgr lacks MT safety #27

atamazov · 2020-05-01T12:35:51Z

It seems like comgr is not working properly in multithreaded contexts. This prevents it from full-blown usage in MIOpen.

How to reproduce

Prerequisites: A linux machine with gfx900/906/908 GPU card installed (Radeon VII preferred), vanilla ROCm 3.3.

(1) Pull the https://github.com/ROCmSoftwarePlatform/MIOpen/tree/comgr-attach branch from MIOpen repo and checkout ROCm/MIOpen@9a3a2e7
(2) Build MIOpen library and MIOpenDriver with HIP backend. Use developer build, cmake -DBUILD_DEV=On... (see here for detailed instruction), then make MIOpenDriver -j. Do not install.
(3) Optional: modify environment to skip non-relevant kernels in MIOpen. This would speed-up execution and clean MIOpen logs from non-relevant information.

export MIOPEN_DEBUG_CONV_GEMM=0
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
export MIOPEN_DEBUG_CONV_SCGEMM=0
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_DIRECT=1
export MIOPEN_DEBUG_GCN_ASM_KERNELS=0
export MIOPEN_DEBUG_OPENCL_CONVOLUTIONS=1

(4) Run the following command and make sure that it works fine:

$ ./bin/MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
MIOpen Forward Conv. Algorithm: 1, Solution: 9/ConvOclDirectFwdGen
GPU Kernel Time Forward Conv. Elapsed: 0.307048 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv5x20u2, 8, 1, 79, 341, 5, 20, 32,  1379276800, 3619200, 27585536, 4492, 102, 0.307048

(5) Optional: re-run the same command prefixed with MIOPEN_LOG_LEVEL=5 and make sure that the following lines present on the console:

Info [EvaluateInvokers] ConvOclDirectFwdGen: MIOpenCDFGen4: 0.323367 < 3.40282e+38
Info [EvaluateInvokers] ConvOclDirectFwd: MIOpenConvUni: 1.67044 >= 0.323367
...
Info [FindConvFwdAlgorithm] FW Chosen Algorithm: ConvOclDirectFwdGen , 0, 0.323367

These upper two lines tell us that two kernels (MIOpenCDFGen4 and MIOpenConvUni) obtained from two Solutions (ConvOclDirectFwdGen and ConvOclDirectFwd) were built and run by the library. The third line says that ConvOclDirectFwdGen (the fastest) was selected. The library builds the kernels in parallel; you may wish to export MIOPEN_ENABLE_LOGGING_MPMT=1 and MIOPEN_LOG_LEVEL=6 and see these details in the console log.

(6) Now it's time to use comgr. Clean build directory (you can skip cleaning the build directory, but I am not sure if this would rebuild all the necessary components). Add -DMIOPEN_USE_COMGR=On to cmake command line (see here for details) and rebuild library and driver,
(7) Re-run the command from step (4). Output:

$ ./bin/MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
clang (LLVM option parsing): for the --amdgpu-early-inline-all option: may only occur zero or one times!
clang (LLVM option parsing): for the --amdgpu-prelink option: may only occur zero or one times!
clang (LLVM option parsing): for the --amdgpu-internalize-symbols option: may only occur zero or one times!
clang (LLVM option parsing): for the --pgo-warn-misexpect option: may only occur zero or one times!

🔴 There are two problems:

Build does not succeed.
Build kills the driver. Neither error codes returned to MIOpen, nor exceptions thrown.

You can disable MT builds in MIOpen by the following env setting:

export MIOPEN_COMPILE_PARALLEL_LEVEL=1

and see that driver works normally in this case.

The text was updated successfully, but these errors were encountered:

scott-linder · 2020-05-20T16:59:16Z

As an update on this, the first step will be to define the MT safety of all non-trivially-MT-safe APIs in Comgr to be MT-unsafe. Then future work to give APIs the most meaningful/useful safety guarantees can be backwards-compatible.

For now, assume all Comgr APIs are de-facto MT-unsafe.

atamazov · 2023-03-03T16:40:58Z

@scott-linder I heard that there are some updates wrt this ticket, Can you please shed some light on this?

kzhuravl · 2023-03-03T17:14:17Z

CC @lamb-j

atamazov · 2023-03-03T17:21:07Z

CC @averinevg

lamb-j · 2023-03-03T17:28:56Z

We recently added two patches to Comgr related to this:

Currently we're enforcing thread safety by enclosing the LLVM calls that have unexpected side effects with mutex locks. As more updates are made to upstream LLVM related to these unexpected side effects, we hope to remove these locks to allow more parallelism in a multi-threaded Comgr context.

However there have been a couple of reports of potential bugs with the new Comgr MT approach. I haven't had a chance to fully investigate these yet.

Could you retest MIOpen with a newer Comgr including the above patches?

atamazov · 2023-03-03T17:51:44Z

@lamb-j

Currently we're enforcing thread safety by enclosing the LLVM calls that have unexpected side effects with mutex locks... However there have been a couple of reports of potential bugs with the new Comgr MT approach.

Just in case, have you taken into account that compilation is often a series of COMgr actions? For example, if you build two HIP sources in parallel, then there are two series of actions:

AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
AMD_COMGR_ACTION_LINK_BC_TO_BC
AMD_COMGR_ACTION_CODEGEN_BC_TO_RELOCATABLE
AMD_COMGR_ACTION_LINK_RELOCATABLE_TO_EXECUTABLE

Without proper serialization these two series or actions will likely run in parallel, which may lead to unexpected side effects.

In MIOpen we can use mutex at higher level (and we did that), which guarantees proper serialization of COMgr calls.

lamb-j · 2023-03-27T18:23:46Z

@atamazov Good question. If I understand correctly, you're asking about a possible situation like this:

Thread 1 --> AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
Thread 2 --> AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
Thread 1 --> AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
Thread 2 --> AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES

Will the interleaving of the two thread's actions have unexpected side effects?

Short answer: They shouldn't.

Long answer: The locks are added at the action level. Comgr doesn't maintain any sort of global state for completed actions, and the LLVM global state is cleared for each action (see COMGR::clearLLVMOptions()). Therefore each action should be completely independent, and it shouldn't make any difference how they're interleaved.

atamazov · 2023-03-27T19:15:39Z

@lamb-j Good to know that, Jacob. Thanks.

lamb-j · 2023-03-27T20:08:47Z

@atamazov Closing this for now. Feel free to re-open if you have any other multi-threading issues with Comgr! Hopefully we can move to actual parallel compilation with Comgr in the future (that is, remove the Comgr locks around actions without causing unexpected side effects).

atamazov · 2023-03-27T20:26:26Z

Basically, this is about actual parallel compilation, because we need it for performance. I recommend re-opening this ticket (I can't).

lamb-j · 2023-03-27T21:15:18Z

Good point. We can track the performance/parallel issue here:

https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/57

atamazov added bug Something isn't working priority_blocker labels May 1, 2020

atamazov assigned scott-linder May 1, 2020

atamazov mentioned this issue May 1, 2020

[comgr] Attach comgr and use it to build OpenCL kernels for HIP BE ROCm/MIOpen#177

Merged

atamazov assigned daniellowell and unassigned daniellowell May 13, 2020

atamazov added priority_high and removed priority_blocker labels May 22, 2020

atamazov added enhancement New feature or request and removed bug Something isn't working labels Apr 30, 2021

atamazov mentioned this issue Mar 9, 2023

Allow parallel compilation with COMgr and HIPRTC ROCm/MIOpen#2021

Open

lamb-j assigned lamb-j and unassigned scott-linder Mar 27, 2023

lamb-j closed this as completed Mar 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

comgr lacks MT safety #27

comgr lacks MT safety #27

atamazov commented May 1, 2020 •

edited

Loading

scott-linder commented May 20, 2020

atamazov commented Mar 3, 2023 •

edited

Loading

kzhuravl commented Mar 3, 2023

atamazov commented Mar 3, 2023

lamb-j commented Mar 3, 2023

atamazov commented Mar 3, 2023 •

edited

Loading

lamb-j commented Mar 27, 2023

atamazov commented Mar 27, 2023

lamb-j commented Mar 27, 2023

atamazov commented Mar 27, 2023

lamb-j commented Mar 27, 2023

comgr lacks MT safety #27

comgr lacks MT safety #27

Comments

atamazov commented May 1, 2020 • edited Loading

How to reproduce

🔴 There are two problems:

scott-linder commented May 20, 2020

atamazov commented Mar 3, 2023 • edited Loading

kzhuravl commented Mar 3, 2023

atamazov commented Mar 3, 2023

lamb-j commented Mar 3, 2023

atamazov commented Mar 3, 2023 • edited Loading

lamb-j commented Mar 27, 2023

atamazov commented Mar 27, 2023

lamb-j commented Mar 27, 2023

atamazov commented Mar 27, 2023

lamb-j commented Mar 27, 2023

atamazov commented May 1, 2020 •

edited

Loading

atamazov commented Mar 3, 2023 •

edited

Loading

atamazov commented Mar 3, 2023 •

edited

Loading