Skip to content
This repository has been archived by the owner on May 14, 2024. It is now read-only.

comgr lacks MT safety #27

Closed
atamazov opened this issue May 1, 2020 · 11 comments
Closed

comgr lacks MT safety #27

atamazov opened this issue May 1, 2020 · 11 comments
Assignees
Labels
enhancement New feature or request priority_high

Comments

@atamazov
Copy link

atamazov commented May 1, 2020

It seems like comgr is not working properly in multithreaded contexts. This prevents it from full-blown usage in MIOpen.

How to reproduce

Prerequisites: A linux machine with gfx900/906/908 GPU card installed (Radeon VII preferred), vanilla ROCm 3.3.

  • (1) Pull the https://github.com/ROCmSoftwarePlatform/MIOpen/tree/comgr-attach branch from MIOpen repo and checkout ROCm/MIOpen@9a3a2e7
  • (2) Build MIOpen library and MIOpenDriver with HIP backend. Use developer build, cmake -DBUILD_DEV=On... (see here for detailed instruction), then make MIOpenDriver -j. Do not install.
  • (3) Optional: modify environment to skip non-relevant kernels in MIOpen. This would speed-up execution and clean MIOpen logs from non-relevant information.
export MIOPEN_DEBUG_CONV_GEMM=0
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
export MIOPEN_DEBUG_CONV_SCGEMM=0
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_DIRECT=1
export MIOPEN_DEBUG_GCN_ASM_KERNELS=0
export MIOPEN_DEBUG_OPENCL_CONVOLUTIONS=1
  • (4) Run the following command and make sure that it works fine:
$ ./bin/MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
MIOpen Forward Conv. Algorithm: 1, Solution: 9/ConvOclDirectFwdGen
GPU Kernel Time Forward Conv. Elapsed: 0.307048 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv5x20u2, 8, 1, 79, 341, 5, 20, 32,  1379276800, 3619200, 27585536, 4492, 102, 0.307048
  • (5) Optional: re-run the same command prefixed with MIOPEN_LOG_LEVEL=5 and make sure that the following lines present on the console:
Info [EvaluateInvokers] ConvOclDirectFwdGen: MIOpenCDFGen4: 0.323367 < 3.40282e+38
Info [EvaluateInvokers] ConvOclDirectFwd: MIOpenConvUni: 1.67044 >= 0.323367
...
Info [FindConvFwdAlgorithm] FW Chosen Algorithm: ConvOclDirectFwdGen , 0, 0.323367

These upper two lines tell us that two kernels (MIOpenCDFGen4 and MIOpenConvUni) obtained from two Solutions (ConvOclDirectFwdGen and ConvOclDirectFwd) were built and run by the library. The third line says that ConvOclDirectFwdGen (the fastest) was selected. The library builds the kernels in parallel; you may wish to export MIOPEN_ENABLE_LOGGING_MPMT=1 and MIOPEN_LOG_LEVEL=6 and see these details in the console log.

  • (6) Now it's time to use comgr. Clean build directory (you can skip cleaning the build directory, but I am not sure if this would rebuild all the necessary components). Add -DMIOPEN_USE_COMGR=On to cmake command line (see here for details) and rebuild library and driver,
  • (7) Re-run the command from step (4). Output:
$ ./bin/MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
MIOpenDriver conv -x 20 -y 5 -W 700 -H 161 -c 1 -n 8 -k 32 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -V 0 -s 0 -t 1 -i 1
clang (LLVM option parsing): for the --amdgpu-early-inline-all option: may only occur zero or one times!
clang (LLVM option parsing): for the --amdgpu-prelink option: may only occur zero or one times!
clang (LLVM option parsing): for the --amdgpu-internalize-symbols option: may only occur zero or one times!
clang (LLVM option parsing): for the --pgo-warn-misexpect option: may only occur zero or one times!

🔴 There are two problems:

  • Build does not succeed.
  • Build kills the driver. Neither error codes returned to MIOpen, nor exceptions thrown.

You can disable MT builds in MIOpen by the following env setting:

export MIOPEN_COMPILE_PARALLEL_LEVEL=1

and see that driver works normally in this case.

@scott-linder
Copy link

As an update on this, the first step will be to define the MT safety of all non-trivially-MT-safe APIs in Comgr to be MT-unsafe. Then future work to give APIs the most meaningful/useful safety guarantees can be backwards-compatible.

For now, assume all Comgr APIs are de-facto MT-unsafe.

@atamazov atamazov added enhancement New feature or request and removed bug Something isn't working labels Apr 30, 2021
@atamazov
Copy link
Author

atamazov commented Mar 3, 2023

@scott-linder I heard that there are some updates wrt this ticket, Can you please shed some light on this?

@kzhuravl
Copy link
Collaborator

kzhuravl commented Mar 3, 2023

CC @lamb-j

@atamazov
Copy link
Author

atamazov commented Mar 3, 2023

CC @averinevg

@lamb-j
Copy link
Collaborator

lamb-j commented Mar 3, 2023

We recently added two patches to Comgr related to this:

Currently we're enforcing thread safety by enclosing the LLVM calls that have unexpected side effects with mutex locks. As more updates are made to upstream LLVM related to these unexpected side effects, we hope to remove these locks to allow more parallelism in a multi-threaded Comgr context.

However there have been a couple of reports of potential bugs with the new Comgr MT approach. I haven't had a chance to fully investigate these yet.

Could you retest MIOpen with a newer Comgr including the above patches?

@atamazov
Copy link
Author

atamazov commented Mar 3, 2023

@lamb-j

Currently we're enforcing thread safety by enclosing the LLVM calls that have unexpected side effects with mutex locks... However there have been a couple of reports of potential bugs with the new Comgr MT approach.

Just in case, have you taken into account that compilation is often a series of COMgr actions? For example, if you build two HIP sources in parallel, then there are two series of actions:

  • AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
  • AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
  • AMD_COMGR_ACTION_LINK_BC_TO_BC
  • AMD_COMGR_ACTION_CODEGEN_BC_TO_RELOCATABLE
  • AMD_COMGR_ACTION_LINK_RELOCATABLE_TO_EXECUTABLE

Without proper serialization these two series or actions will likely run in parallel, which may lead to unexpected side effects.

In MIOpen we can use mutex at higher level (and we did that), which guarantees proper serialization of COMgr calls.

@lamb-j
Copy link
Collaborator

lamb-j commented Mar 27, 2023

@atamazov Good question. If I understand correctly, you're asking about a possible situation like this:

Thread 1 --> AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
Thread 2 --> AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
Thread 1 --> AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
Thread 2 --> AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES

Will the interleaving of the two thread's actions have unexpected side effects?

Short answer: They shouldn't.

Long answer: The locks are added at the action level. Comgr doesn't maintain any sort of global state for completed actions, and the LLVM global state is cleared for each action (see COMGR::clearLLVMOptions()). Therefore each action should be completely independent, and it shouldn't make any difference how they're interleaved.

@atamazov
Copy link
Author

@lamb-j Good to know that, Jacob. Thanks.

@lamb-j
Copy link
Collaborator

lamb-j commented Mar 27, 2023

@atamazov Closing this for now. Feel free to re-open if you have any other multi-threading issues with Comgr! Hopefully we can move to actual parallel compilation with Comgr in the future (that is, remove the Comgr locks around actions without causing unexpected side effects).

@lamb-j lamb-j closed this as completed Mar 27, 2023
@atamazov
Copy link
Author

Basically, this is about actual parallel compilation, because we need it for performance. I recommend re-opening this ticket (I can't).

@lamb-j
Copy link
Collaborator

lamb-j commented Mar 27, 2023

Good point. We can track the performance/parallel issue here:

https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/57

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request priority_high
Projects
None yet
Development

No branches or pull requests

5 participants