This repository has been archived by the owner on Jan 26, 2024. It is now read-only.
clinfo hangs on configurations with two AMD GPUs and the open-source ROCm stack #148
Comments
Does the same happen with /opt/rocm/bin/clinfo?
Excuse me, but why should clinfo be placed in /opt/rocm/bin?
One reason is that AMD wrote its own clinfo back in the days of OpenCL 1.0, long before any other implementations appeared on GitHub and were picked up by the distros, and has maintained it since.
DaVinci Resolve has the same symptoms (it looks like an infinite loop that eats 100% CPU).
clinfo hangs, apparently in an infinite loop, since it fully occupies one CPU core. I observed the same symptoms when launching DaVinci Resolve. On a desktop with a single Radeon 6900XT GPU, this problem does not occur.
My configuration:
An ASUS G513QY laptop with two GPUs: one is integrated into the Renoir processor, and the other is a discrete AMD Radeon 6800M.
The BIOS offers no option to disable the integrated GPU, so I have no way to test this configuration with each GPU separately.
The kernel log shows no errors while clinfo hangs, so this is most likely a user-space issue, but I am not sure.
However, when I forcibly terminate clinfo (pressing Ctrl+C repeatedly until the terminal prompt returns), the following messages appear in the kernel log:
[ 1962.000909] amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[ 1962.000912] amdgpu: Failed to evict process queues
[ 1962.000918] amdgpu: Failed to quiesce KFD
[ 1966.010395] amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[ 1966.010406] amdgpu: Resetting wave fronts (cpsch) on dev 00000000b40e7982
I am using the open-source ROCm stack from the rocm-opencl package [1], which has passed review and has already been pushed to the official Fedora repository [2].
The clinfo output ends with the line:
Max work group size (AMD) 1024
The full clinfo output is available at [3].
A backtrace of clinfo is available at [4].
The clinfo developer says the problem lies deeper, in ROCm or the kernel [5].
Versions:
[1] https://copr.fedorainfracloud.org/coprs/mystro256/rocm-opencl/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2090823
[3] https://pastebin.com/TR5zy30Z
[4] https://pastebin.com/wv5iGibi
[5] Oblomov/clinfo#81