[Issue]: Cannot set fan speed on Radeon Pro W7900 #162
Comments
Note that your second message is about partition mode, not temperature. You aren't using a GPU that supports memory or compute partitions. As for fan speed, what happens if you try to change it manually? Does dmesg throw any errors? Does it seem to work (i.e. no errors) but the fan doesn't change? Does it work if the performance mode is set to auto rather than manual? And if you set it to 100 and read it back, does it still return 50% as its value?
Thanks for your assistance. When I change the fan speed, rocm-smi reports that setting the fan speed succeeded, but the hardware fan keeps running at its default speed. dmesg doesn't report any errors. The card still works, but the fan doesn't change. I always use manual performance mode, never auto. When I set it to 100 and read it back, the fan speed still stays on auto, topping out at only around 50%.
One more serious problem: I run the "rocm-smi --setsclk 2" command in a boot service because it limits the GPU clock. If I don't run it, the default settings apply, and sometimes the GPU clock rises to 150% (3 GHz), which forces the computer to power off and reboot. Please fix it! Sincere gratitude!
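To check whether a clock cap like this is actually in effect, one option is to read the amdgpu `pp_dpm_sclk` sysfs file, which lists the shader-clock DPM levels and marks the active one with `*`. The sketch below assumes the file is at `/sys/class/drm/<card>/device/pp_dpm_sclk` and that each line looks like `1: 1500Mhz *`. The function names are hypothetical, and this is not how rocm-smi implements `--setsclk`.

```python
import re
from typing import List, Optional, Tuple

# Assumed line format: "<level>: <MHz>Mhz", with a trailing "*" on the active level.
_LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str) -> List[Tuple[int, int, bool]]:
    """Return (level, MHz, is_active) for each numbered line of pp_dpm_sclk text."""
    levels = []
    for line in text.splitlines():
        m = _LEVEL_RE.match(line)
        if m:  # lines that don't match the assumed format are skipped
            levels.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return levels


def active_level(text: str) -> Optional[Tuple[int, int]]:
    """Return (level, MHz) of the level marked with '*', or None if none is marked."""
    for level, mhz, is_active in parse_dpm_levels(text):
        if is_active:
            return level, mhz
    return None


def read_active_sclk(card: str = "card0") -> Optional[Tuple[int, int]]:
    """Read the live sysfs file (hypothetical card name; Linux + amdgpu only)."""
    with open(f"/sys/class/drm/{card}/device/pp_dpm_sclk") as f:
        return active_level(f.read())
```

Polling `read_active_sclk()` while the boot service is active would show which level the GPU is actually running at, without going through rocm-smi.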
OK, so I see it saying that it set it to 100% there. Does it report 100% when you run "rocm-smi" afterwards while the fan is still running slowly, or does it just stay at the lower speed while reporting that lower speed? Does dmesg say anything after you've run that command?
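The question above separates two cases: the driver reports the requested duty cycle but the fan stays slow, or the driver keeps reporting the lower value. Here is a minimal sketch for reading both numbers directly from hwmon. It assumes the standard `pwm1`, `pwm1_max` and `fan1_input` (RPM) attributes exist. `read_fan_state` is a hypothetical helper. When `pwm1_max` is missing, it falls back to the generic hwmon PWM range of 0-255.

```python
import os
from typing import Optional, Tuple


def _read_int(path: str) -> Optional[int]:
    """Read an integer sysfs attribute; None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def read_fan_state(hwmon_dir: str) -> Tuple[Optional[float], Optional[int]]:
    """Return (commanded duty cycle in percent, measured fan RPM)."""
    pwm = _read_int(os.path.join(hwmon_dir, "pwm1"))
    # Assumption: fall back to the generic 255 when the driver exposes no pwm1_max.
    pwm_max = _read_int(os.path.join(hwmon_dir, "pwm1_max")) or 255
    rpm = _read_int(os.path.join(hwmon_dir, "fan1_input"))
    percent = None if pwm is None else round(100.0 * pwm / pwm_max, 1)
    return percent, rpm
```

Running this right after a `rocm-smi` fan change shows whether `pwm1` moved at all and whether the measured RPM followed it.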
Yes, it just stays at the lower speed while reporting that lower speed.
dmesg says: "[17629.054891] amdgpu: manual fan speed control should be enabled first"
How can I change the value of "/sys/class/drm/card0/device/drm/card0/device/hwmon/hwmon2/pwm1_enable"? I think that is the problem.
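The `hwmonN` index in that path is assigned at boot and can differ between machines. On a system that also has an Intel iGPU, `card0` is not guaranteed to be the Radeon. The sketch below finds the amdgpu hwmon directories by checking each one's `name` attribute. The function name is hypothetical. The `sys_root` parameter exists only so the logic can be tested against a fake directory tree.

```python
import glob
import os
import re
from typing import List


def find_amdgpu_hwmon(sys_root: str = "/sys") -> List[str]:
    """Return hwmon directories under /sys/class/drm/cardN whose name is "amdgpu"."""
    found = []
    drm_dir = os.path.join(sys_root, "class", "drm")
    for card in sorted(glob.glob(os.path.join(drm_dir, "card*"))):
        # Skip connector entries such as card0-DP-1; keep only plain cardN.
        if not re.fullmatch(r"card\d+", os.path.basename(card)):
            continue
        for hwmon in sorted(glob.glob(os.path.join(card, "device", "hwmon", "hwmon*"))):
            try:
                with open(os.path.join(hwmon, "name")) as f:
                    if f.read().strip() == "amdgpu":
                        found.append(hwmon)
            except OSError:
                continue  # no readable name attribute; not what we want
    return found


if __name__ == "__main__":
    for path in find_amdgpu_hwmon():
        print(os.path.join(path, "pwm1_enable"))
```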
Can you echo "1" to pwm1_enable first? If that works, then it looks like the SMI tool has a bug where it doesn't set "manual" in the pwm1_enable file before trying to change the value. Once that is done, we should be able to set it to the value you want.
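The suggested order (switch `pwm1_enable` to manual, then write the duty cycle) matches the dmesg message earlier in the thread. Below is a minimal sketch of that sequence against an hwmon directory. It assumes the standard hwmon conventions: `pwm1_enable` = 1 means manual, and `pwm1` is scaled to `pwm1_max`, falling back to 255. `set_fan_percent` is a hypothetical helper and needs root. Per the rest of this thread, the SMU on Navi31 may still ignore the request even when both writes succeed.

```python
import os

PWM_ENABLE_MANUAL = 1  # hwmon ABI: 0 = no control (full speed), 1 = manual, 2 = automatic
DEFAULT_PWM_MAX = 255  # generic hwmon PWM range, used if pwm1_max is absent


def _pwm_max(hwmon_dir: str) -> int:
    """Use the driver's pwm1_max if it exposes one, else the generic 255."""
    try:
        with open(os.path.join(hwmon_dir, "pwm1_max")) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return DEFAULT_PWM_MAX


def set_fan_percent(hwmon_dir: str, percent: float) -> int:
    """Enable manual fan control, then write the duty cycle; return the PWM value written."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    pwm = round(percent * _pwm_max(hwmon_dir) / 100)
    # Step 1: manual mode first. Per the dmesg line in this thread, amdgpu
    # rejects fan speed changes until manual control is enabled.
    with open(os.path.join(hwmon_dir, "pwm1_enable"), "w") as f:
        f.write(str(PWM_ENABLE_MANUAL))
    # Step 2: the duty cycle itself.
    with open(os.path.join(hwmon_dir, "pwm1"), "w") as f:
        f.write(str(pwm))
    return pwm
```

For example, `set_fan_percent("/sys/class/drm/card0/device/hwmon/hwmon2", 100)` run as root would be the scripted equivalent of the `echo "1"` test followed by writing the maximum duty cycle.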
Sorry, I tried, but I can't change the value in the /sys file system (I tried as the root user). I think it is generated by the driver or changed by rocm-smi. Thanks for your reply!
I use the LACT tool, but it has the same bug we are seeing (ilya-zlobintsev/LACT#255). It looks like the same problem as this one, and someone needs to fix it. Thanks~
The file is created by amdgpu:
$ cat /sys/class/drm/card0/device/hwmon/hwmon2/pwm1_enable
If it fails, check whether dmesg says why, or whether it returns a value like -22 (which means the driver thinks that fan control isn't supported on the device).
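The useful part of that check is interpreting what comes back. The sketch below maps the `pwm1_enable` value to a mode name and turns a failed write into a readable reason. It assumes the hwmon mode numbering (0/1/2) and Linux errno values, where the -22 mentioned above is `EINVAL`. All helper names are hypothetical.

```python
import errno
import os
from typing import Optional

PWM_ENABLE_MODES = {0: "no control (full speed)", 1: "manual", 2: "automatic"}


def describe_pwm_enable(raw: str) -> str:
    """Translate the contents of pwm1_enable into a mode name."""
    mode = int(raw.strip())
    return PWM_ENABLE_MODES.get(mode, f"unknown mode {mode}")


def explain_errno(code: int) -> str:
    """Turn the errno from a failed sysfs access into a short explanation."""
    if code == errno.EINVAL:
        return "EINVAL (-22): the driver likely treats fan control as unsupported"
    if code in (errno.EACCES, errno.EPERM):
        return "permission denied: the write needs root"
    return os.strerror(code)


def try_set_pwm_enable(path: str, mode: int) -> Optional[str]:
    """Write a mode to pwm1_enable; return None on success, else the reason it failed."""
    try:
        # sysfs errors can surface on close, so keep the whole `with` inside the try.
        with open(path, "w") as f:
            f.write(str(mode))
    except OSError as e:
        return explain_errno(e.errno)
    return None
```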
Does dmesg say anything about why? Ideally there would be a message there explaining what's happening. Grabbing and attaching the full dmesg, from boot to the failed attempt to change the fans, would help. Maybe something showed up during device init, or after you tried to set the fans, that gives us a clue as to what's going on.
Here is the amdgpu-related dmesg output from the reboot I just did:
alic-li@alic-li-B660M-D2H-DDR4:~$ sudo dmesg | grep "amdgpu"
So I managed to find an NV31 internally and can reproduce what you're seeing. I enabled some additional logging and found that the SMU isn't reporting WHY it can't do it, just that it isn't doing it. @ppanchad-amd Can we make an internal JIRA for this and assign it to the SMU team for Navi31? Thanks!
@Alic-Li @kentrussell An internal ticket has been created to investigate this issue. Thanks!
Sure! Thanks for your help. I'll wait for your good news. Looking forward to the update 😉
Problem Description
The fan speed cannot be set on the Radeon Pro W7900. It also cannot be set on other GFX1100 cards such as the RX 7900 XTX, but it can be set successfully on GFX1030. The GPU temperature rises to 80~90 °C, the memory temperature to 100 °C, and the junction temperature to 100 °C, but the fan only runs at 50%. >_<
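The three readings above (GPU edge, memory, junction) come from separate hwmon temperature channels. The sketch below reads every channel along with its label. It assumes the usual hwmon layout: `tempN_input` in millidegrees Celsius, plus an optional `tempN_label`. amdgpu labels its channels `edge`, `junction` and `mem`. `read_temperatures` is a hypothetical helper, and the caller must supply the hwmon directory.

```python
import glob
import os
import re
from typing import Dict


def read_temperatures(hwmon_dir: str) -> Dict[str, float]:
    """Return {label: degrees Celsius} for every readable tempN_input in hwmon_dir."""
    temps: Dict[str, float] = {}
    for input_path in sorted(glob.glob(os.path.join(hwmon_dir, "temp*_input"))):
        m = re.fullmatch(r"temp(\d+)_input", os.path.basename(input_path))
        if not m:
            continue
        idx = m.group(1)
        label = f"temp{idx}"  # fallback name when the channel has no label file
        label_path = os.path.join(hwmon_dir, f"temp{idx}_label")
        if os.path.exists(label_path):
            with open(label_path) as f:
                label = f.read().strip()
        try:
            with open(input_path) as f:
                temps[label] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees C
        except (OSError, ValueError):
            continue  # channel exists but could not be read
    return temps
```

Logging this next to the fan state would show whether the fan reacts as the junction and memory temperatures approach 100 °C.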
Operating System
Ubuntu 22.04.3 (Jammy Jellyfish)
CPU
Intel i3-12100 with UHD730 Graphics
GPU
AMD Radeon Pro W7900
ROCm Version
ROCm 6.1.0
ROCm Component
amdsmi
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.13
Runtime Ext Version: 1.4
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: 12th Gen Intel(R) Core(TM) i3-12100
Uuid: CPU-XX
Marketing Name: 12th Gen Intel(R) Core(TM) i3-12100
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 49152(0xc000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4300
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65587452(0x3e8c8fc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65587452(0x3e8c8fc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65587452(0x3e8c8fc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1100
Uuid: GPU-ed466fc6e51f9536
Marketing Name: AMD Radeon PRO W7900
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29768(0x7448)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1760
BDFID: 768
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 92
SDMA engine uCode:: 20
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 47169536(0x2cfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 47169536(0x2cfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Additional Information
No response