
Segmentation Fault on loading SD #1

Open
kevinhalgren opened this issue Mar 5, 2024 · 3 comments
@kevinhalgren

I guess I need to confirm the use case for this. I have an HP AMD-based EliteBook 705 G4 Mini with a Ryzen 5 2400GE CPU. The BIOS doesn't allow me to set the iGPU dedicated RAM greater than 1 GB. I was hoping to use this as a means to get around the BIOS limitation and have PyTorch actually use the GTT shared RAM as VRAM. If it is not intended to let PyTorch/SD do that, then the segfault is probably due to having only 1 GB of VRAM available to ROCm, and I'm out of luck trying to do anything other than CPU-only mode on this system.

I'm running Ubuntu 22.04, ROCm 5.6.1, Python 3.10.12. I get a segfault when trying to start SD:

./webui.sh: line 256: 49971 Segmentation fault (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

ROCm SMI output is below:

========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU[0] : get_power_avg, Not supported on the given system
ERROR: GPU[0] : sclk clock is unsupported

GPU[0] : get_power_cap, Not supported on the given system
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 50.0c N/A None 933Mhz 0% auto Unsupported 7% 100%

=============================== End of ROCm SMI Log ================================

radeontop shows GTT memory is allocated as intended:

                                       71M / 1000M VRAM   7.14% x
                                       27M / 15498M GTT   0.17% x
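As a side note, with the amdgpu kernel driver the same VRAM/GTT totals that radeontop displays can usually be read straight from sysfs (`mem_info_vram_total` and `mem_info_gtt_total` under the card's device directory; the `card0` index and these paths are an assumption about your system). A minimal helper, written so the file path is a parameter:

```python
from pathlib import Path

def read_mem_bytes(path: str) -> int:
    """Read a single integer byte count from an amdgpu sysfs file,
    e.g. /sys/class/drm/card0/device/mem_info_gtt_total or
    mem_info_vram_total (the card index may differ on your system)."""
    return int(Path(path).read_text().strip())

def mib(n_bytes: int) -> int:
    # Convert bytes to whole MiB, roughly what radeontop displays.
    return n_bytes // (1024 * 1024)
```

For example, `mib(read_mem_bytes("/sys/class/drm/card0/device/mem_info_gtt_total"))` should print a value close to the 15498M GTT total shown above.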

I've tried various startup options, such as the one shown below, but continue to get the segfault error whatever options I choose.

LD_PRELOAD=/libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=10.3.0 ./webui.sh --listen --lowvram --opt-sub-quad-attention --precision full --no-half

Would appreciate a confirmation whether or not this should work. Thanks.

@qkiel

qkiel commented Mar 21, 2024

I have a 5600G APU and Stable Diffusion works for me with force-host-alloction-APU and only 512MiB allocated to VRAM.

First, for the Ryzen 5 2400GE you should probably use HSA_OVERRIDE_GFX_VERSION=9.0.0 like me. For Stable Diffusion I'm using Fooocus, which I start like this:

LD_PRELOAD=~/force-host-alloction-APU/./libforcegttalloc.so python3 ~/Fooocus/entry_with_update.py --always-high-vram

Notice the ./ before libforcegttalloc.so. This works on ROCm 5.7.3; with one additional environment variable, HSA_ENABLE_SDMA=0, I can even use the newest ROCm 6.0.

Links to ROCm and PyTorch versions I used:
https://www.amd.com/en/support/linux-drivers
https://pytorch.org/get-started/locally/

ROCm 5.7 with PyTorch for 5.7

https://repo.radeon.com/amdgpu-install/5.7.3/ubuntu/jammy/amdgpu-install_5.7.50703-1_all.deb
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7

ROCm 6.0 with PyTorch for 6.0

https://repo.radeon.com/amdgpu-install/23.40.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0

@segurac
Owner

segurac commented Mar 25, 2024

Thanks a lot @qkiel for the info!

I will update the project's README with your instructions.

@winstonma

winstonma commented Mar 29, 2024

@qkiel I think it may be worth taking some notes:

1. Find HSA_OVERRIDE_GFX_VERSION value

You can use the following command to check, e.g. AMD Ryzen 6800U

$ rocminfo | grep gfx
  Name:                    gfx1030                            
      Name:                    amdgcn-amd-amdhsa--gfx1030    

Then my value would be HSA_OVERRIDE_GFX_VERSION=10.3.0
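The mapping from the rocminfo gfx name to the dotted HSA_OVERRIDE_GFX_VERSION form can be sketched as a small helper (my own illustration, not part of ROCm): the last two characters are single hex digits for minor and stepping, and everything before them is the decimal major version. Note that the value you actually *set* is often the nearest officially supported target (e.g. 9.0.0 for Raven-family APUs), not necessarily the literal conversion of your chip's name.

```python
def gfx_to_hsa_override(gfx_name: str) -> str:
    """Convert an LLVM gfx target like 'gfx1030' or 'gfx900' into the
    dotted form HSA_OVERRIDE_GFX_VERSION expects ('10.3.0', '9.0.0').
    The last two characters are single hex digits (minor, stepping);
    the leading characters are the decimal major version."""
    digits = gfx_name.removeprefix("gfx")
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"
```

So `gfx_to_hsa_override("gfx1030")` gives "10.3.0", and a hex stepping such as gfx90c converts to "9.0.12".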

2. Store all the values in .bashrc

If you're lazy like me, you can do the following

# Compile the library
$ git clone https://github.com/segurac/force-host-alloction-APU.git
$ CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc force-host-alloction-APU/forcegttalloc.c -o force-host-alloction-APU/libforcegttalloc.so  -shared -fPIC
$ sudo mv force-host-alloction-APU/libforcegttalloc.so /usr/local/lib
$ rm -rf force-host-alloction-APU

# Put the environment variables in .bashrc; don't forget to check the value from step 1
$ echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc 
$ echo 'export LD_PRELOAD=/usr/local/lib/libforcegttalloc.so' >> ~/.bashrc 
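After editing .bashrc it's easy to end up with a typo'd path or a missing variable, so a quick sanity check can help. This is just a convenience sketch of mine, not part of any official tooling; it checks the two variables set above:

```python
import os

def check_rocm_env(env=None) -> list[str]:
    """Return a list of problems with the HSA_OVERRIDE_GFX_VERSION /
    LD_PRELOAD setup from step 2. Empty list means it looks fine."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("HSA_OVERRIDE_GFX_VERSION"):
        problems.append("HSA_OVERRIDE_GFX_VERSION is not set")
    preload = env.get("LD_PRELOAD", "")
    if "libforcegttalloc.so" not in preload:
        problems.append("libforcegttalloc.so is not in LD_PRELOAD")
    else:
        # LD_PRELOAD can hold several colon-separated libraries.
        lib = next(p for p in preload.split(":") if "libforcegttalloc.so" in p)
        if not os.path.exists(lib):
            problems.append(lib + " does not exist")
    return problems
```

Run it in a fresh shell (so .bashrc has been sourced) and print the result; anything it returns points at a line to fix.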

3. Note on AMD ROCm upgrades

AMD recently released a new ROCm version, and you have to recompile this library accordingly

# Install the driver
$ wget https://repo.radeon.com/amdgpu-install/6.0.3/ubuntu/jammy/amdgpu-install_6.0.60003-1_all.deb
$ sudo dpkg -i amdgpu-install_6.0.60003-1_all.deb
$ sudo apt update
$ sudo apt upgrade
# Rebuild dkms module, this step really depends on the driver version as well as your kernel version
$ sudo dkms build -m amdgpu -v 6.3.6-1739731.22.04 --kernelsourcedir kernel-6.5.0-26-generic-x86_64

# Re-compile the library
$ git clone https://github.com/segurac/force-host-alloction-APU.git
$ CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc force-host-alloction-APU/forcegttalloc.c -o force-host-alloction-APU/libforcegttalloc.so  -shared -fPIC
$ sudo mv force-host-alloction-APU/libforcegttalloc.so /usr/local/lib
$ rm -rf force-host-alloction-APU

4. Make your system run faster

I use RyzenAdj to boost PyTorch speed (not much, around 10%). After downloading and compiling the tool, you can adjust the vrmmax-current value on your system. The value is referenced from settings in Universal-x86-Tuning-Utility.

# Read the current value
$ sudo ./ryzenadj -i | grep vrmmax-current
| EDC LIMIT VDD       |   90.000 | vrmmax-current 

# Modify the vrmmax-current value
$ sudo ./ryzenadj -k 105000

# Read the current value again
$ sudo ./ryzenadj -i | grep vrmmax-current
| EDC LIMIT VDD       |   105.000 | vrmmax-current 

Although I tried to set a larger value, when I read it back it still gives me 105.000, so I guess there is some limit set by the system. After the value is set, run the PyTorch tool. I find I get about a 10% boost this way. See if that works for you.
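If you want to script the before/after comparison, the `ryzenadj -i` table rows can be parsed with a small helper (my own sketch, based only on the output format shown above; judging from 105000 reading back as 105.000, the value passed to -k appears to be in milliamps):

```python
def parse_ryzenadj_value(output: str, field: str) -> float:
    """Pull one numeric value out of `ryzenadj -i` table output.
    Rows look like: | EDC LIMIT VDD       |   90.000 | vrmmax-current
    `field` is the short name in the last column, e.g. 'vrmmax-current'."""
    for line in output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) >= 4 and cells[3] == field:
            return float(cells[2])
    raise KeyError(field)
```

For example, feeding it the captured output of `sudo ryzenadj -i` and the field name "vrmmax-current" returns the limit as a float, so a script can verify the new value actually took effect.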
