
Segmentation Fault on loading SD #1

Open
kevinhalgren opened this issue Mar 5, 2024 · 3 comments
@kevinhalgren

I guess I need to confirm the use case for this. I have an HP AMD-based EliteBook 705 G4 Mini with a Ryzen 5 2400GE CPU. The BIOS doesn't allow me to set the iGPU dedicated RAM greater than 1 GB. I was hoping to use this as a means to get around the BIOS limitation and have PyTorch actually use the GTT shared RAM as VRAM. If it is not intended to let PyTorch/SD do that, then the segfault is probably due to having only 1 GB of VRAM available to ROCm, and I'm out of luck trying to do anything other than CPU-only mode on this system.

I'm running Ubuntu 22.04, ROCm 5.6.1, Python 3.10.12. I get a segfault when trying to start SD:

./webui.sh: line 256: 49971 Segmentation fault (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

ROCm SMI output is below:

========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU[0] : get_power_avg, Not supported on the given system
ERROR: GPU[0] : sclk clock is unsupported

GPU[0] : get_power_cap, Not supported on the given system
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 50.0c N/A None 933Mhz 0% auto Unsupported 7% 100%

=============================== End of ROCm SMI Log ================================

radeontop shows GTT memory is allocated as intended:

                                       71M / 1000M VRAM   7.14% x
                                       27M / 15498M GTT   0.17% x
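As a side note, with the amdgpu kernel driver the same VRAM/GTT totals that radeontop displays can usually be read straight from sysfs (`mem_info_vram_total` and `mem_info_gtt_total` under the card's device directory; the `card0` index and these paths are an assumption about your system). A minimal helper, written so the file path is a parameter:

```python
from pathlib import Path

def read_mem_bytes(path: str) -> int:
    """Read a single integer byte count from an amdgpu sysfs file,
    e.g. /sys/class/drm/card0/device/mem_info_gtt_total or
    mem_info_vram_total (the card index may differ on your system)."""
    return int(Path(path).read_text().strip())

def mib(n_bytes: int) -> int:
    # Convert bytes to whole MiB, roughly what radeontop displays.
    return n_bytes // (1024 * 1024)
```

For example, `mib(read_mem_bytes("/sys/class/drm/card0/device/mem_info_gtt_total"))` should print a value close to the 15498M GTT total shown above.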

I've tried various startup options, such as the one shown below, but continue to get the segfault error whatever options I choose.

LD_PRELOAD=/libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=10.3.0 ./webui.sh --listen --lowvram --opt-sub-quad-attention --precision full --no-half

Would appreciate a confirmation whether or not this should work. Thanks.

@qkiel

qkiel commented Mar 21, 2024

I have a 5600G APU and Stable Diffusion works for me with force-host-alloction-APU and only 512MiB allocated to VRAM.

First, for the Ryzen 5 2400GE you should probably use HSA_OVERRIDE_GFX_VERSION=9.0.0 like me. For Stable Diffusion I'm using Fooocus, which I start like this:

LD_PRELOAD=~/force-host-alloction-APU/./libforcegttalloc.so python3 ~/Fooocus/entry_with_update.py --always-high-vram

Notice the ./ before libforcegttalloc.so. This works on ROCm 5.7.3; with one additional environment variable, HSA_ENABLE_SDMA=0, I can even use the newest ROCm 6.0.

Links to ROCm and PyTorch versions I used:
https://www.amd.com/en/support/linux-drivers
https://pytorch.org/get-started/locally/

ROCm 5.7 with PyTorch for 5.7

https://repo.radeon.com/amdgpu-install/5.7.3/ubuntu/jammy/amdgpu-install_5.7.50703-1_all.deb
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7

ROCm 6.0 with PyTorch for 6.0

https://repo.radeon.com/amdgpu-install/23.40.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0

@segurac
Owner

segurac commented Mar 25, 2024

Thanks a lot @qkiel for the info!

I will update the project's README with your instructions.

@winstonma

winstonma commented Mar 29, 2024

@qkiel I think it may be worth taking some notes:

1. Find HSA_OVERRIDE_GFX_VERSION value

You can use the following command to check, e.g. AMD Ryzen 6800U

$ rocminfo | grep gfx
  Name:                    gfx1030                            
      Name:                    amdgcn-amd-amdhsa--gfx1030    

Then my value would be HSA_OVERRIDE_GFX_VERSION=10.3.0
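The mapping from the rocminfo gfx name to the dotted HSA_OVERRIDE_GFX_VERSION form can be sketched as a small helper (my own illustration, not part of ROCm): the last two characters are single hex digits for minor and stepping, and everything before them is the decimal major version. Note that the value you actually *set* is often the nearest officially supported target (e.g. 9.0.0 for Raven-family APUs), not necessarily the literal conversion of your chip's name.

```python
def gfx_to_hsa_override(gfx_name: str) -> str:
    """Convert an LLVM gfx target like 'gfx1030' or 'gfx900' into the
    dotted form HSA_OVERRIDE_GFX_VERSION expects ('10.3.0', '9.0.0').
    The last two characters are single hex digits (minor, stepping);
    the leading characters are the decimal major version."""
    digits = gfx_name.removeprefix("gfx")
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"
```

So `gfx_to_hsa_override("gfx1030")` gives "10.3.0", and a hex stepping such as gfx90c converts to "9.0.12".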

2. Store all the values in .bashrc

If you're lazy like me, you can do the following

# Compile the library
$ git clone https://github.com/segurac/force-host-alloction-APU.git
$ CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc force-host-alloction-APU/forcegttalloc.c -o force-host-alloction-APU/libforcegttalloc.so  -shared -fPIC
$ sudo mv force-host-alloction-APU/libforcegttalloc.so /usr/local/lib
$ rm -rf force-host-alloction-APU

# Put the environment variables in .bashrc; don't forget to check the value from step 1
$ echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc 
$ echo 'export LD_PRELOAD=/usr/local/lib/libforcegttalloc.so' >> ~/.bashrc 
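After editing .bashrc it's easy to end up with a typo'd path or a missing variable, so a quick sanity check can help. This is just a convenience sketch of mine, not part of any official tooling; it checks the two variables set above:

```python
import os

def check_rocm_env(env=None) -> list[str]:
    """Return a list of problems with the HSA_OVERRIDE_GFX_VERSION /
    LD_PRELOAD setup from step 2. Empty list means it looks fine."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("HSA_OVERRIDE_GFX_VERSION"):
        problems.append("HSA_OVERRIDE_GFX_VERSION is not set")
    preload = env.get("LD_PRELOAD", "")
    if "libforcegttalloc.so" not in preload:
        problems.append("libforcegttalloc.so is not in LD_PRELOAD")
    else:
        # LD_PRELOAD can hold several colon-separated libraries.
        lib = next(p for p in preload.split(":") if "libforcegttalloc.so" in p)
        if not os.path.exists(lib):
            problems.append(lib + " does not exist")
    return problems
```

Run it in a fresh shell (so .bashrc has been sourced) and print the result; anything it returns points at a line to fix.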

3. Note on AMD ROCm upgrades

AMD recently released a new ROCm version, and you have to recompile this library accordingly

# Install the driver
$ wget https://repo.radeon.com/amdgpu-install/6.0.3/ubuntu/jammy/amdgpu-install_6.0.60003-1_all.deb
$ sudo dpkg -i amdgpu-install_6.0.60003-1_all.deb
$ sudo apt update
$ sudo apt upgrade
# Rebuild dkms module, this step really depends on the driver version as well as your kernel version
$ sudo dkms build -m amdgpu -v 6.3.6-1739731.22.04 --kernelsourcedir kernel-6.5.0-26-generic-x86_64

# Re-compile the library
$ git clone https://github.com/segurac/force-host-alloction-APU.git
$ CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc force-host-alloction-APU/forcegttalloc.c -o force-host-alloction-APU/libforcegttalloc.so  -shared -fPIC
$ sudo mv force-host-alloction-APU/libforcegttalloc.so /usr/local/lib
$ rm -rf force-host-alloction-APU

4. Make your system run faster

I use RyzenAdj to boost PyTorch speed (not much, around 10%). After downloading and compiling the tool, you can adjust the vrmmax-current value on your system. The value is referenced from settings in Universal-x86-Tuning-Utility.

# Read the current value
$ sudo ./ryzenadj -i | grep vrmmax-current
| EDC LIMIT VDD       |   90.000 | vrmmax-current 

# Modify the vrmmax-current value
$ sudo ./ryzenadj -k 105000

# Read the current value again
$ sudo ./ryzenadj -i | grep vrmmax-current
| EDC LIMIT VDD       |   105.000 | vrmmax-current 

Although I tried to set a larger value, when I read it back it still gives me 105.000, so I guess there is some limit set by the system. After the value is set, run the PyTorch tool. I find I get about a 10% boost this way. See if that works for you.
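If you want to script the before/after comparison, the `ryzenadj -i` table rows can be parsed with a small helper (my own sketch, based only on the output format shown above; judging from 105000 reading back as 105.000, the value passed to -k appears to be in milliamps):

```python
def parse_ryzenadj_value(output: str, field: str) -> float:
    """Pull one numeric value out of `ryzenadj -i` table output.
    Rows look like: | EDC LIMIT VDD       |   90.000 | vrmmax-current
    `field` is the short name in the last column, e.g. 'vrmmax-current'."""
    for line in output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) >= 4 and cells[3] == field:
            return float(cells[2])
    raise KeyError(field)
```

For example, feeding it the captured output of `sudo ryzenadj -i` and the field name "vrmmax-current" returns the limit as a float, so a script can verify the new value actually took effect.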
