Important:
- For a simple automatic install, use the one-click installers provided in the original repo.
- This tech is absolutely bleeding edge; methods and tools change on a daily basis. Consider this page outdated as soon as it is updated - things break regularly.
- Look for more recent tutorials on YouTube and in Reddit comments, but those will also eventually be outdated again.
# 1 install WSL2 on Windows 11, then:
sudo apt update
sudo apt-get install build-essential
sudo apt install git -y
# optional: install a better terminal experience, otherwise skip to step 4
# 2 install brew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> /home/$USER/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
brew doctor
# 3 install oh-my-posh
brew install jandedobbeleer/oh-my-posh/oh-my-posh
ls "$(brew --prefix oh-my-posh)/themes"
# copy the themes path and add it to the second eval line below:
nano ~/.bashrc
# add this to the end:
# eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
# eval "$(oh-my-posh init bash --config '/home/linuxbrew/.linuxbrew/opt/oh-my-posh/themes/atomic.omp.json')"
# plugins=(
# git
# # other plugins
# )
# CTRL+X to end editing
# Y to save changes
# ENTER to finally exit
source ~/.bashrc
exec bash
# 4 install mamba instead of conda, because it's faster https://mamba.readthedocs.io/en/latest/installation.html
mkdir github
mkdir downloads
cd downloads
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-$(uname)-$(uname -m).sh
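# optional sanity check (assumes you let the installer initialize your shell); restart the shell first:
source ~/.bashrc
mamba --version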
# 5 install the correct cuda toolkit 11.7, not 12.x
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run
nano ~/.bashrc
# add the following line to add the CUDA/WSL libraries to the LD_LIBRARY_PATH environment variable
# export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
# after the plugins=() code block, above conda initialize
# CTRL+X to end editing
# Y to save changes
# ENTER to finally exit
source ~/.bashrc
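# optional check that the toolkit installed (assumes the default /usr/local/cuda prefix; nvcc is only added to PATH later, in step 9):
/usr/local/cuda/bin/nvcc --version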
cd ..
# 6 install ooba's textgen
mamba create --name textgen python=3.10.9
mamba activate textgen
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio -f https://download.pytorch.org/whl/cu117/torch_stable.html
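# quick check that this torch build sees the GPU (should print the version and "True"):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"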
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
# 7 Install 4bit support through GPTQ-for-LLaMa
mkdir repositories
cd repositories
# choose ONE of the following:
# A) for fast triton https://www.reddit.com/r/LocalLLaMA/comments/13g8v5q/fastest_inference_branch_of_gptqforllama_and/
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b fastest-inference-4bit
# B) for triton
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b triton
# C) for newer cuda
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
# D) for widely compatible old cuda
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
# groupsize, act-order, true-sequential
# --act-order (quantizing columns in order of decreasing activation size)
# --true-sequential (performing sequential quantization even within a single Transformer block)
# Those fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.
# --groupsize
# Currently, groupsize and act-order do not work together; you must choose one of them.
# Ooba: there is a pytorch branch from qwop that allows you to use groupsize and act-order together.
# Models without group-size (better for the 7b model)
# Models with group-size (better from 13b upwards)
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install
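# optional: quantize a model yourself instead of downloading a pre-quantized one, using the flags discussed above.
# hedged sketch only - the paths are placeholders and the exact llama.py flags should be checked
# against the README of the branch you cloned:
# CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/llama-7b-hf c4 --wbits 4 --true-sequential --groupsize 128 --save llama7b-4bit-128g.pt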
cd ..
cd ..
# 8 Test ooba with a 4bit GPTQ model
python download-model.py 4bit/WizardLM-13B-Uncensored-4bit-128g
python server.py --wbits 4 --model_type llama --groupsize 128 --chat
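# once the model has loaded, the UI is served on Gradio's default port; open http://127.0.0.1:7860 in your browser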
# 9 install llama.cpp
cd repositories
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
nano ~/.bashrc
# add the cuda bin folder to the path environment variable in order for make to find nvcc:
# export PATH=/usr/local/cuda/bin:$PATH
# after the export LD_LIBRARY_PATH line
# CTRL+X to end editing
# Y to save changes
# ENTER to finally exit
source ~/.bashrc
make LLAMA_CUBLAS=1
cd models
wget https://huggingface.co/TheBloke/WizardLM-13B-Uncensored-GGML/resolve/main/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin
cd ..
# 10 test llama.cpp with GPU support
./main -t 8 -m models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: write a story about llamas ### Response:" --n-gpu-layers 30
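# flag reference for the command above: -t CPU threads, -m model path, -c context size,
# --temp sampling temperature, --repeat_penalty repetition penalty, -n -1 generate until the model stops,
# -p prompt, --n-gpu-layers layers offloaded to the GPU via cuBLAS (lower this if you run out of VRAM)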
cd ..
cd ..
# 11 prepare ooba's textgen for llama.cpp support, by compiling llama-cpp-python with cuda GPU support
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
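# quick check that the cuBLAS build imports cleanly:
python -c "from llama_cpp import Llama; print('llama-cpp-python OK')"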
Installation guide from 2023-03-01 (outdated)
- Press the Windows key + X and click on "Windows PowerShell (Admin)" or "Windows Terminal (Admin)" to open PowerShell or Terminal with administrator privileges.
wsl --install
You may be prompted to restart your computer. If so, save your work and restart.
- Install Windows Terminal from the Windows Store
- Install Ubuntu from the Windows Store
- Choose the desired Ubuntu version (e.g., Ubuntu 20.04 LTS) and click "Get" or "Install" to download and install the Ubuntu app.
- Once the installation is complete, click "Launch" or search for "Ubuntu" in the Start menu and open the app.
- When you first launch the Ubuntu app, it will take a few minutes to set up. Be patient as it installs the necessary files and sets up your environment.
- Once the setup is complete, you will be prompted to create a new UNIX username and password. Choose a username and password, and make sure to remember them, as you will need them for future administrative tasks within the Ubuntu environment.
- If you prefer to use Windows Terminal from now on, close this console and start Windows Terminal then open a new Ubuntu console by clicking the drop down icon on top of Terminal and choose Ubuntu. Otherwise stay in the existing console window.
sudo apt update
sudo apt upgrade
sudo apt install git
sudo apt install wget
mkdir downloads
cd downloads/
wget https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Linux-x86_64.sh
chmod +x ./Anaconda3-2023.03-1-Linux-x86_64.sh
./Anaconda3-2023.03-1-Linux-x86_64.sh
and follow the defaults
sudo apt install build-essential
cd ..
conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio
mkdir github
cd github
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
pip install chardet cchardet
If you want to try the triton branch, skip to Newer GPTQ-Triton
- Works on Windows, Linux, WSL2.
- Supports 3 & 4 bit models
- Only supports no-act-order models
- Slower than triton
- Works best with --groupsize 128 --wbits 4 and no-act-order models
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
(or try the newer https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda build)
cd GPTQ-for-LLaMa
python -m pip install -r requirements.txt
python setup_cuda.py install
if this gives an error about g++, try installing the correct g++ version:
conda install -y -k gxx_linux-64=11.2.0
cd ../..
This triton branch or this one:
- Works on Linux and WSL2
- Supports 4 bit quantized models
- Is faster than cuda
- Works best with the --groupsize 128 --wbits 4 flags and act-order models
mkdir repositories
cd repositories
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
(or try https://github.com/fpgaminer/GPTQ-triton)
cd GPTQ-for-LLaMa
pip install -r requirements.txt
cd ../..
Alternatively you can try AutoGPTQ to install cuda, older llama-cuda, or triton variants:
- run one of these:
pip install auto-gptq
to install the cuda branch for newer models
pip install auto-gptq[llama]
if your transformers is outdated or you are using older models that don't support it
pip install auto-gptq[triton]
to install the triton branch for triton compatible models
cd ../..
If you want to open the webui from within your home network, enable port forwarding on your windows machine, with this command in an administrator terminal:
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=7860 connectaddress=localhost connectport=7860
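To list or remove the forwarding rule later (standard netsh syntax):
netsh interface portproxy show all
netsh interface portproxy delete v4tov4 listenaddress=0.0.0.0 listenport=7860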
- Either always run
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib
before running the server.py below
- Or try to install
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda113
Optional, allows for faster but non-deterministic inference:
pip install xformers
- then use the --xformers flag later, when running the server.py below
You're done with the Ubuntu / WSL2 installation, you can skip to Download models section.
- Download and install miniconda
- Download and install git for windows
- Open Anaconda Prompt (Miniconda 3) from the Start Menu
- It should load in C:\Users\yourusername>
mkdir github
cd github
conda create --name textgen python=3.10
conda activate textgen
conda install pip
conda install -y -k pytorch[version=2,build=py3.10_cuda11.7*] torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit ninja git -c pytorch -c nvidia/label/cuda-11.7.0 -c nvidia
git clone https://github.com/oobabooga/text-generation-webui.git
python -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl
cd text-generation-webui
pip install -r requirements.txt --upgrade
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
python -m pip install -r requirements.txt
python setup_cuda.py install
might fail; continue with the next command if so
pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl
skip this command if the previous one didn't fail
cd ..\..
(go back to text-generation-webui)
pip install faust-cchardet
pip install chardet
- Still in your terminal, make sure you are in the /text-generation-webui/ folder and type
python download-model.py
- select other to download a custom model
- paste the huggingface user/directory, for example:
TheBloke/wizardLM-7B-GGML
and let it download the model files
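Alternatively, you can pass the repo directly instead of using the menu (the files end up in the models/ folder of text-generation-webui):
python download-model.py TheBloke/wizardLM-7B-GGML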
The base command to run. You have to add further flags, depending on the model and environment you want to run in:
- if you are on WSL2 Ubuntu, run
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib
always, before running the server.py
python server.py --model-menu --chat
- --model-menu allows changing models in the UI
- --chat loads the chat UI instead of the text completion UI
- --wbits 4 loads a 4-bit quantized model
- --groupsize 128 add this parameter if the model specifies a groupsize
- --model_type llama if the model name is unknown, specify its base model. If you run llama-derived models like vicuna, alpaca, gpt4-x, codecapybara or wizardLM you have to define it as llama. If you load OPT or GPT-J models, define the flag accordingly
- --xformers if you have properly installed xformers and want faster but nondeterministic answer generation
An example combining these flags follows below.
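Putting it together, a typical invocation for a 4-bit, 128-groupsize, llama-derived model on WSL2 Ubuntu (only the flags described above; adjust to your model):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib
python server.py --model-menu --chat --wbits 4 --groupsize 128 --model_type llama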
If you get a "cuda lib not found" error, especially on Windows WSL2 Ubuntu, try executing
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib
before running the server.py above.
Also try:
pip install faust-cchardet
pip install chardet
or the other way around. Then try to start the server again.
On Windows Native, try:
pip uninstall bitsandbytes
pip install git+https://github.com/Keith-Hon/bitsandbytes-windows.git
- here are some discussions, but some solutions are for Windows WSL2, some for Windows native
Or try these prebuilt wheels on Windows:
- https://github.com/TimDettmers/bitsandbytes/files/11084955/bitsandbytes-0.37.2-py3-none-any.whl.zip
- https://github.com/acpopescu/bitsandbytes/releases/tag/v0.37.2-win.0
- And more help on windows support here and here
Still having problems? Try to manually copy the libraries.
On Linux or Windows WSL2 Ubuntu, try:
- make sure you run
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib
before running the server.py every time!
- alternatively, you can try
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda113
and see if it works without the above command
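To avoid typing the export every time, you can append it to your ~/.bashrc (same approach as in the WSL2 guide above):
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib' >> ~/.bashrc
source ~/.bashrc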
pip install xformers==0.0.16rc425
Use llama.cpp, HN discussion
See an up to date list of most models you can run locally: awesome-ai open-models
See the awesome-ai LLM section for more tools, GUIs etc.