Deepspeed Installation
robvanvolt edited this page May 1, 2021
You can also train with Microsoft DeepSpeed's Sparse Attention, using any combination of dense and sparse attention that you'd like. However, you will have to endure the installation process first. It requires:
- llvm-9-dev
- cmake
- gcc
- python3.7.x
- cudatoolkit=10.1
- pytorch=1.6.*
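Before building anything, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of DALLE-pytorch or DeepSpeed: the helper name `check_versions` and the constants are made up for illustration, and the pinned values are simply copied from the list.

```python
import sys

# Pins copied from the requirements list above (hypothetical constant names).
REQUIRED_PYTHON = (3, 7)
REQUIRED_TORCH_PREFIX = "1.6."
REQUIRED_CUDA = "10.1"


def check_versions(python_version, torch_version, cuda_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append("Python %s.%s found, 3.7.x expected" % tuple(python_version[:2]))
    if not torch_version.startswith(REQUIRED_TORCH_PREFIX):
        problems.append("torch %s found, 1.6.* expected" % torch_version)
    if cuda_version != REQUIRED_CUDA:
        problems.append("CUDA %s found, %s expected" % (cuda_version, REQUIRED_CUDA))
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # torch.version.cuda is None for CPU-only builds.
        issues = check_versions(sys.version_info, torch.__version__, torch.version.cuda)
        print("\n".join(issues) if issues else "Environment matches the listed versions")
```

Run it inside the environment you intend to train in; an empty result only means the version strings match, not that the DeepSpeed build will succeed.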
sudo apt-get -y install llvm-9-dev cmake
git clone https://github.com/microsoft/DeepSpeed.git /tmp/Deepspeed
cd /tmp/Deepspeed && DS_BUILD_SPARSE_ATTN=1 ./install.sh -s # Change this to -r if you need to run as root
pip install triton
cd ~
Then you may use either conda or pip:
- Conda
#!/bin/bash
conda create -n dalle_env python=3.7
conda activate dalle_env
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install "git+https://github.com/lucidrains/DALLE-pytorch.git"
- Pip
#!/bin/bash
python -m pip install virtualenv
python -m virtualenv -p=python3.7 ~/.virtualenvs/dalle_env
source ~/.virtualenvs/dalle_env/bin/activate
# Make sure your terminal shows that you're inside the virtual environment - and then run:
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install "git+https://github.com/lucidrains/DALLE-pytorch.git"
If all went well - continue to the following:
https://github.com/lucidrains/DALLE-pytorch/wiki/Deepspeed---Usage
If you want to get DeepSpeed's sparse attention running on the latest Nvidia GPUs (RTX 30XX, A100), you have to install afiaka87's fork of DeepSpeed together with compatible versions of Torch and CUDA.
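If you are unsure whether your card falls into this group, one way to check is the CUDA compute capability: the A100 reports 8.0 and the RTX 30XX series reports 8.6. The sketch below assumes PyTorch is already installed and can see the GPU; the helper name `is_ampere_or_newer` is made up for illustration.

```python
def is_ampere_or_newer(capability):
    """capability is a (major, minor) tuple, e.g. (8, 6) for an RTX 30XX card."""
    major, _minor = capability
    return major >= 8


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("Compute capability:", cap)
            print("Follow the steps below:", is_ampere_or_newer(cap))
        else:
            print("No CUDA device visible to PyTorch")
```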
- Install the Nvidia driver, which must be compatible with CUDA 11.1 to match the cu111 wheels installed in the next step (remove all previous Nvidia drivers and CUDA installations first):
sudo apt install nvidia-driver-460
- Next, install the compatible versions of torch, torchvision and torchaudio:
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
- Next, clone the DeepSpeed fork that is compatible with the latest Nvidia GPUs:
git clone --single-branch --branch sparse_triton_support https://github.com/afiaka87/DeepSpeed.git
- After that, install DeepSpeed with the sparse attention build flag (DS_BUILD_SPARSE_ATTN=1) set:
cd DeepSpeed;
DS_BUILD_SPARSE_ATTN=1 ./install.sh -s;
- If you want to use train_dalle.py, you have to add new constants for sparse attention:
ATTN_TYPES = ('full', 'sparse')
SPARSE_ATTN = True
- Add the new constants to the dalle_params dictionary:
dalle_params = dict(
    num_text_tokens=tokenizer.vocab_size,
    text_seq_len=TEXT_SEQ_LEN,
    dim=MODEL_DIM,
    depth=DEPTH,
    heads=HEADS,
    dim_head=DIM_HEAD,
    reversible=REVERSIBLE,
    loss_img_weight=LOSS_IMG_WEIGHT,
    attn_types=ATTN_TYPES,
    sparse_attn=SPARSE_ATTN,
)
- Finally, you are hopefully able to train faster with DeepSpeed's sparse attention!
deepspeed train_dalle.py --image_text_folder my-text-image-folder --truncate_captions=True --random_resize_crop_lower_ratio 0.9 --taming --fp16 --deepspeed
Note: You cannot use ZeRO optimization above stage 3 with sparse attention on.
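For intuition about what the ATTN_TYPES = ('full', 'sparse') setting does: as far as I understand the DALLE-pytorch transformer, the tuple is repeated cyclically over the model's layers, so the two attention types alternate with depth. The snippet below is a standalone illustration of that cycling under this assumption, not code taken from the library, and the helper name `layer_attention_types` is hypothetical.

```python
from itertools import cycle, islice


def layer_attention_types(attn_types, depth):
    """Assign one attention type per layer by cycling through attn_types."""
    return list(islice(cycle(attn_types), depth))


if __name__ == "__main__":
    # With depth=4 the layers alternate between dense and sparse attention.
    print(layer_attention_types(("full", "sparse"), 4))
    # → ['full', 'sparse', 'full', 'sparse']
```

Under the same assumption, a tuple such as ('full', 'full', 'sparse') would make every third layer sparse, which is one way to trade speed against the modelling capacity of dense attention.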