Documentation update for 1.19 #597
base: habana_main
@@ -11,7 +11,7 @@ Please follow the instructions provided in the [Gaudi Installation Guide](https:
- OS: Ubuntu 22.04 LTS
- Python: 3.10
- Intel Gaudi accelerator
- Intel Gaudi software version 1.18.0
- Intel Gaudi software version 1.19.0

## Quick start using Dockerfile
```
@@ -44,13 +44,29 @@ It is highly recommended to use the latest Docker image from Intel Gaudi vault.
Use the following commands to run a Docker image:
```{.console}
$ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
$ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
```
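After starting the container, you can optionally confirm that the Gaudi accelerators are visible. This is a hedged sketch, not part of the original guide; it assumes the `hl-smi` utility shipped with the Intel Gaudi software stack is available in the image.

```{.console}
$ # Optional sanity check: list the Gaudi devices visible inside the container.
$ hl-smi
```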
### Build and Install vLLM-fork
### Build and Install vLLM
Currently, the latest features and performance optimizations are developed in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork) and we periodically upstream them to vLLM main repo. To install latest [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:
Currently, we provide multiple repositories that can be used to install vLLM with Intel® Gaudi®. Pick one of the following options:
#### 1. Build and Install the stable version
We periodically release vLLM versions aligned with Intel® Gaudi® software releases. The stable version is released with a tag and supports fully validated features and performance optimizations from Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork). To install the stable release from [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:

Review comment: typo: tagg. Reply: Corrected.
```{.console}
$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout v1.19.0
$ pip install -r requirements-hpu.txt
$ python setup.py develop
```

Review comment on the `git checkout v1.19.0` line: Probably will need to replace with proper tag like in v0.5.3.post1+Gaudi-1.18.0. Reply: @bartekkuncer please verify if that makes sense. Reply: @michalkuligowski makes a good point, especially since in the release notes we provide instructions using a tag, so this change will make the two consistent.
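A quick way to confirm the build succeeded is to import the package and print its version. This is a hedged sketch, not part of the original instructions; the exact version string reported depends on the checked-out tag.

```{.console}
$ # Sanity check: the import should succeed and report the installed vLLM version.
$ python -c "import vllm; print(vllm.__version__)"
```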
#### 2. Build and Install the latest from vLLM-fork
The latest features and performance optimizations are developed in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork) and we periodically upstream them to the vLLM main repo. To install the latest [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:
```{.console}
$ git clone https://github.com/HabanaAI/vllm-fork.git
@@ -59,6 +75,16 @@ $ git checkout habana_main
$ pip install -r requirements-hpu.txt
$ python setup.py develop
```
#### 3. Build and Install from vLLM main source
If you prefer to build and install directly from the main vLLM source, to which we periodically upstream new features, run the following:
```{.console}
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ pip install -r requirements-hpu.txt
$ python setup.py develop
```
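Whichever option you choose, a short offline-inference run can confirm that vLLM executes on the Gaudi device. This is a hedged sketch that is not part of the original guide; the model name below is an arbitrary small example, not one recommended by this documentation.

```{.console}
$ # Hypothetical smoke test: load a small model and print a single generated completion.
$ python -c "from vllm import LLM; out = LLM(model='facebook/opt-125m').generate('Hello, my name is'); print(out[0].outputs[0].text)"
```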
# Supported Features
@@ -71,11 +97,11 @@ $ python setup.py develop
- Inference with [HPU Graphs](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html) for accelerating low-batch latency and throughput
- Attention with Linear Biases (ALiBi)
- INC quantization
- LoRA adapters
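Since LoRA adapters are now listed as supported, here is a hedged illustration of how an adapter is typically attached when launching the OpenAI-compatible server. This example is not from the original documentation; the model name and adapter path are placeholders, and `--enable-lora`/`--lora-modules` are standard vLLM server flags assumed to behave the same on Gaudi.

```{.console}
$ # Hypothetical example: serve a base model with one LoRA adapter registered as "my-adapter".
$ python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf \
    --enable-lora --lora-modules my-adapter=/path/to/lora_adapter
```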
# Unsupported Features
- Beam search
- LoRA adapters
- AWQ quantization
- Prefill chunking (mixed-batch inferencing)
@@ -112,7 +138,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
| 1 | 1 | PyTorch lazy mode |
> [!WARNING]
> In 1.18.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
> In 1.19.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should only be used for validating functional correctness. Their performance will be improved in future releases. For the best performance in 1.19.0, use HPU Graphs or PyTorch lazy mode.
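For context on how these modes are typically selected, here is a hedged sketch that is not part of the original document. It assumes the execution-mode table pairs the `PT_HPU_LAZY_MODE` environment variable with vLLM's `enforce_eager` setting (only the `| 1 | 1 | PyTorch lazy mode |` row is visible in this excerpt), and the model name is an arbitrary placeholder.

```{.console}
$ # Hypothetical: PT_HPU_LAZY_MODE=1 together with --enforce-eager would correspond to PyTorch lazy mode.
$ PT_HPU_LAZY_MODE=1 python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --enforce-eager
```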
## Bucketing mechanism
@@ -18,7 +18,7 @@ Requirements
- OS: Ubuntu 22.04 LTS
- Python: 3.10
- Intel Gaudi accelerator
- Intel Gaudi software version 1.18.0
- Intel Gaudi software version 1.19.0
Quick start using Dockerfile
@@ -63,23 +63,29 @@ Use the following commands to run a Docker image:
.. code:: console

   $ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
   $ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
   $ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
   $ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
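Inside the running container, a hedged way to confirm that PyTorch can see the Gaudi device is to query the HPU backend. This check is not part of the original guide and assumes the `habana_frameworks` PyTorch bridge shipped in the image exposes `torch.hpu.is_available()`.

.. code:: console

   $ # Optional check: should print True when an HPU device is usable.
   $ python -c "import torch, habana_frameworks.torch; print(torch.hpu.is_available())"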
Build and Install vLLM
~~~~~~~~~~~~~~~~~~~~~~
To build and install vLLM from source, run:
Currently, we provide multiple repositories that can be used to install vLLM with Intel® Gaudi®. Pick one of the following options:
1. Build and Install the stable version
We periodically release vLLM versions aligned with Intel® Gaudi® software releases. The stable version is released with a tag and supports fully validated features and performance optimizations from Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__. To install the stable release from `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:
.. code:: console
   $ git clone https://github.com/vllm-project/vllm.git
   $ cd vllm
   $ git clone https://github.com/HabanaAI/vllm-fork.git
   $ cd vllm-fork
   $ git checkout v1.19.0
   $ pip install -r requirements-hpu.txt
   $ python setup.py develop
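To double-check which source and version actually got installed, you can inspect the installed package metadata. This is a hedged suggestion, not part of the original document.

.. code:: console

   $ # Shows the installed vLLM version and where it was installed from.
   $ pip show vllm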
2. Build and Install the latest from vLLM-fork
Currently, the latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to vLLM main repo. To install latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:
The latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to the vLLM main repo. To install the latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:
.. code:: console
@@ -89,6 +95,16 @@ Currently, the latest features and performance optimizations are developed in Ga
   $ pip install -r requirements-hpu.txt
   $ python setup.py develop
3. Build and Install from vLLM main source
If you prefer to build and install directly from the main vLLM source, to which we periodically upstream new features, run the following:
.. code:: console
   $ git clone https://github.com/vllm-project/vllm.git
   $ cd vllm
   $ pip install -r requirements-hpu.txt
   $ python setup.py develop
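As a final end-to-end check after any of the three install paths, you can start the OpenAI-compatible server and send a completion request. This is a hedged sketch, not part of the original document; the model name, port, and prompt are placeholders chosen for illustration.

.. code:: console

   $ # Start the server in the background (default port 8000), then query it.
   $ python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m &
   $ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" \
        -d '{"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 16}'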
Supported Features
==================
@@ -107,12 +123,12 @@ Supported Features
for accelerating low-batch latency and throughput
- Attention with Linear Biases (ALiBi)
- INC quantization
- LoRA adapters
Unsupported Features
====================
- Beam search
- LoRA adapters
- AWQ quantization
- Prefill chunking (mixed-batch inferencing)
@@ -186,7 +202,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
- PyTorch lazy mode
.. warning::
   In 1.18.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
   In 1.19.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should only be used for validating functional correctness. Their performance will be improved in future releases. For the best performance in 1.19.0, use HPU Graphs or PyTorch lazy mode.
Bucketing mechanism
Review comment: It needs more explanation.
Reply: Will you add it? @piotrbocian
Reply: Or explain what you have in mind so someone else can do it?
Reply: Other vendors seem to have it in a similar way as here:
https://docs.vllm.ai/en/latest/getting_started/openvino-installation.html#quick-start-using-dockerfile
https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html#quick-start-using-dockerfile
https://docs.vllm.ai/en/latest/getting_started/arm-installation.html#quick-start-with-dockerfile
https://docs.vllm.ai/en/latest/getting_started/xpu-installation.html#quick-start-using-dockerfile
Reply: Please see how the document is structured:
2.1 Environment verification
2.2 Run Docker Image
2.3 Build and Install vLLM
Questions: