Documentation update for 1.19 #597

Open · wants to merge 28 commits into base: habana_main

Changes from 21 commits

Commits (28 total)
a9b9e23
Update ray_hpu_executor.py (#541)
michalkuligowski Nov 25, 2024
79edcf2
fix flags setting on HPU for FP8LinearMethod
nirda7 Nov 26, 2024
cb1ba00
Update sh scripts (#546)
michalkuligowski Nov 26, 2024
03ef71a
Update cpu-test.yml (#543)
michalkuligowski Nov 26, 2024
087c304
Fix flags setting on HPU for FP8LinearMethod (#551)
nirda7 Nov 26, 2024
36d872d
Revert "Fix flags setting on HPU for FP8LinearMethod" (#553)
michalkuligowski Nov 26, 2024
e04615d
Update run-lm-eval-gsm-vllm-baseline.sh (#554)
michalkuligowski Nov 26, 2024
79e37ad
1.19.0 fast-forward merge (#542)
michalkuligowski Nov 26, 2024
0b5bf99
Update documentation
michalkuligowski Nov 26, 2024
58be7bc
Update README_GAUDI.md
michalkuligowski Nov 26, 2024
b8136a3
Update gaudi-installation.rst
michalkuligowski Nov 26, 2024
e19bd83
Update compatibility_matrix.rst
michalkuligowski Nov 26, 2024
eb631ef
Update compatibility_matrix.rst
michalkuligowski Nov 26, 2024
3b624ac
Update gaudi-installation.rst
michalkuligowski Nov 26, 2024
5f689dd
Update compatibility_matrix.rst
michalkuligowski Nov 26, 2024
b2532a0
Update compatibility_matrix.rst
michalkuligowski Nov 27, 2024
d8b7ae0
Update Dockerfile.hpu
michalkuligowski Nov 27, 2024
647c19c
Update README_GAUDI.md
michalkuligowski Nov 27, 2024
3ecd3c0
Update gaudi-installation.rst
michalkuligowski Nov 27, 2024
213c716
Update README_GAUDI.md
michalkuligowski Nov 27, 2024
d6bfbaf
Updates to the documentation
PatrykWo Dec 5, 2024
ae2a931
Update README_GAUDI.md v2
PatrykWo Dec 5, 2024
71bc8e9
Merge branch 'HabanaAI:habana_main' into doc_update
PatrykWo Dec 5, 2024
b138eb9
Update README_GAUDI.md v3
PatrykWo Dec 5, 2024
4f28b61
Update gaudi-installation.rst
PatrykWo Dec 5, 2024
8d7fdbf
Apply suggestions from code review
bartekkuncer Dec 19, 2024
86ca464
Add tag to gaudi-installation.rst
bartekkuncer Dec 19, 2024
85ecb4a
Update tag
bartekkuncer Dec 19, 2024
2 changes: 1 addition & 1 deletion Dockerfile.hpu
@@ -1,4 +1,4 @@
FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
FROM vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest

COPY ./ /workspace/vllm

40 changes: 33 additions & 7 deletions README_GAUDI.md
@@ -11,7 +11,7 @@ Please follow the instructions provided in the [Gaudi Installation Guide](https:
- OS: Ubuntu 22.04 LTS
- Python: 3.10
- Intel Gaudi accelerator
- Intel Gaudi software version 1.18.0
- Intel Gaudi software version 1.19.0

## Quick start using Dockerfile
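A minimal sketch of the Dockerfile-based setup (the image tag `vllm-hpu-env` is an illustrative placeholder, not taken from this PR):

```{.console}
$ # Build the vLLM image from the repository's Dockerfile.hpu
$ docker build -f Dockerfile.hpu -t vllm-hpu-env .
$ # Start a container with access to all Gaudi devices
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all --cap-add=sys_nice --net=host --rm vllm-hpu-env
```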


It needs more explanation.


Will you add it? @piotrbocian


Or explain what you have in mind so someone else can do it?


Please see how the document is structured:

  1. Quick start using Dockerfile
  2. Build from source
    2.1 Environment verification
    2.2 Run Docker Image
    2.3 Build and Install vLLM

Questions:

  • Is (1.) a full alternative to (2.)? If so, I would add a one-liner such as:
  • "You can quickly set up vLLM using the latest Intel Gaudi docker and vLLM version "
  • Is (2.1 Env verification) common to both (1.) and (2.)?

@@ -44,13 +44,29 @@ It is highly recommended to use the latest Docker image from Intel Gaudi vault.
Use the following commands to run a Docker image:

```{.console}
$ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
$ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
```
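Once inside the container, a quick way to confirm the Gaudi devices are visible is `hl-smi` (an optional sanity check, not part of the documented flow):

```{.console}
$ # List the Gaudi accelerators visible to the container
$ hl-smi
```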

### Build and Install vLLM-fork
### Build and Install vLLM

Currently, the latest features and performance optimizations are developed in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork) and we periodically upstream them to vLLM main repo. To install latest [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:
Currently, we provide multiple repositories that can be used to install vLLM with Intel® Gaudi®. Pick one of the following options:

#### 1. Build and Install the stable version

Periodically, we release vLLM to align with Intel® Gaudi® software releases. The stable version is released with a tag and supports fully validated features and performance optimizations in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork). To install the stable release from [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:


typo: tagg



Corrected


```{.console}
$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout v1.19.0


Probably will need to replace with proper tag like in v0.5.3.post1+Gaudi-1.18.0



@bartekkuncer please verify if that makes sense.


@michalkuligowski makes a good point, especially since in the release notes we provide instructions that use a tag, so this change will make the two consistent.

$ pip install -r requirements-hpu.txt
$ python setup.py develop
```

#### 2. Build and Install the latest from vLLM-fork

The latest features and performance optimizations are developed in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork) and we periodically upstream them to the vLLM main repo. To install the latest [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:

```{.console}
$ git clone https://github.com/HabanaAI/vllm-fork.git
@@ -59,6 +75,16 @@ $ git checkout habana_main
$ pip install -r requirements-hpu.txt
$ python setup.py develop
```
#### 3. Build and Install from vLLM main source

If you prefer to build and install directly from the main vLLM source, to which we periodically upstream new features, run the following:

```{.console}
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ pip install -r requirements-hpu.txt
$ python setup.py develop
```
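Whichever option you choose, a simple way to confirm the installation succeeded is to import the package and print its version (an optional check, not part of the documented steps):

```{.console}
$ python -c "import vllm; print(vllm.__version__)"
```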

# Supported Features

@@ -71,11 +97,11 @@ $ python setup.py develop
- Inference with [HPU Graphs](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html) for accelerating low-batch latency and throughput
- Attention with Linear Biases (ALiBi)
- INC quantization
- LoRA adapters

# Unsupported Features

- Beam search
- LoRA adapters
- AWQ quantization
- Prefill chunking (mixed-batch inferencing)

@@ -112,7 +138,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
| 1 | 1 | PyTorch lazy mode |

> [!WARNING]
> In 1.18.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
> In 1.19.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.19.0, please use HPU Graphs, or PyTorch lazy mode.
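For example, serving a model with the recommended HPU Graphs path uses lazy mode, which can be selected explicitly via the environment variable (a sketch only; the model name is an arbitrary example and `PT_HPU_LAZY_MODE=1` is typically the default):

```{.console}
$ # Lazy mode + HPU Graphs (recommended for best performance in 1.19.0)
$ PT_HPU_LAZY_MODE=1 python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct
```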

## Bucketing mechanism

34 changes: 25 additions & 9 deletions docs/source/getting_started/gaudi-installation.rst
@@ -18,7 +18,7 @@ Requirements
- OS: Ubuntu 22.04 LTS
- Python: 3.10
- Intel Gaudi accelerator
- Intel Gaudi software version 1.18.0
- Intel Gaudi software version 1.19.0


Quick start using Dockerfile
@@ -63,23 +63,29 @@ Use the following commands to run a Docker image:

.. code:: console

$ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
$ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest

Build and Install vLLM
~~~~~~~~~~~~~~~~~~~~~~

To build and install vLLM from source, run:
Currently, we provide multiple repositories that can be used to install vLLM with Intel® Gaudi®. Pick one of the following options:

1. Build and Install the stable version

Periodically, we release vLLM to align with Intel® Gaudi® software releases. The stable version is released with a tag and supports fully validated features and performance optimizations in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__. To install the stable release from `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:

.. code:: console

$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout v1.19.0


Probably will need to replace with proper tag like in v0.5.3.post1+Gaudi-1.18.0



@bartekkuncer please verify if that makes sense.


@michalkuligowski makes a good point, especially since in the release notes we provide instructions that use a tag, so this change will make the two consistent.

$ pip install -r requirements-hpu.txt
$ python setup.py develop

2. Build and Install the latest from vLLM-fork

Currently, the latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to vLLM main repo. To install latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:
The latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to the vLLM main repo. To install the latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:

.. code:: console

@@ -89,6 +95,16 @@ Currently, the latest features and performance optimizations are developed in Ga
$ pip install -r requirements-hpu.txt
$ python setup.py develop

3. Build and Install from vLLM main source

If you prefer to build and install directly from the main vLLM source, to which we periodically upstream new features, run the following:

.. code:: console

$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ pip install -r requirements-hpu.txt
$ python setup.py develop

Supported Features
==================
@@ -107,12 +123,12 @@ Supported Features
for accelerating low-batch latency and throughput
- Attention with Linear Biases (ALiBi)
- INC quantization
- LoRA adapters

Unsupported Features
====================

- Beam search
- LoRA adapters
- AWQ quantization
- Prefill chunking (mixed-batch inferencing)

@@ -186,7 +202,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
- PyTorch lazy mode

.. warning::
In 1.18.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
In 1.19.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.19.0, please use HPU Graphs, or PyTorch lazy mode.


Bucketing mechanism
16 changes: 16 additions & 0 deletions docs/source/serving/compatibility_matrix.rst
@@ -305,6 +305,7 @@ Feature x Hardware
- Hopper
- CPU
- AMD
- Gaudi
* - :ref:`CP <chunked-prefill>`
- `✗ <https://github.com/vllm-project/vllm/issues/2729>`__
- ✅
@@ -313,6 +314,7 @@ Feature x Hardware
- ✅
- ✗
- ✅
- ✗
* - :ref:`APC <apc>`
- `✗ <https://github.com/vllm-project/vllm/issues/3687>`__
- ✅
@@ -321,6 +323,7 @@ Feature x Hardware
- ✅
- ✗
- ✅
- ✅
* - :ref:`LoRA <lora>`
- ✅
- ✅
@@ -329,6 +332,7 @@ Feature x Hardware
- ✅
- `✗ <https://github.com/vllm-project/vllm/pull/4830>`__
- ✅
- ✅
* - :abbr:`prmpt adptr (Prompt Adapter)`
- ✅
- ✅
@@ -337,6 +341,7 @@ Feature x Hardware
- ✅
- `✗ <https://github.com/vllm-project/vllm/issues/8475>`__
- ✅
- ✗
* - :ref:`SD <spec_decode>`
- ✅
- ✅
@@ -345,6 +350,7 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✅
* - CUDA graph
- ✅
- ✅
@@ -353,6 +359,7 @@ Feature x Hardware
- ✅
- ✗
- ✅
- ✗
* - :abbr:`enc-dec (Encoder-Decoder Models)`
- ✅
- ✅
@@ -361,6 +368,7 @@ Feature x Hardware
- ✅
- ✅
- ✗
- ✅
* - :abbr:`logP (Logprobs)`
- ✅
- ✅
@@ -369,6 +377,7 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✅
* - :abbr:`prmpt logP (Prompt Logprobs)`
- ✅
- ✅
@@ -377,6 +386,7 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✅
* - :abbr:`async output (Async Output Processing)`
- ✅
- ✅
@@ -385,6 +395,7 @@ Feature x Hardware
- ✅
- ✗
- ✗
- ✅
* - multi-step
- ✅
- ✅
@@ -393,6 +404,7 @@ Feature x Hardware
- ✅
- `✗ <https://github.com/vllm-project/vllm/issues/8477>`__
- ✅
- ✅
* - :abbr:`MM (Multimodal)`
- ✅
- ✅
@@ -401,6 +413,7 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✅
* - best-of
- ✅
- ✅
@@ -409,6 +422,7 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✅
* - beam-search
- ✅
- ✅
@@ -417,6 +431,7 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✗
* - :abbr:`guided dec (Guided Decoding)`
- ✅
- ✅
@@ -425,3 +440,4 @@ Feature x Hardware
- ✅
- ✅
- ✅
- ✅