
tests_gaudi: Added L2 vllm workload #329

Merged: 1 commit merged into intel:main on Dec 16, 2024
Conversation

@vbedida79 (Contributor) commented Oct 31, 2024

PR includes the Gaudi L2 vLLM workload.

Signed-off-by: vbedida79 [email protected]

@vbedida79 vbedida79 requested a review from uMartinXu October 31, 2024 17:01
@@ -74,4 +74,83 @@ Welcome to HCCL demo
[BENCHMARK] NW Bandwidth : 258.209121 GB/s
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```

## VLLM
Contributor:
vLLM

```

## VLLM
VLLM is a serving engine for LLMs. The following workload deploys a VLLM server with an LLM on Intel Gaudi. Refer to [Intel Gaudi VLLM fork](https://github.com/HabanaAI/vllm-fork.git) for more details.
Contributor:
vLLM

Build the workload container image:
```
$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_buildconfig.yaml
```
Contributor:
Could we add an instruction to let the user know whether the build succeeded? :-)
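A minimal sketch of how the build result could be checked, assuming the BuildConfig from vllm_buildconfig.yaml is named `vllm-workload` and runs in a `gaudi-validation` namespace (both names are assumptions here):
```
# List builds created by the BuildConfig and check the STATUS column
oc get builds -n gaudi-validation

# Follow the build logs until the image is built and pushed
oc logs -f build/vllm-workload-1 -n gaudi-validation

# A successful build reports the phase "Complete"
oc get build vllm-workload-1 -n gaudi-validation -o jsonpath='{.status.phase}'
```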

```
Deploy the workload:
* Update the Hugging Face token and the PVC according to your cluster setup
```
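As a hedged illustration of the PVC mentioned above, a claim for the model storage could look roughly like this; the name, namespace, size, and access mode are placeholders to adjust to your cluster setup:
```
# Hypothetical PVC used by the vLLM workload for model storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-model-cache
  namespace: gaudi-validation
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```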
Contributor:
Could we have some detail about setting the Hugging Face token, and also give a brief introduction to the model we are using? :-)
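A hedged example of the token setup being asked about: the Hugging Face token could be stored in a Secret and exposed to the vLLM pod as an environment variable. The secret name, key, and namespace below are placeholders, and `HF_TOKEN` is one of the variable names the Hugging Face client libraries read:
```
# Create a Secret holding the Hugging Face token (names are placeholders)
oc create secret generic hf-token --from-literal=HF_TOKEN=<your-token> -n gaudi-validation

# Reference it from the vLLM Deployment container spec:
#   env:
#     - name: HF_TOKEN
#       valueFrom:
#         secretKeyRef:
#           name: hf-token
#           key: HF_TOKEN
```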

runPolicy: "Serial"
source:
  git:
    uri: https://github.com/opendatahub-io/vllm.git
Contributor:
After comparing:

1. https://github.com/opendatahub-io/vllm.git - ODH vLLM fork
2. https://github.com/vllm-project/vllm - vLLM upstream
3. https://github.com/HabanaAI/vllm-fork - Habana vLLM fork

I think we should currently start from 3, with the change from 1 (adding the UBI-based Dockerfile for RH OpenShift). Intel is upstreaming from 3 to 2, so in the long run we will use 2.

So I think we need to: 1) submit a PR adding the UBI-based Dockerfile for RH, and also add RHEL 9.4 support to the documents; 2) use repo 3; 3) the owner of 3 will presumably also help to upstream the UBI-based Dockerfile and docs to 2; 4) after that we can switch to using 2, the upstream vLLM.

@vbedida79 any comments? :-)

Contributor Author (@vbedida79, Dec 9, 2024):
Once HabanaAI/vllm-fork#190, the PR adding the UBI Dockerfile from RH, is merged into the vLLM Gaudi fork repo, we can use that directly. For now, we can use the RH-maintained UBI image from https://github.com/opendatahub-io/vllm.git, which is based on https://github.com/HabanaAI/vllm-fork.

@vbedida79 (Contributor Author):

Updated according to the comments, please review. Thanks.

@vbedida79 vbedida79 force-pushed the patch-301024-1 branch 3 times, most recently from 46ef40e to 462c42d on December 16, 2024 18:31
@@ -75,3 +75,104 @@ Welcome to HCCL demo
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```
<<<<<<< HEAD
Contributor:
PR and git commit messages: note that the buildconfig is based on HabanaAI/vllm-fork#602.

Contributor Author (@vbedida79, Dec 16, 2024):
Sure, updated in the PR and git commit.

Build the workload container image:
```
git clone https://github.com/opendatahub-io/vllm.git --branch gaudi-main
```
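If the image is built in-cluster, a rebuild can be started against the BuildConfig created earlier; the name `vllm-workload` is an assumption, and the `--from-dir` form is only a hypothetical alternative for building from the local clone instead of the git source:
```
# Re-run the BuildConfig (git source) and follow the logs
oc start-build vllm-workload -n gaudi-validation --follow

# Or build from the local clone as a binary build
oc start-build vllm-workload -n gaudi-validation --from-dir=./vllm --follow
```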

Contributor:
Should we use the 1.18.0 branch?

- containerPort: 8000
resources:
  limits:
    habana.ai/gaudi: 4
Contributor:
Could we check and confirm how many accelerators are actually used by vLLM?
I suggest starting with only a single accelerator.

Contributor Author:
I can check with 1 resource and update.
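A minimal sketch of the single-accelerator variant being discussed; the container name, model id, and argument list are placeholders, and `--tensor-parallel-size 1` is assumed to match a one-card request:
```
# Hypothetical container spec fragment for a single Gaudi card
containers:
  - name: vllm
    args: ["--model", "<model-id>", "--tensor-parallel-size", "1"]
    ports:
      - containerPort: 8000
    resources:
      limits:
        habana.ai/gaudi: 1   # request a single Gaudi accelerator
```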

@vbedida79 vbedida79 force-pushed the patch-301024-1 branch 2 times, most recently from 5f35eab to 4e68146 on December 16, 2024 22:09
vllm gaudi ubi image based on PR HabanaAI/vllm-fork#602

Signed-off-by: vbedida79 <[email protected]>
@uMartinXu uMartinXu merged commit 8d5c358 into intel:main Dec 16, 2024