
vLLM model deployer #3032

Open

wants to merge 17 commits into base: develop

Changes from 9 commits
74 changes: 74 additions & 0 deletions docs/book/component-guide/model-deployers/vllm.md
@@ -0,0 +1,74 @@
---
description: Deploying your LLM locally with vLLM.
---

# vLLM

[vLLM](https://docs.vllm.ai/en/latest/) is a fast and easy-to-use library for LLM inference and serving.

## When to use it?

You should use the vLLM Model Deployer when you want:

* State-of-the-art serving throughput for large language models behind an OpenAI-compatible API server (see the example below)
* Continuous batching of incoming requests
* Quantization: GPTQ, AWQ, INT4, INT8, and FP8
* Features such as PagedAttention, speculative decoding, and chunked prefill
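
Since vLLM serves models behind an OpenAI-compatible API, any OpenAI-compatible client or plain HTTP call can query a deployment. A rough sketch, assuming a server is already running on the default local port 8000 (the address and model name are placeholders):

```bash
# Query the OpenAI-compatible completions endpoint of a running vLLM server.
# URL and model name are assumptions; substitute your deployment's values.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt2", "prompt": "San Francisco is a", "max_tokens": 16}'
```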

## How do you deploy it?

The vLLM Model Deployer flavor is provided by the vLLM ZenML integration. You first need to install the integration on your local machine to be able to deploy your models:

```bash
zenml integration install vllm -y
```

To register the vLLM model deployer with ZenML, run the following command:

```bash
zenml model-deployer register vllm_deployer --flavor=vllm
```

The ZenML integration provisions a local vLLM deployment server as a daemon process that continues to run in the background to serve the latest vLLM model.
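
For the deployer to be used by pipelines, it should be part of your active stack. A minimal sketch, assuming ZenML's standard stack commands where `-d` refers to the model deployer component:

```bash
# Add the registered vLLM model deployer to the active stack.
zenml stack update -d vllm_deployer
```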

## How do you use it?

If you'd like to see this in action, check out this example of a [deployment pipeline](https://github.com/zenml-io/zenml-projects/blob/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer/pipelines/deploy_pipeline.py#L25).

### Deploy an LLM

The [vllm_model_deployer_step](https://github.com/zenml-io/zenml-projects/blob/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer/steps/vllm_deployer.py#L32) exposes a `VLLMDeploymentService` that you can use in your pipeline. Here is an example snippet:
```python
from typing import Annotated

from zenml import pipeline
from zenml.integrations.vllm.services.vllm_deployment import VLLMDeploymentService

from steps.vllm_deployer import vllm_model_deployer_step


@pipeline()
def deploy_vllm_pipeline(
    model: str,
    timeout: int = 1200,
) -> Annotated[VLLMDeploymentService, "GPT2"]:
    service = vllm_model_deployer_step(
        model=model,
        timeout=timeout,
    )
    return service
```

Here is an [example](https://github.com/zenml-io/zenml-projects/tree/79f67ea52c3908b9b33c9a41eef18cb7d72362e8/llm-vllm-deployer) of running a GPT-2 model using vLLM.
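
Once the pipeline has run, the deployed model can be queried like any OpenAI endpoint. A minimal sketch, assuming the server listens on the default local port 8000 and exposes vLLM's OpenAI-compatible routes (check the deployed service's prediction URL for the actual address):

```python
from openai import OpenAI

# Run the deployment pipeline defined above for GPT-2.
deploy_vllm_pipeline(model="gpt2")

# Assumption: the vLLM server is reachable at this local address;
# the api_key is a placeholder since the local server does not check it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")
completion = client.completions.create(
    model="gpt2",
    prompt="ZenML is",
    max_tokens=16,
)
print(completion.choices[0].text)
```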

#### Configuration

Within the `VLLMDeploymentService` you can configure:

* `model`: Name or path of the Hugging Face model to use.
* `tokenizer`: Name or path of the Hugging Face tokenizer to use. If unspecified, the model name or path is used.
* `served_model_name`: The model name(s) used in the API. If not specified, defaults to the `model` argument.
* `trust_remote_code`: Trust remote code from Hugging Face.
* `tokenizer_mode`: The tokenizer mode. Allowed choices: `auto`, `slow`, `mistral`.
* `dtype`: Data type for model weights and activations. Allowed choices: `auto`, `half`, `float16`, `bfloat16`, `float`, `float32`.
* `revision`: The specific model version to use. It can be a branch name, a tag name, or a commit id. If unspecified, the default version is used.
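
These options can be supplied when the deployer step is invoked. A hedged sketch, assuming `vllm_model_deployer_step` forwards these keyword arguments to the service configuration (confirm against the step's actual signature):

```python
from steps.vllm_deployer import vllm_model_deployer_step

service = vllm_model_deployer_step(
    model="gpt2",
    tokenizer="gpt2",          # assumed pass-through option
    dtype="half",              # half-precision weights and activations
    trust_remote_code=False,   # do not execute remote code from the Hub
)
```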
4 changes: 3 additions & 1 deletion src/zenml/integrations/__init__.py
@@ -45,14 +45,15 @@
from zenml.integrations.label_studio import LabelStudioIntegration # noqa
from zenml.integrations.langchain import LangchainIntegration # noqa
from zenml.integrations.lightgbm import LightGBMIntegration # noqa

# from zenml.integrations.llama_index import LlamaIndexIntegration # noqa
from zenml.integrations.mlflow import MlflowIntegration # noqa
from zenml.integrations.neptune import NeptuneIntegration # noqa
from zenml.integrations.neural_prophet import NeuralProphetIntegration # noqa
from zenml.integrations.numpy import NumpyIntegration # noqa
from zenml.integrations.openai import OpenAIIntegration # noqa
from zenml.integrations.pandas import PandasIntegration # noqa
from zenml.integrations.pigeon import PigeonIntegration # noqa
from zenml.integrations.pigeon import PigeonIntegration # noqa
from zenml.integrations.pillow import PillowIntegration # noqa
from zenml.integrations.polars import PolarsIntegration # noqa
from zenml.integrations.prodigy import ProdigyIntegration # noqa
@@ -78,3 +79,4 @@
from zenml.integrations.wandb import WandbIntegration # noqa
from zenml.integrations.whylogs import WhylogsIntegration # noqa
from zenml.integrations.xgboost import XgboostIntegration # noqa
from zenml.integrations.vllm import VLLMIntegration # noqa
1 change: 1 addition & 0 deletions src/zenml/integrations/constants.py
@@ -76,4 +76,5 @@
VERTEX = "vertex"
XGBOOST = "xgboost"
VAULT = "vault"
VLLM = "vllm"
LIGHTNING = "lightning"
50 changes: 50 additions & 0 deletions src/zenml/integrations/vllm/__init__.py
@@ -0,0 +1,50 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Initialization for the ZenML vLLM integration."""
from typing import List, Type
from zenml.integrations.integration import Integration
from zenml.stack import Flavor
from zenml.logger import get_logger
from zenml.integrations.constants import VLLM

VLLM_MODEL_DEPLOYER = "vllm"

logger = get_logger(__name__)


class VLLMIntegration(Integration):
    """Definition of vLLM integration for ZenML."""

    NAME = VLLM

    REQUIREMENTS = ["vllm>=0.6.0", "openai>=1.0.0"]

    @classmethod
    def activate(cls) -> None:
        """Activates the integration."""
        from zenml.integrations.vllm import services

    @classmethod
    def flavors(cls) -> List[Type[Flavor]]:
        """Declare the stack component flavors for the vLLM integration.

        Returns:
            List of stack component flavors for this integration.
        """
        from zenml.integrations.vllm.flavors import VLLMModelDeployerFlavor

        return [VLLMModelDeployerFlavor]


VLLMIntegration.check_installation()
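
Once the integration is installed, the new flavor should show up alongside the built-in ones; a quick check, assuming ZenML's standard flavor-listing CLI:

```bash
zenml model-deployer flavor list
```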
21 changes: 21 additions & 0 deletions src/zenml/integrations/vllm/flavors/__init__.py
@@ -0,0 +1,21 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""vLLM integration flavors."""

from zenml.integrations.vllm.flavors.vllm_model_deployer_flavor import ( # noqa
    VLLMModelDeployerConfig,
    VLLMModelDeployerFlavor,
)

__all__ = ["VLLMModelDeployerConfig", "VLLMModelDeployerFlavor"]
91 changes: 91 additions & 0 deletions src/zenml/integrations/vllm/flavors/vllm_model_deployer_flavor.py
@@ -0,0 +1,91 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""vLLM model deployer flavor."""

from typing import TYPE_CHECKING, Optional, Type

from zenml.integrations.vllm import VLLM_MODEL_DEPLOYER
from zenml.model_deployers.base_model_deployer import (
    BaseModelDeployerConfig,
    BaseModelDeployerFlavor,
)

if TYPE_CHECKING:
    from zenml.integrations.vllm.model_deployers import VLLMModelDeployer


class VLLMModelDeployerConfig(BaseModelDeployerConfig):
    """Configuration for vLLM Inference model deployer."""

    service_path: str = ""


class VLLMModelDeployerFlavor(BaseModelDeployerFlavor):
    """vLLM model deployer flavor."""

    @property
    def name(self) -> str:
        """Name of the flavor.

        Returns:
            The name of the flavor.
        """
        return VLLM_MODEL_DEPLOYER

    @property
    def docs_url(self) -> Optional[str]:
        """A url to point at docs explaining this flavor.

        Returns:
            A flavor docs url.
        """
        return self.generate_default_docs_url()

    @property
    def sdk_docs_url(self) -> Optional[str]:
        """A url to point at SDK docs explaining this flavor.

        Returns:
            A flavor SDK docs url.
        """
        return self.generate_default_sdk_docs_url()

    @property
    def logo_url(self) -> str:
        """A url to represent the flavor in the dashboard.

        Returns:
            The flavor logo.
        """
        return "https://raw.githubusercontent.com/vllm-project/vllm/main/docs/source/assets/logos/vllm-logo-text-dark.png"
Review comment (Contributor): @schustmi can we please upload this logo to our s3 buckets in the logos path and share the URL here so it can be changed?

    @property
    def config_class(self) -> Type[VLLMModelDeployerConfig]:
        """Returns `VLLMModelDeployerConfig` config class.

        Returns:
            The config class.
        """
        return VLLMModelDeployerConfig

    @property
    def implementation_class(self) -> Type["VLLMModelDeployer"]:
        """Implementation class for this flavor.

        Returns:
            The implementation class.
        """
        from zenml.integrations.vllm.model_deployers import VLLMModelDeployer

        return VLLMModelDeployer
19 changes: 19 additions & 0 deletions src/zenml/integrations/vllm/model_deployers/__init__.py
@@ -0,0 +1,19 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Initialization of the vLLM model deployers."""
from zenml.integrations.vllm.model_deployers.vllm_model_deployer import ( # noqa
    VLLMModelDeployer,
)

__all__ = ["VLLMModelDeployer"]