Can openllm support local path model? #1044

Open
dsp6414 opened this issue Jul 18, 2024 · 12 comments

Comments
dsp6414 commented Jul 18, 2024

How can I use OpenLLM with a local LoRA model?

dsp6414 closed this as completed Jul 19, 2024
dsp6414 (Author) commented Jul 21, 2024

OpenLLM deployment

  1. Install openllm:
    pip install openllm
  2. Install bentoml:
    pip install bentoml
  3. Update the openllm model repo:
    openllm repo update

4. Create the venv virtual environment:
python -m uv venv /home/tcx/.openllm/venv/998690274545817638

5. Activate the venv:
source /home/tcx/.openllm/venv/998690274545817638/bin/activate

6. Install the dependencies:
python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt

  7. Clone the model repository from Hugging Face:
    https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
    Local directory:
    /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct

  8. Update the model repository parameters in:
    /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6

Update src/bentofile.yaml as follows:
conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:
- name: HF_TOKEN
exclude: []
include:
- '*.py'
- ui/*
- ui/chunks/*
- ui/css/*
- ui/media/*
- ui/chunks/pages/*
- bentovllm_openai/*.py
- chat_templates/chat_templates/*.jinja
- chat_templates/generation_configs/*.json
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
name: null
python:
  extra_index_url: null
  find_links: null
  index_url: null
  lock_packages: true
  no_index: null
  pack_git_packages: true
  packages: null
  pip_args: null
  requirements_txt: ./requirements.txt
  trusted_host: null
  wheels: null
service: service:VLLM
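Note that model_name (and, below, engine_config.model) point at a local directory rather than a Hugging Face Hub ID. An optional sanity check that the directory holds a complete checkout; the expected file list is only an assumption about a typical Hugging Face layout:

import os

model_dir = "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct"
# Files a vLLM-backed service typically expects in a local checkout (assumed, not exhaustive).
expected = ["config.json", "generation_config.json", "tokenizer_config.json", "tokenizer.json"]
missing = [f for f in expected if not os.path.exists(os.path.join(model_dir, f))]
print("missing files:", missing or "none")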

Update bento_constants.py as follows:

CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''
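For context on why these fields matter, a rough sketch (an assumption about how the generated service consumes CONSTANT_YAML, not the verbatim openllm-models code): the YAML string is parsed and engine_config.model, here a local path, is handed to the vLLM engine.

import yaml
from bento_constants import CONSTANT_YAML

CONSTANTS = yaml.safe_load(CONSTANT_YAML)
ENGINE_CONFIG = CONSTANTS["engine_config"]
SERVICE_CONFIG = CONSTANTS["service_config"]
print(ENGINE_CONFIG["model"])  # /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct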

Update bento.yaml as follows:

service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:
- name: qwen2
  service: ''
  models: []
  dependencies: []
  config:
    name: qwen2
    resources:
      gpu: 1
      gpu_type: nvidia-rtx-3060
    traffic:
      timeout: 300
    envs:
    - name: HF_TOKEN
  schema:
    name: qwen2
    type: service
    routes:
    - name: chat
      route: /api/chat
      batchable: false
      input:
        properties:
          messages:
            default:
            - role: user
              content: what is the meaning of life?
            items:
              properties:
                role:
                  enum:
                  - system
                  - user
                  - assistant
                  title: Role
                  type: string
                content:
                  title: Content
                  type: string
              required:
              - role
              - content
              title: Message
              type: object
            title: Messages
            type: array
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
    - name: generate
      route: /api/generate
      batchable: false
      input:
        properties:
          prompt:
            default: Explain superconductors like I'm five years old
            title: Prompt
            type: string
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
apis: []
docker:
  distro: debian
  python_version: '3.9'
  cuda_version: null
  env:
    HF_TOKEN: ''
  system_packages: null
  setup_script: null
  base_image: null
  dockerfile_template: null
python:
  requirements_txt: ./requirements.txt
  packages: null
  lock_packages: true
  pack_git_packages: true
  index_url: null
  no_index: null
  trusted_host: null
  find_links: null
  extra_index_url: null
  pip_args: null
  wheels: null
conda:
  environment_yml: null
  channels: null
  dependencies: null
  pip: null

9. Activate the venv and start the service (an example client request is sketched after step 10).
Change into the /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src directory and run:

$ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
$ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
$ bentoml serve qwen2:0.5b-instruct-fp16-fcc6

or:

$ bentoml serve .

10. If port 3000 is already in use, find and kill the process holding it:

$ netstat -tulnp | grep 3000
$ sudo kill -9 <process ID>
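Once the server is up, a minimal client sketch against the /api/chat route declared in bento.yaml above (assuming BentoML's default port 3000 and the local model path used throughout this setup):

import requests

resp = requests.post(
    "http://localhost:3000/api/chat",
    json={
        "messages": [{"role": "user", "content": "what is the meaning of life?"}],
        "model": "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
        "max_tokens": 512,
    },
    stream=True,  # the route streams text/event-stream
)
resp.raise_for_status()
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)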

dsp6414 reopened this Jul 21, 2024
dsp6414 (Author) commented Jul 21, 2024

1

bojiang (Member) commented Jul 22, 2024

It seems you have a step-by-step solution. Is there anything we can help with?

dsp6414 (Author) commented Jul 22, 2024

It seems you have a step-by-step solution. Is there anything we can help with?

I still do not know how to load a LoRA fine-tuned model or where to modify the YAML file.
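One workaround that does not require any OpenLLM changes (a suggestion, not an official recommendation): merge the LoRA adapter into the base weights with PEFT, then point the model paths in the config above at the merged directory. The adapter path and output directory below are hypothetical:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_dir = "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct"
adapter_dir = "/home/tcx/loras/qwen2-lora"   # hypothetical LoRA adapter directory
merged_dir = base_dir + "-merged"

# Load the base model, apply the LoRA adapter, and fold it into the weights.
base = AutoModelForCausalLM.from_pretrained(base_dir)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

# Save merged weights plus tokenizer so the directory is a self-contained model.
merged.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_dir).save_pretrained(merged_dir)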

aarnphm (Collaborator) commented Jul 22, 2024

I don't think we have LoRA loading supported yet, but we can add this. @bojiang

bojiang (Member) commented Jul 24, 2024

As for local path models, I think we can support them.

dsp6414 (Author) commented Jul 24, 2024

thanks🌺

dsp6414 (Author) commented Jul 30, 2024

From openllm-models' service.py:

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)

Both set lora_modules=None.
How do I set my LoRA model?
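Not an official answer, only a sketch of one possible direction: vLLM's OpenAI-compatible serving layer accepts a list of LoRA module descriptors in place of None. The exact import path and field names of LoRAModulePath differ between vLLM releases, the engine has to be created with LoRA enabled (enable_lora=True in its engine args), and the adapter name and path below are hypothetical:

# Sketch only; check the class location and fields against your installed vLLM version.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

# Positional args: the adapter name exposed to clients, then its local path.
lora_modules = [LoRAModulePath("qwen2-lora", "/home/tcx/loras/qwen2-lora")]

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)

Clients would then select the adapter by passing "qwen2-lora" as the model name in their requests.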

dsp6414 closed this as completed Aug 1, 2024
dsp6414 (Author) commented Aug 1, 2024

🌼

bojiang reopened this Aug 2, 2024
dsp6414 (Author) commented Aug 2, 2024

aarnphm (Collaborator) commented Sep 24, 2024

Hi there, I think this involves a larger design discussion that we are currently working on internally.

We will post an update once we have more details. Thanks for your patience.

dsp6414 (Author) commented Sep 24, 2024 via email
