Can openllm support local path model? #1044

Open
dsp6414 opened this issue Jul 18, 2024 · 12 comments

Comments
dsp6414 commented Jul 18, 2024

How can I use OpenLLM with a local LoRA model?

dsp6414 closed this as completed Jul 19, 2024
dsp6414 (Author) commented Jul 21, 2024

OpenLLM deployment

  1. Install openllm:
    pip install openllm
  2. Install bentoml:
    pip install bentoml
  3. Update the openllm model repo:
    openllm repo update

4. Create the venv virtual environment:
python -m uv venv /home/tcx/.openllm/venv/998690274545817638

5. Activate the venv:
source /home/tcx/.openllm/venv/998690274545817638/bin/activate

6. Install the dependencies:
python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt

  7. Clone the model repository from Hugging Face:
    https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
    Local directory:
    /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct

  8. Update the model repository parameters in:
    /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6

Update src/bentofile.yaml as follows:
conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:
- name: HF_TOKEN
exclude: []
include:
- '*.py'
- ui/*
- ui/chunks/*
- ui/css/*
- ui/media/*
- ui/chunks/pages/*
- bentovllm_openai/*.py
- chat_templates/chat_templates/*.jinja
- chat_templates/generation_configs/*.json
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
name: null
python:
  extra_index_url: null
  find_links: null
  index_url: null
  lock_packages: true
  no_index: null
  pack_git_packages: true
  packages: null
  pip_args: null
  requirements_txt: ./requirements.txt
  trusted_host: null
  wheels: null
service: service:VLLM
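Note that model_name (and, below, engine_config.model) point at a local directory rather than a Hugging Face Hub ID. An optional sanity check that the directory holds a complete checkout; the expected file list is only an assumption about a typical Hugging Face layout:

import os

model_dir = "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct"
# Files a vLLM-backed service typically expects in a local checkout (assumed, not exhaustive).
expected = ["config.json", "generation_config.json", "tokenizer_config.json", "tokenizer.json"]
missing = [f for f in expected if not os.path.exists(os.path.join(model_dir, f))]
print("missing files:", missing or "none")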

Update bento_constants.py as follows:

CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''
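For context on why these fields matter, a rough sketch (an assumption about how the generated service consumes CONSTANT_YAML, not the verbatim openllm-models code): the YAML string is parsed and engine_config.model, here a local path, is handed to the vLLM engine.

import yaml
from bento_constants import CONSTANT_YAML

CONSTANTS = yaml.safe_load(CONSTANT_YAML)
ENGINE_CONFIG = CONSTANTS["engine_config"]
SERVICE_CONFIG = CONSTANTS["service_config"]
print(ENGINE_CONFIG["model"])  # /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct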

Update bento.yaml as follows:

service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:
- name: qwen2
  service: ''
  models: []
  dependencies: []
  config:
    name: qwen2
    resources:
      gpu: 1
      gpu_type: nvidia-rtx-3060
    traffic:
      timeout: 300
    envs:
    - name: HF_TOKEN
  schema:
    name: qwen2
    type: service
    routes:
    - name: chat
      route: /api/chat
      batchable: false
      input:
        properties:
          messages:
            default:
            - role: user
              content: what is the meaning of life?
            items:
              properties:
                role:
                  enum:
                  - system
                  - user
                  - assistant
                  title: Role
                  type: string
                content:
                  title: Content
                  type: string
              required:
              - role
              - content
              title: Message
              type: object
            title: Messages
            type: array
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
    - name: generate
      route: /api/generate
      batchable: false
      input:
        properties:
          prompt:
            default: Explain superconductors like I'm five years old
            title: Prompt
            type: string
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
apis: []
docker:
  distro: debian
  python_version: '3.9'
  cuda_version: null
  env:
    HF_TOKEN: ''
  system_packages: null
  setup_script: null
  base_image: null
  dockerfile_template: null
python:
  requirements_txt: ./requirements.txt
  packages: null
  lock_packages: true
  pack_git_packages: true
  index_url: null
  no_index: null
  trusted_host: null
  find_links: null
  extra_index_url: null
  pip_args: null
  wheels: null
conda:
  environment_yml: null
  channels: null
  dependencies: null
  pip: null

9. Activate the venv and start the service (an example client request is sketched after step 10).
Change into the /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src directory and run:

$ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
$ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
$ bentoml serve qwen2:0.5b-instruct-fp16-fcc6

or:

$ bentoml serve .

10. If port 3000 is already in use, find and kill the process holding it:

$ netstat -tulnp | grep 3000
$ sudo kill -9 <process ID>
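Once the server is up, a minimal client sketch against the /api/chat route declared in bento.yaml above (assuming BentoML's default port 3000 and the local model path used throughout this setup):

import requests

resp = requests.post(
    "http://localhost:3000/api/chat",
    json={
        "messages": [{"role": "user", "content": "what is the meaning of life?"}],
        "model": "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
        "max_tokens": 512,
    },
    stream=True,  # the route streams text/event-stream
)
resp.raise_for_status()
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)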

dsp6414 reopened this Jul 21, 2024
dsp6414 (Author) commented Jul 21, 2024

1

bojiang (Member) commented Jul 22, 2024

It seems you have a step-by-step solution. Is there anything we can help with?

dsp6414 (Author) commented Jul 22, 2024

It seems you have a step-by-step solution. Is there anything we can help with?

I still do not know how to load a LoRA fine-tuned model or where to modify the YAML file.
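One workaround that does not require any OpenLLM changes (a suggestion, not an official recommendation): merge the LoRA adapter into the base weights with PEFT, then point the model paths in the config above at the merged directory. The adapter path and output directory below are hypothetical:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_dir = "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct"
adapter_dir = "/home/tcx/loras/qwen2-lora"   # hypothetical LoRA adapter directory
merged_dir = base_dir + "-merged"

# Load the base model, apply the LoRA adapter, and fold it into the weights.
base = AutoModelForCausalLM.from_pretrained(base_dir)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

# Save merged weights plus tokenizer so the directory is a self-contained model.
merged.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_dir).save_pretrained(merged_dir)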

aarnphm (Collaborator) commented Jul 22, 2024

I don't think we have LoRA loading supported yet, but we can add this. @bojiang

bojiang (Member) commented Jul 24, 2024

As for local path models, I think we can support them.

dsp6414 (Author) commented Jul 24, 2024

thanks🌺

dsp6414 (Author) commented Jul 30, 2024

From openllm-models' service.py:

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)

Both set lora_modules=None.
How do I set my LoRA model?
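Not an official answer, only a sketch of one possible direction: vLLM's OpenAI-compatible serving layer accepts a list of LoRA module descriptors in place of None. The exact import path and field names of LoRAModulePath differ between vLLM releases, the engine has to be created with LoRA enabled (enable_lora=True in its engine args), and the adapter name and path below are hypothetical:

# Sketch only; check the class location and fields against your installed vLLM version.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

# Positional args: the adapter name exposed to clients, then its local path.
lora_modules = [LoRAModulePath("qwen2-lora", "/home/tcx/loras/qwen2-lora")]

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)

Clients would then select the adapter by passing "qwen2-lora" as the model name in their requests.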

dsp6414 closed this as completed Aug 1, 2024
dsp6414 (Author) commented Aug 1, 2024

🌼

bojiang reopened this Aug 2, 2024
dsp6414 (Author) commented Aug 2, 2024

aarnphm (Collaborator) commented Sep 24, 2024

Hi there, I think this involves a larger design discussion that we are currently working on internally.

We will post an update once we have more details. Thanks for your patience.

dsp6414 (Author) commented Sep 24, 2024 via email
