How to Offload Models and Load in Other Ones #1904

jmsalvador2395 · 2023-12-03T10:04:20Z

jmsalvador2395
Dec 3, 2023

I have some example code here

from vllm import LLM, SamplingParams

# set model name and prompt
model_name = 'mistralai/Mistral-7B-Instruct-v0.1'
prompt = 'some prompt'

# load model
llm = LLM(model_name,
          download_dir='/data/shared/llm_cache',
          gpu_memory_utilization=.9,
          tensor_parallel_size=4)

# load sampling params
sp = SamplingParams(max_tokens=512)

# generate response
response = llm.generate(
    prompt,
    sampling_params=sp
)

# print response
print(response[0].outputs[0].text)

I want to be able to load in another model and run inference using the same prompt. Is there a way for me to offload the current model safely?

For context, I've tried calling del llm and del llm.llm_engine along with calling torch.cuda.empty_cache() and saw that it causes issues.
When trying to create another llm object I get the message 2023-12-03 03:37:22,346 INFO worker.py:1507 -- Calling ray.init() again after it has already been called. Then after calling ray.shutdown() and trying to define a new llm object, i get the warning that tokenizers parallelism was disabled.

Any help is appreciated.

roshan-gopalakrishnan · 2024-06-04T06:14:22Z

roshan-gopalakrishnan
Jun 4, 2024

I am also looking for an answer.

0 replies

asimj1342 · 2024-06-14T02:27:42Z

asimj1342
Jun 14, 2024

    del llm
    gc.collect()
    torch.cuda.empty_cache()
    
    try this

4 replies

roshan-gopalakrishnan Sep 23, 2024

What is gc ? how to import that ?

lihuahua123 Sep 23, 2024

import gc

What is gc ? how to import that ?

roshan-gopalakrishnan Sep 23, 2024

This solution worked for me.

roshan-gopalakrishnan Sep 23, 2024

Thanks @lihuahua123, I saw this link #3281 for more information.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to Offload Models and Load in Other Ones #1904

{{title}}

Replies: 2 comments 4 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

How to Offload Models and Load in Other Ones #1904

jmsalvador2395 Dec 3, 2023

Replies: 2 comments · 4 replies

roshan-gopalakrishnan Jun 4, 2024

asimj1342 Jun 14, 2024

roshan-gopalakrishnan Sep 23, 2024

lihuahua123 Sep 23, 2024

roshan-gopalakrishnan Sep 23, 2024

roshan-gopalakrishnan Sep 23, 2024

jmsalvador2395
Dec 3, 2023

Replies: 2 comments 4 replies

roshan-gopalakrishnan
Jun 4, 2024

asimj1342
Jun 14, 2024