How to Offload Models and Load in Other Ones #1904
jmsalvador2395 asked this question in Q&A · Unanswered
I have some example code here.

I want to be able to load in another model and run inference using the same prompt. Is there a way for me to offload the current model safely?

For context, I've tried calling `del llm` and `del llm.llm_engine` along with calling `torch.cuda.empty_cache()`, and saw that it causes issues. When trying to create another `llm` object, I get the message:

2023-12-03 03:37:22,346 INFO worker.py:1507 -- Calling ray.init() again after it has already been called

Then, after calling `ray.shutdown()` and trying to define a new `llm` object, I get the warning that tokenizers parallelism was disabled. Any help is appreciated.
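
For reference, a minimal sketch of the sequence I'm attempting (the model names and prompt below are placeholders, not my actual script, and as far as I can tell vLLM does not document an official API for unloading a model, so this is a best-effort workaround):

```python
import gc
import os

import ray
import torch
from vllm import LLM, SamplingParams

# Silence the HuggingFace tokenizers parallelism warning seen after
# restarting Ray and creating a new engine.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

prompt = "Hello, my name is"  # placeholder prompt
params = SamplingParams(temperature=0.8)

# First model (placeholder).
llm = LLM(model="facebook/opt-125m")
print(llm.generate(prompt, params))

# Best-effort teardown: drop every reference to the engine, collect
# garbage, and release cached CUDA memory. This is a workaround, not a
# supported vLLM API.
del llm
gc.collect()
torch.cuda.empty_cache()

# vLLM drives its workers through Ray; shutting Ray down here avoids the
# "Calling ray.init() again" message when the second LLM is created.
ray.shutdown()

# Second model (placeholder), reusing the same prompt.
llm = LLM(model="facebook/opt-350m")
print(llm.generate(prompt, params))
```

Even with this teardown, creating the second `LLM` is where I hit the issues described above.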
Replies (2 comments, 4 replies):

- I am also looking for an answer.