Unload model from GPU #362

miqaP · 2024-09-16T20:57:41Z

I'm trying to use different models with LMQL, but it seems that each new model is loaded onto the GPU. Is it possible to unload a model before loading a new one? I've searched through the code but haven't been able to figure out how to unload a model.
Here is the code I use to load a model:

self._llm = lmql.model(
    f"local:llama.cpp:{model.get_model_absolute_path()}",
    tokenizer=model.tokenizer,
    n_gpu_layers=-1,
    n_ctx=4096,
)

I found issue #228, but it refers to loading a model with the CLI command "lmql serve-model".
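
To make the question concrete, here is a minimal sketch of the generic Python pattern for swapping models: drop the reference to the old object and force a garbage collection before loading the next one. This is not LMQL's documented API. `ModelSlot` and `loader` are hypothetical names, and `loader` stands in for a wrapper around a call such as `lmql.model(...)`. Whether GPU memory is actually released this way is an assumption: it depends on LMQL or the llama.cpp bindings not holding other references (caches, background workers) to the model.

```python
import gc
from typing import Any, Callable, Optional


class ModelSlot:
    """Holds at most one loaded model and releases it before loading another."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # `loader` is a hypothetical callable, e.g. a wrapper around lmql.model(...)
        self._loader = loader
        self._llm: Optional[Any] = None

    def unload(self) -> None:
        # Drop the reference held by this slot, then run a collection pass.
        # Assumption: nothing else still references the model object.
        self._llm = None
        gc.collect()

    def load(self, model_path: str) -> Any:
        # Release the previous model first so two models are never held at once.
        self.unload()
        self._llm = self._loader(model_path)
        return self._llm
```

The slot only guarantees that this code stops referencing the old model; it cannot release memory that the library itself keeps alive.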
