Question about multi-adapter #656
Comments
Currently, we have yet to support unloading LoRA layers. This has to do with the fact that unloading models from memory is pretty slow from what I have tested so far when loading around 10-15 layers. Another approach is to not load all LoRA layers when loading the model into memory, but to load them dynamically on request (see the sketch below). In a distributed environment, though, there is no way to ensure that all model pods will load the adapter correctly. I think the ability to use the base model alongside multiple adapters can be supported, but it is probably very low priority right now.
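As a rough illustration of the dynamic-loading idea, here is a minimal sketch using Hugging Face PEFT directly rather than this project's serving layer; the base model name, adapter paths, and adapter names are hypothetical placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Wrap the base model once with a first adapter; others are loaded lazily.
model = PeftModel.from_pretrained(base, "path/to/adapter-a", adapter_name="adapter-a")
loaded = {"adapter-a"}

def infer(prompt: str, adapter_id: str) -> str:
    # Load the requested adapter the first time it is seen, then activate it.
    if adapter_id not in loaded:
        model.load_adapter(f"path/to/{adapter_id}", adapter_name=adapter_id)
        loaded.add(adapter_id)
    model.set_adapter(adapter_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

This works in a single process, but as noted above, in a multi-pod deployment every replica would have to perform the same lazy load, and there is no built-in guarantee that they all succeed.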
I ran into a question when using multi-adapter. It works when I load different PEFT adapters and call them by adapter_name/adapter_id. However, can I call the vanilla LLM? For example, if I deploy Llama 2 with multiple adapters, can I disable the adapters and run inference with the original Llama 2 model through the framework? Looking forward to your reply.
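At the PEFT level (outside any serving framework) this is possible: PeftModel.disable_adapter() is a context manager that temporarily bypasses all LoRA layers, so generation inside the block runs the vanilla base model. A minimal sketch, with a hypothetical adapter path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/my-adapter")  # hypothetical path

inputs = tokenizer("Hello", return_tensors="pt")
with model.disable_adapter():
    # Inside this block the LoRA weights are ignored: plain Llama 2 inference.
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Whether the serving framework exposes this toggle per request is a separate question from whether PEFT supports it.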