Question about multi-adapter #656
Comments
Currently, we have yet to support unloading LoRA layers. This has to do with the fact that unloading models from memory is pretty slow from what I have tested so far when loading around 10-15 layers. Another approach is to not load all LoRA layers when loading the model into memory, but to load them dynamically on request (see the sketch below). In a distributed environment, though, there is no way to ensure that all model pods will load the adapter correctly. I think the ability to use the base model alongside multiple adapters can be supported, but it is probably very low priority right now.
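As a rough illustration of the dynamic-loading idea, here is a minimal sketch using Hugging Face PEFT directly rather than this project's serving layer; the base model name, adapter paths, and adapter names are hypothetical placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Wrap the base model once with a first adapter; others are loaded lazily.
model = PeftModel.from_pretrained(base, "path/to/adapter-a", adapter_name="adapter-a")
loaded = {"adapter-a"}

def infer(prompt: str, adapter_id: str) -> str:
    # Load the requested adapter the first time it is seen, then activate it.
    if adapter_id not in loaded:
        model.load_adapter(f"path/to/{adapter_id}", adapter_name=adapter_id)
        loaded.add(adapter_id)
    model.set_adapter(adapter_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

This works in a single process, but as noted above, in a multi-pod deployment every replica would have to perform the same lazy load, and there is no built-in guarantee that they all succeed.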
I ran into a question when using multi-adapter. It works when I load different PEFT adapters and call them by adapter_name/adapter_id. However, can I call the vanilla LLM? For example, if I deploy Llama 2 with multiple adapters, can I disable the adapters and run inference with the original Llama 2 model through the framework? Looking forward to your reply.
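At the PEFT level (outside any serving framework) this is possible: PeftModel.disable_adapter() is a context manager that temporarily bypasses all LoRA layers, so generation inside the block runs the vanilla base model. A minimal sketch, with a hypothetical adapter path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/my-adapter")  # hypothetical path

inputs = tokenizer("Hello", return_tensors="pt")
with model.disable_adapter():
    # Inside this block the LoRA weights are ignored: plain Llama 2 inference.
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Whether the serving framework exposes this toggle per request is a separate question from whether PEFT supports it.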