Add LoRa support to the txt2img and img2img pipelines #119
Conversation
Did some testing. All images were generated on a 4090 with the SDXL base model, using the prompt … Interestingly, inference is a bit slower when using LoRas. I could have used better trigger words in the prompt, but it certainly seems like the LoRa was loaded. One thing we might want to rethink is the way to pass …
If the user requests an invalid LoRa repo, it will print the error. We can make this more verbose by printing the entire exception. The runner will continue with inference, but without using the LoRa.
(As a side note: I think it would be useful if exceptions like that were collected and passed back: make a best effort to complete inference and inform the user of any issues found during the job. Alternatively, we could abort inference.)
I like your LoRa input validation solution because it handles all incorrect values sufficiently. However, we might want to return these errors to the gateway later to inform the user. I think we should return the error from … We could also hold off on making that change until we develop the go-livepeer side. @rickstaa, any thoughts on that approach?
VRAM usage looks great; I tested with multiple concurrent requests using different LoRas. I think this is working well as it is. If this would enhance inference time, I think we should backlog it as a pipeline improvement.
This implementation appears to be working great; I tried a few LoRas from Hugging Face and they were downloaded automatically.
Thanks for the PR @stronk-dev! This is a nice addition to the image pipelines. I tested both text-to-image and image-to-image using the ByteDance/SDXL-Lightning model with two different LoRas and multiple images. The pipelines are working great with LoRa support.
See my comments above on the remaining design decisions. I think responding with an informative bad request response when invalid LoRa values are passed will help inform the user on the gateway side. If you can make that change (or we decide to do it during go-livepeer integration) and resolve conflicts, then the PR looks good to me.
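For illustration, here is a minimal sketch of what such an informative bad request could look like on the runner side, assuming a FastAPI route and a hypothetical `parse_loras` helper (the route shape and helper names are assumptions, not the actual runner code):

```python
import json

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()


def parse_loras(loras: str) -> dict:
    """Hypothetical helper: parse the 'loras' field into {repo_id: weight} and validate it."""
    adapters = json.loads(loras) if loras else {}
    if not isinstance(adapters, dict):
        raise ValueError("'loras' must be a JSON object mapping repository IDs to weights.")
    for repo, weight in adapters.items():
        if not isinstance(weight, (int, float)) or weight < 0:
            raise ValueError(f"invalid weight {weight!r} for LoRa '{repo}'")
    return adapters


@app.post("/text-to-image")
async def text_to_image(params: dict):
    try:
        loras = parse_loras(params.get("loras", ""))
    except ValueError as e:  # json.JSONDecodeError is a ValueError subclass
        # Informative 400 so the gateway can surface the problem to the user.
        return JSONResponse(status_code=400, content={"detail": {"msg": str(e)}})
    ...  # run inference with the parsed loras
```

Whether the runner aborts with a 400 like this or keeps the current best-effort behaviour and reports the failure alongside the result is exactly the design decision discussed above.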
Just to confirm: we should return a 400 bad request when the LoRa values are invalid?
This change is ready to merge if there are no further optimizations. Please review, @stronk-dev, if you could take a quick look. I've fully tested both the text-to-image and image-to-image pipelines with various LoRas.
Correct, and I'd like to send the specific message back to the gateway. I think this would help developers learn the API faster, but in my opinion it's not required for this initial release of LoRa support. Open to suggestions.
Thanks for all the tweaks, @eliteprox! LGTM
@rickstaa I've resolved conflicts on …
@stronk-dev Just an update on this PR. There is one remaining task: loading the 2-step, 4-step, or 8-step checkpoints differently when LoRas are used on non-SDXL models. See https://huggingface.co/ByteDance/SDXL-Lightning#2-step-4-step-8-step-lora. This logic is in t2i and i2i.
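For context, the linked model card loads the 2/4/8-step Lightning checkpoints as LoRA weights themselves, which is why combining them with user-supplied LoRas needs different handling. A rough sketch following that card for the 4-step variant (not the runner's t2i/i2i code):

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_lora.safetensors"  # 2-step and 8-step variants also exist

pipe = StableDiffusionXLPipeline.from_pretrained(
    base, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# The step-distilled checkpoint is itself a LoRA, loaded and fused into the base model.
pipe.load_lora_weights(hf_hub_download(repo, ckpt))
pipe.fuse_lora()

# Lightning expects trailing timesteps and no classifier-free guidance.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
image = pipe("A cool cat on the beach", num_inference_steps=4, guidance_scale=0).images[0]
```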
@stronk-dev I agree -> #188.
@eliteprox good catch. Could you create a separate Linear item for it?
This commit replaces the LoRa load function with a class to clean up the codebase. It also cleans up some error handling and docstrings.
@stronk-dev thanks again for your contribution. To ensure optimal network performance and leaner code, I applied several improvements:
Output tests
I performed several output tests to ensure the LoRas were being applied and correctly disabled.
Prompt:
Without loras
{
"model_id": "",
"loras": "",
"prompt": "pixel, a cute corgi",
"height": 576,
"width": 1024,
"guidance_scale": 1.5,
"negative_prompt": "3d render, realistic",
"safety_check": true,
"seed": 0,
"num_inference_steps": 50,
"num_images_per_prompt": 1
}
With loras
Next request without loras
Following request with loras again
Request without loras and negative prompt
Request with better parameters after lora was enabled
{
"model_id": "",
"loras": "",
"prompt": "close-up photo of a beautiful red rose breaking through a cube made of ice , splintered cracked ice surface, frosted colors, blood dripping from rose, melting ice, Valentine’s Day vibes, cinematic, sharp focus, intricate, cinematic, dramatic light",
"height": 576,
"width": 1024,
"guidance_scale": 7.5,
"negative_prompt": "",
"safety_check": true,
"seed": 0,
"num_inference_steps": 50,
"num_images_per_prompt": 1
}
Request with better parameters on restart
{
"model_id": "",
"loras": "",
"prompt": "close-up photo of a beautiful red rose breaking through a cube made of ice , splintered cracked ice surface, frosted colors, blood dripping from rose, melting ice, Valentine’s Day vibes, cinematic, sharp focus, intricate, cinematic, dramatic light",
"height": 576,
"width": 1024,
"guidance_scale": 7.5,
"negative_prompt": "",
"safety_check": true,
"seed": 0,
"num_inference_steps": 50,
"num_images_per_prompt": 1
}
This commit applies several performance optimizations that allow the LoRas to be kept in memory and reused for similar requests.
@eliteprox, @stronk-dev I will add one more optimization where we allow up to 4 LoRas but keep up to 8 LoRas in the buffer, and I will look at the memory crashes. After that it is good to be merged.
This commit introduces a buffer to keep LoRas in memory up to a certain size. This optimization prevents unnecessary reloads when the orchestrator receives repeated requests, improving network performance. While PyTorch handles memory cleanup when limits are reached, we can call `torch.cuda.empty_cache()` after the `delete_adapters` function call if we encounter frequent out-of-memory errors.
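Roughly, the idea looks like this (an illustrative sketch, not the actual loader class; the buffer size and adapter naming are assumptions):

```python
from collections import OrderedDict

import torch


class LoraBuffer:
    """Keep recently used LoRa adapters on the pipeline, evicting the oldest when full."""

    def __init__(self, pipe, max_loaded: int = 8):
        self.pipe = pipe
        self.max_loaded = max_loaded
        self.loaded = OrderedDict()  # repo_id -> adapter name, oldest first

    def ensure_loaded(self, repo_id: str) -> str:
        if repo_id in self.loaded:
            self.loaded.move_to_end(repo_id)  # mark as most recently used
            return self.loaded[repo_id]
        while len(self.loaded) >= self.max_loaded:
            _, oldest_name = self.loaded.popitem(last=False)
            self.pipe.delete_adapters(oldest_name)
            torch.cuda.empty_cache()  # optional; mainly useful if we see OOM errors
        name = repo_id.replace("/", "_")  # keep adapter names free of slashes here
        self.pipe.load_lora_weights(repo_id, adapter_name=name)
        self.loaded[repo_id] = name
        return name
```

With a bound like this, repeated requests for the same LoRas skip the download and reload entirely.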
This commit adds some logic which cleans up LoRas from GPU memory when the free memory on the GPU goes below 2 GB. It also increases the max LoRas on the GPU to 12.
@eliteprox and @stronk-dev I now also added some logic to clean up memory when the free memory on the GPU goes below 2 GB.
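A sketch of what that check could look like (the 2 GB threshold comes from the comment above; `evict_all` stands in for whatever drops the cached adapters):

```python
import torch

FREE_VRAM_THRESHOLD = 2 * 1024**3  # 2 GB


def maybe_free_lora_memory(lora_buffer) -> None:
    """Drop cached LoRa adapters when free GPU memory dips below the threshold."""
    if not torch.cuda.is_available():
        return
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    if free_bytes < FREE_VRAM_THRESHOLD:
        lora_buffer.evict_all()  # hypothetical helper that delete_adapters() everything loaded
        torch.cuda.empty_cache()
```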
This commit cleans up some unused code.
This commit applies the black formatter to the LoRa-related code.
This commit moves the LoraLoadingError closer to the LoraLoader.
Another check
Base
curl -X POST "http://0.0.0.0:8935/text-to-image" \
-H "Content-Type: application/json" \
-d '{
"model_id":"stabilityai/stable-diffusion-xl-base-1.0",
"loras": "{ \"latent-consistency/lcm-lora-sdxl\": 0.0, \"KappaNeuro/jim-mahfood-style\": 1.0}",
"prompt":"A cool cat on the beach",
"width": 1024,
"height": 1024,
"seed": 818566848
}'
Loras
curl -X POST "http://0.0.0.0:8935/text-to-image" \
-H "Content-Type: application/json" \
-d '{
"model_id":"stabilityai/stable-diffusion-xl-base-1.0",
"loras": "{ \"alvdansen/dimension-w-sd15\": 1.0}",
"prompt":"A cool cat on the beach",
"width": 1024,
"height": 1024,
"seed": 818566848
}'
Back to base
Tested loading/unloading LoRa, max of 4 LoRas and validation of invalid LoRa values.
LGTM!
This commit removes the verbose 400 error since we will handle this in a subsequent pull request.
Adds support to load in arbitrary embeddings, modules (like LCM), etc.
Fulfills livepeer/bounties#33
Still requires:
- Testing if it works
- Gracefully deal with non-existing requested LoRas
- Design decision: do we want to keep LoRas loaded, or always unload already loaded weights like we do now
- Design decision: use the current method of requesting LoRas, or explore other options
- Design decision: abort inference if the loras param is invalid or it fails to load one of the LoRas, or continue on like it does now

LoRas can be loaded by passing a new loras parameter. In the current design this needs to be a string, parseable as JSON. For example:
curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"light saber battle in the death star", "loras": "{ \"nerijs/pixel-art-xl\" : 1.2 }"}'
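For reference, a minimal sketch of how such a JSON string could be parsed and activated with the diffusers adapter API (illustrative only; not the pipeline's actual loading code):

```python
import json


def apply_loras(pipe, loras: str) -> None:
    """Parse the 'loras' JSON string and activate the adapters with their weights."""
    adapters = json.loads(loras)  # e.g. {"nerijs/pixel-art-xl": 1.2}
    names, weights = [], []
    for repo_id, weight in adapters.items():
        name = repo_id.replace("/", "_")
        pipe.load_lora_weights(repo_id, adapter_name=name)
        names.append(name)
        weights.append(float(weight))
    pipe.set_adapters(names, adapter_weights=weights)
```

The weight value controls how strongly each adapter influences the output; the curl example above requests nerijs/pixel-art-xl at a weight of 1.2.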