Add LoRa support to the txt2img and img2img pipelines #119
Conversation
Did some testing. All images were generated on a 4090 with the SDXL base model, using the prompt … Interestingly, inference is a bit slower when using LoRas. I could have used better trigger words in the prompt, but it certainly seems like the LoRa was loaded. One thing we might want to rethink is the way to pass …
If the user requests an invalid LoRa repo, it will print the error. We can make this more verbose by printing the entire exception. The runner will continue with inference, but without using the LoRa.
(As a side note: I think it would be useful if exceptions like that were collected and passed back: make a best effort to complete inference and inform the user of any issues found during the job. Alternatively, we could abort inference.)
I like your LoRa input validation solution because it handles all incorrect values sufficiently. However, we might want to return these errors to the gateway later to inform the user. I think we should return the error from … We could also hold off on making that change until we develop the go-livepeer side. @rickstaa, any thoughts on that approach?
VRAM usage looks great; I tested with multiple concurrent requests using different LoRas. I think this is working well as it is. If this would enhance inference time, I think we should backlog it as a pipeline improvement.
This implementation appears to be working great; I tried a few LoRas from Hugging Face and they were downloaded automatically.
Thanks for the PR @stronk-dev! This is a nice addition to the image pipelines. I tested both text-to-image and image-to-image using the ByteDance/SDXL-Lightning model with two different LoRas and multiple images. The pipelines are working great with LoRa support.
See my comments above on the remaining design decisions. I think responding with an informative bad request response when invalid LoRa values are passed will help inform the user on the gateway side. If you can make that change (or we decide to do it during go-livepeer integration) and resolve conflicts, then the PR looks good to me.
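For illustration, here is a minimal sketch of what such an informative bad request could look like on the runner side, assuming a FastAPI route and a hypothetical `parse_loras` helper (the route shape and helper names are assumptions, not the actual runner code):

```python
import json

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()


def parse_loras(loras: str) -> dict:
    """Hypothetical helper: parse the 'loras' field into {repo_id: weight} and validate it."""
    adapters = json.loads(loras) if loras else {}
    if not isinstance(adapters, dict):
        raise ValueError("'loras' must be a JSON object mapping repository IDs to weights.")
    for repo, weight in adapters.items():
        if not isinstance(weight, (int, float)) or weight < 0:
            raise ValueError(f"invalid weight {weight!r} for LoRa '{repo}'")
    return adapters


@app.post("/text-to-image")
async def text_to_image(params: dict):
    try:
        loras = parse_loras(params.get("loras", ""))
    except ValueError as e:  # json.JSONDecodeError is a ValueError subclass
        # Informative 400 so the gateway can surface the problem to the user.
        return JSONResponse(status_code=400, content={"detail": {"msg": str(e)}})
    ...  # run inference with the parsed loras
```

Whether the runner aborts with a 400 like this or keeps the current best-effort behaviour and reports the failure alongside the result is exactly the design decision discussed above.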
Just to confirm: we should return a 400 bad request when the LoRa values are invalid?
This change is ready to merge if there are no further optimizations. Please review, @stronk-dev, if you could take a quick look. I've fully tested both the text-to-image and image-to-image pipelines with various LoRas.
Correct, and I'd like to send the specific message back to the gateway. I think this would help developers learn the API faster, but in my opinion it's not required for this initial release of LoRa support. Open to suggestions.
Thanks for all the tweaks, @eliteprox! LGTM
@rickstaa I've resolved conflicts on …
@stronk-dev Just an update on this PR. There is one remaining task: loading the 2-step, 4-step, or 8-step checkpoints differently when LoRas are used on non-SDXL models. See https://huggingface.co/ByteDance/SDXL-Lightning#2-step-4-step-8-step-lora. This logic is in t2i and i2i.
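For context, the linked model card loads the 2/4/8-step Lightning checkpoints as LoRA weights themselves, which is why combining them with user-supplied LoRas needs different handling. A rough sketch following that card for the 4-step variant (not the runner's t2i/i2i code):

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_lora.safetensors"  # 2-step and 8-step variants also exist

pipe = StableDiffusionXLPipeline.from_pretrained(
    base, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# The step-distilled checkpoint is itself a LoRA, loaded and fused into the base model.
pipe.load_lora_weights(hf_hub_download(repo, ckpt))
pipe.fuse_lora()

# Lightning expects trailing timesteps and no classifier-free guidance.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
image = pipe("A cool cat on the beach", num_inference_steps=4, guidance_scale=0).images[0]
```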
@stronk-dev I agree -> #188.
@eliteprox good catch. Could you create a separate Linear item for it?
This commit replaces the LoRa load function with a class to clean up the codebase. It also cleans up some error handling and docstrings.
@stronk-dev thanks again for your contribution. To ensure optimal network performance and leaner code, I applied several improvements:
Output tests
I performed several output tests to ensure the LoRas were being applied and correctly disabled.
Prompt:
Without loras
{
"model_id": "",
"loras": "",
"prompt": "pixel, a cute corgi",
"height": 576,
"width": 1024,
"guidance_scale": 1.5,
"negative_prompt": "3d render, realistic",
"safety_check": true,
"seed": 0,
"num_inference_steps": 50,
"num_images_per_prompt": 1
}
With loras
Next request without loras
Following request with loras again
Request without loras and negative prompt
Request with better parameters after lora was enabled
{
"model_id": "",
"loras": "",
"prompt": "close-up photo of a beautiful red rose breaking through a cube made of ice , splintered cracked ice surface, frosted colors, blood dripping from rose, melting ice, Valentine’s Day vibes, cinematic, sharp focus, intricate, cinematic, dramatic light",
"height": 576,
"width": 1024,
"guidance_scale": 7.5,
"negative_prompt": "",
"safety_check": true,
"seed": 0,
"num_inference_steps": 50,
"num_images_per_prompt": 1
}
Request with better parameters on restart
{
"model_id": "",
"loras": "",
"prompt": "close-up photo of a beautiful red rose breaking through a cube made of ice , splintered cracked ice surface, frosted colors, blood dripping from rose, melting ice, Valentine’s Day vibes, cinematic, sharp focus, intricate, cinematic, dramatic light",
"height": 576,
"width": 1024,
"guidance_scale": 7.5,
"negative_prompt": "",
"safety_check": true,
"seed": 0,
"num_inference_steps": 50,
"num_images_per_prompt": 1
}
This commit applies several performance optimizations that allow the LoRas to be kept in memory and reused for similar requests.
@eliteprox, @stronk-dev I will add one more optimization where we allow up to 4 LoRas but keep up to 8 LoRas in the buffer, and I will look at the memory crashes. After that it is good to be merged.
This commit introduces a buffer to keep LoRas in memory up to a certain size. This optimization prevents unnecessary reloads when the orchestrator receives repeated requests, improving network performance. While PyTorch handles memory cleanup when limits are reached, we can call `torch.cuda.empty_cache()` after the `delete_adapters` function call if we encounter frequent out-of-memory errors.
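Roughly, the idea looks like this (an illustrative sketch, not the actual loader class; the buffer size and adapter naming are assumptions):

```python
from collections import OrderedDict

import torch


class LoraBuffer:
    """Keep recently used LoRa adapters on the pipeline, evicting the oldest when full."""

    def __init__(self, pipe, max_loaded: int = 8):
        self.pipe = pipe
        self.max_loaded = max_loaded
        self.loaded = OrderedDict()  # repo_id -> adapter name, oldest first

    def ensure_loaded(self, repo_id: str) -> str:
        if repo_id in self.loaded:
            self.loaded.move_to_end(repo_id)  # mark as most recently used
            return self.loaded[repo_id]
        while len(self.loaded) >= self.max_loaded:
            _, oldest_name = self.loaded.popitem(last=False)
            self.pipe.delete_adapters(oldest_name)
            torch.cuda.empty_cache()  # optional; mainly useful if we see OOM errors
        name = repo_id.replace("/", "_")  # keep adapter names free of slashes here
        self.pipe.load_lora_weights(repo_id, adapter_name=name)
        self.loaded[repo_id] = name
        return name
```

With a bound like this, repeated requests for the same LoRas skip the download and reload entirely.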
This commit adds some logic which cleans up LoRas from GPU memory when the free memory on the GPU goes below 2 GB. It also increases the max LoRas on the GPU to 12.
@eliteprox and @stronk-dev I now also added some logic to clean up memory when the free memory on the GPU goes below 2 GB.
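A sketch of what that check could look like (the 2 GB threshold comes from the comment above; `evict_all` stands in for whatever drops the cached adapters):

```python
import torch

FREE_VRAM_THRESHOLD = 2 * 1024**3  # 2 GB


def maybe_free_lora_memory(lora_buffer) -> None:
    """Drop cached LoRa adapters when free GPU memory dips below the threshold."""
    if not torch.cuda.is_available():
        return
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    if free_bytes < FREE_VRAM_THRESHOLD:
        lora_buffer.evict_all()  # hypothetical helper that delete_adapters() everything loaded
        torch.cuda.empty_cache()
```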
This commit cleans up some unused code.
This commit applies the black formatter to the LoRa-related code.
This commit moves the LoraLoadingError closer to the LoraLoader.
Another check
Base
curl -X POST "http://0.0.0.0:8935/text-to-image" \
-H "Content-Type: application/json" \
-d '{
"model_id":"stabilityai/stable-diffusion-xl-base-1.0",
"loras": "{ \"latent-consistency/lcm-lora-sdxl\": 0.0, \"KappaNeuro/jim-mahfood-style\": 1.0}",
"prompt":"A cool cat on the beach",
"width": 1024,
"height": 1024,
"seed": 818566848
}'
Loras
curl -X POST "http://0.0.0.0:8935/text-to-image" \
-H "Content-Type: application/json" \
-d '{
"model_id":"stabilityai/stable-diffusion-xl-base-1.0",
"loras": "{ \"alvdansen/dimension-w-sd15\": 1.0}",
"prompt":"A cool cat on the beach",
"width": 1024,
"height": 1024,
"seed": 818566848
}'
Back to base
Tested loading/unloading LoRa, max of 4 LoRas and validation of invalid LoRa values.
LGTM!
This commit removes the verbose 400 error since we will handle this in a subsequent pull request.
Adds support to load in arbitrary embeddings, modules (like LCM), etc.
Fulfills livepeer/bounties#33
Still requires:
- Testing if it works
- Gracefully deal with non-existing requested LoRas
- Design decision: do we want to keep LoRas loaded, or always unload already loaded weights like we do now
- Design decision: use the current method of requesting LoRas, or explore other options
- Design decision: abort inference if the loras param is invalid or it fails to load one of the LoRas, or continue on like it does now

LoRas can be loaded by passing a new loras parameter. In the current design this needs to be a string, parseable as JSON. For example:
curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"light saber battle in the death star", "loras": "{ \"nerijs/pixel-art-xl\" : 1.2 }"}'
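For reference, a minimal sketch of how such a JSON string could be parsed and activated with the diffusers adapter API (illustrative only; not the pipeline's actual loading code):

```python
import json


def apply_loras(pipe, loras: str) -> None:
    """Parse the 'loras' JSON string and activate the adapters with their weights."""
    adapters = json.loads(loras)  # e.g. {"nerijs/pixel-art-xl": 1.2}
    names, weights = [], []
    for repo_id, weight in adapters.items():
        name = repo_id.replace("/", "_")
        pipe.load_lora_weights(repo_id, adapter_name=name)
        names.append(name)
        weights.append(float(weight))
    pipe.set_adapters(names, adapter_weights=weights)
```

The weight value controls how strongly each adapter influences the output; the curl example above requests nerijs/pixel-art-xl at a weight of 1.2.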