Fix CPU Offloading Usage & Typos (huggingface#8230)
* Fix typos

* Fix `pipe.enable_model_cpu_offload()` usage

* Fix cpu offloading

* Update numbers
tolgacangoz authored May 24, 2024
1 parent db33af0 commit 0ab63ff
Showing 11 changed files with 56 additions and 60 deletions.
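
The recurring change in this commit is to drop the explicit `.to("cuda")` / `.to(device)` call wherever `enable_model_cpu_offload()` is enabled right after loading. A minimal sketch of the resulting pattern follows. The helper name `load_with_model_cpu_offload` and its parameters are hypothetical and not part of the commit; only `DiffusionPipeline.from_pretrained` and `enable_model_cpu_offload` come from Diffusers, and the stated rationale is an inference from the diffs below.

```python
def load_with_model_cpu_offload(model_id, dtype_name="float16", variant="fp16"):
    """Load a pipeline and enable model CPU offloading (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require a GPU setup.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name),
        variant=variant,
    )
    # Deliberately no `.to("cuda")` here: the offload hooks are expected to
    # move each sub-model to the accelerator only while it is being used.
    pipe.enable_model_cpu_offload()
    return pipe


# Example call (downloads weights and needs an accelerator, so it is left commented out):
# pipe = load_with_model_cpu_offload("runwayml/stable-diffusion-v1-5")
```

Keeping device placement in one place (the offload call) mirrors what the changed docs and tests do; the same reasoning would apply to `enable_sequential_cpu_offload()` in the LoRA test.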
4 changes: 2 additions & 2 deletions README.md
@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi

## Quickstart

Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 22000+ checkpoints):
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 25.000+ checkpoints):

```python
from diffusers import DiffusionPipeline
@@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +9000 other amazing GitHub repositories 💪
- +11.000 other amazing GitHub repositories 💪

Thank you for using us ❤️.

24 changes: 12 additions & 12 deletions docs/source/en/optimization/tgate.md
@@ -6,7 +6,7 @@ Before you begin, make sure you install T-GATE.

```bash
pip install tgate
pip install -U pytorch diffusers transformers accelerate DeepCache
pip install -U torch diffusers transformers accelerate DeepCache
```


@@ -46,12 +46,12 @@ pipe = TgatePixArtLoader(

image = pipe.tgate(
"An alpaca made of colorful building blocks, cyberpunk.",
gate_step=gate_step,
gate_step=gate_step,
num_inference_steps=inference_step,
).images[0]
```
</hfoption>
<hfoption id="Stable Diffusion XL">
<hfoption id="Stable Diffusion XL">

Accelerate `StableDiffusionXLPipeline` with T-GATE:

@@ -78,9 +78,9 @@ pipe = TgateSDXLLoader(
).to("cuda")

image = pipe.tgate(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
).images[0]
```
</hfoption>
@@ -111,9 +111,9 @@ pipe = TgateSDXLDeepCacheLoader(
).to("cuda")

image = pipe.tgate(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
).images[0]
```
</hfoption>
@@ -151,9 +151,9 @@ pipe = TgateSDXLLoader(
).to("cuda")

image = pipe.tgate(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
).images[0]
```
</hfoption>
50 changes: 25 additions & 25 deletions docs/source/en/using-diffusers/inference_with_tcd_lora.md
@@ -78,7 +78,7 @@ image = pipe(
prompt=prompt,
num_inference_steps=4,
guidance_scale=0,
eta=0.3,
eta=0.3,
generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```
@@ -156,22 +156,22 @@ image = pipe(
prompt=prompt,
num_inference_steps=8,
guidance_scale=0,
eta=0.3,
eta=0.3,
generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/animagine_xl.png)

TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the [TheLastBen/Papercut_SDXL](https://huggingface.co/TheLastBen/Papercut_SDXL) LoRA and fuse it with the TCD-LoRA with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method.
TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the [TheLastBen/Papercut_SDXL](https://huggingface.co/TheLastBen/Papercut_SDXL) LoRA and fuse it with the TCD-LoRA with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method.

> [!TIP]
> Check out the [Merge LoRAs](merge_loras) guide to learn more about efficient merging methods.
```python
import torch
from diffusers import StableDiffusionXLPipeline
from scheduling_tcd import TCDScheduler
from scheduling_tcd import TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
@@ -191,7 +191,7 @@ image = pipe(
prompt=prompt,
num_inference_steps=4,
guidance_scale=0,
eta=0.3,
eta=0.3,
generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```
@@ -215,7 +215,7 @@ from PIL import Image
from transformers import DPTFeatureExtractor, DPTForDepthEstimation
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image, make_image_grid
from scheduling_tcd import TCDScheduler
from scheduling_tcd import TCDScheduler

device = "cuda"
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(device)
@@ -249,13 +249,13 @@ controlnet = ControlNetModel.from_pretrained(
controlnet_id,
torch_dtype=torch.float16,
variant="fp16",
).to(device)
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
base_model_id,
controlnet=controlnet,
torch_dtype=torch.float16,
variant="fp16",
).to(device)
)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
@@ -271,9 +271,9 @@ depth_image = get_depth_map(image)
controlnet_conditioning_scale = 0.5 # recommended for good generalization

image = pipe(
prompt,
image=depth_image,
num_inference_steps=4,
prompt,
image=depth_image,
num_inference_steps=4,
guidance_scale=0,
eta=0.3,
controlnet_conditioning_scale=controlnet_conditioning_scale,
@@ -290,7 +290,7 @@ grid_image = make_image_grid([depth_image, image], rows=1, cols=2)
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image, make_image_grid
from scheduling_tcd import TCDScheduler
from scheduling_tcd import TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
@@ -301,13 +301,13 @@ controlnet = ControlNetModel.from_pretrained(
controlnet_id,
torch_dtype=torch.float16,
variant="fp16",
).to(device)
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
base_model_id,
controlnet=controlnet,
torch_dtype=torch.float16,
variant="fp16",
).to(device)
)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
@@ -322,9 +322,9 @@ canny_image = load_image("https://huggingface.co/datasets/hf-internal-testing/di
controlnet_conditioning_scale = 0.5 # recommended for good generalization

image = pipe(
prompt,
image=canny_image,
num_inference_steps=4,
prompt,
image=canny_image,
num_inference_steps=4,
guidance_scale=0,
eta=0.3,
controlnet_conditioning_scale=controlnet_conditioning_scale,
@@ -336,7 +336,7 @@ grid_image = make_image_grid([canny_image, image], rows=1, cols=2)
![](https://github.com/jabir-zheng/TCD/raw/main/assets/controlnet_canny_tcd.png)

<Tip>
The inference parameters in this example might not work for all examples, so we recommend you to try different values for `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale` and `cross_attention_kwargs` parameters and choose the best one.
The inference parameters in this example might not work for all examples, so we recommend you to try different values for `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale` and `cross_attention_kwargs` parameters and choose the best one.
</Tip>

</hfoption>
@@ -350,7 +350,7 @@ from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image, make_image_grid

from ip_adapter import IPAdapterXL
from scheduling_tcd import TCDScheduler
from scheduling_tcd import TCDScheduler

device = "cuda"
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
@@ -359,8 +359,8 @@ ip_ckpt = "sdxl_models/ip-adapter_sdxl.bin"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(
base_model_path,
torch_dtype=torch.float16,
base_model_path,
torch_dtype=torch.float16,
variant="fp16"
)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
@@ -375,13 +375,13 @@ ref_image = load_image("https://raw.githubusercontent.com/tencent-ailab/IP-Adapt
prompt = "best quality, high quality, wearing sunglasses"

image = ip_model.generate(
pil_image=ref_image,
pil_image=ref_image,
prompt=prompt,
scale=0.5,
num_samples=1,
num_inference_steps=4,
num_samples=1,
num_inference_steps=4,
guidance_scale=0,
eta=0.3,
eta=0.3,
seed=0,
)[0]

8 changes: 4 additions & 4 deletions docs/source/en/using-diffusers/inpaint.md
@@ -230,7 +230,7 @@ from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
@@ -255,7 +255,7 @@ from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
@@ -296,7 +296,7 @@ from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
@@ -319,7 +319,7 @@ from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
20 changes: 10 additions & 10 deletions examples/community/README.md
@@ -240,12 +240,12 @@ pipeline_output = pipe(
# denoising_steps=10, # (optional) Number of denoising steps of each inference pass. Default: 10.
# ensemble_size=10, # (optional) Number of inference passes in the ensemble. Default: 10.
# ------------------------------------------------

# ----- recommended setting for LCM version ------
# denoising_steps=4,
# ensemble_size=5,
# -------------------------------------------------

# processing_res=768, # (optional) Maximum resolution of processing. If set to 0: will not resize at all. Defaults to 768.
# match_input_res=True, # (optional) Resize depth prediction to match input resolution.
# batch_size=0, # (optional) Inference batch size, no bigger than `num_ensemble`. If set to 0, the script will automatically decide the proper batch size. Defaults to 0.
@@ -1032,7 +1032,7 @@ image = pipe().images[0]

Make sure you have @crowsonkb's <https://github.com/crowsonkb/k-diffusion> installed:

```
```sh
pip install k-diffusion
```

@@ -1854,13 +1854,13 @@ To use this pipeline, you need to:

You can simply use pip to install IPEX with the latest version.

```python
```sh
python -m pip install intel_extension_for_pytorch
```

**Note:** To install a specific version, run with the following command:

```
```sh
python -m pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
```

@@ -1958,13 +1958,13 @@ To use this pipeline, you need to:

You can simply use pip to install IPEX with the latest version.

```python
```sh
python -m pip install intel_extension_for_pytorch
```

**Note:** To install a specific version, run with the following command:

```
```sh
python -m pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
```

@@ -3010,8 +3010,8 @@ This code implements a pipeline for the Stable Diffusion model, enabling the div

### Sample Code

```
from from examples.community.regional_prompting_stable_diffusion import RegionalPromptingStableDiffusionPipeline
```py
from examples.community.regional_prompting_stable_diffusion import RegionalPromptingStableDiffusionPipeline
pipe = RegionalPromptingStableDiffusionPipeline.from_single_file(model_path, vae=vae)

rp_args = {
@@ -4131,7 +4131,7 @@ This implementation is based on [Diffusers](https://huggingface.co/docs/diffuser

## Example Usage

```
```py
import os
import torch

Original file line number Diff line number Diff line change
@@ -896,7 +896,6 @@ def collate_fn(examples):
images = []
if args.validation_prompts is not None:
logger.info("Running inference for collecting generated images...")
pipeline = pipeline.to(accelerator.device)
pipeline.torch_dtype = weight_dtype
pipeline.set_progress_bar_config(disable=True)
pipeline.enable_model_cpu_offload()
2 changes: 1 addition & 1 deletion tests/lora/test_lora_layers_sd.py
@@ -642,7 +642,7 @@ def test_sd_load_civitai_empty_network_alpha(self):
This test simply checks that loading a LoRA with an empty network alpha works fine
See: https://github.com/huggingface/diffusers/issues/5606
"""
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(torch_device)
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.enable_sequential_cpu_offload()
civitai_path = hf_hub_download("ybelkada/test-ahi-civitai", "ahi_lora_weights.safetensors")
pipeline.load_lora_weights(civitai_path, adapter_name="ahri")
1 change: 0 additions & 1 deletion tests/pipelines/i2vgen_xl/test_i2vgenxl.py
@@ -243,7 +243,6 @@ def tearDown(self):

def test_i2vgen_xl(self):
pipe = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
pipe = pipe.to(torch_device)
pipe.enable_model_cpu_offload()
pipe.set_progress_bar_config(disable=None)
image = load_image(
Original file line number Diff line number Diff line change
@@ -612,10 +612,10 @@ def test_ip_adapter_multiple_masks(self):
def test_instant_style_multiple_masks(self):
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
"h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
).to("cuda")
)
pipeline = StableDiffusionXLPipeline.from_pretrained(
"RunDiffusion/Juggernaut-XL-v9", torch_dtype=torch.float16, image_encoder=image_encoder, variant="fp16"
).to("cuda")
)
pipeline.enable_model_cpu_offload()

pipeline.load_ip_adapter(