diff --git a/.github/workflows/pr_test_fetcher.yml b/.github/workflows/pr_test_fetcher.yml index d33bca1903f4..7eb208505e75 100644 --- a/.github/workflows/pr_test_fetcher.yml +++ b/.github/workflows/pr_test_fetcher.yml @@ -1,4 +1,4 @@ -name: Fast tests for PRs +name: Fast tests for PRs - Test Fetcher on: pull_request: @@ -14,6 +14,10 @@ env: MKL_NUM_THREADS: 4 PYTEST_TIMEOUT: 60 +concurrency: + group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} + cancel-in-progress: true + jobs: setup_pr_tests: name: Setup PR Tests diff --git a/.github/workflows/push_tests_fast.yml b/.github/workflows/push_tests_fast.yml index acd59ef80dc7..798fa777c6c6 100644 --- a/.github/workflows/push_tests_fast.yml +++ b/.github/workflows/push_tests_fast.yml @@ -5,6 +5,10 @@ on: branches: - main +concurrency: + group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} + cancel-in-progress: true + env: DIFFUSERS_IS_CI: yes HF_HOME: /mnt/cache diff --git a/.github/workflows/push_tests_mps.yml b/.github/workflows/push_tests_mps.yml index c92aa6426d55..bdea0b760b26 100644 --- a/.github/workflows/push_tests_mps.yml +++ b/.github/workflows/push_tests_mps.yml @@ -13,6 +13,10 @@ env: PYTEST_TIMEOUT: 600 RUN_SLOW: no +concurrency: + group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} + cancel-in-progress: true + jobs: run_fast_tests_apple_m1: name: Fast PyTorch MPS tests on MacOS diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index d2583121418e..e855ea36e8cf 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -278,6 +278,8 @@ title: Kandinsky 2.1 - local: api/pipelines/kandinsky_v22 title: Kandinsky 2.2 + - local: api/pipelines/kandinsky3 + title: Kandinsky 3 - local: api/pipelines/latent_consistency_models title: Latent Consistency Models - local: api/pipelines/latent_diffusion diff --git a/docs/source/en/api/pipelines/kandinsky3.md b/docs/source/en/api/pipelines/kandinsky3.md new file mode 100644 index 000000000000..cc4f87d47f58 --- /dev/null +++ b/docs/source/en/api/pipelines/kandinsky3.md @@ -0,0 +1,24 @@ + + +# Kandinsky 3 + +TODO + +## Kandinsky3Pipeline + +[[autodoc]] Kandinsky3Pipeline + - all + - __call__ + +## Kandinsky3Img2ImgPipeline + +[[autodoc]] Kandinsky3Img2ImgPipeline + - all + - __call__ diff --git a/docs/source/en/api/pipelines/pixart.md b/docs/source/en/api/pipelines/pixart.md index 6fa44cd508e4..7d8ff2b36bf2 100644 --- a/docs/source/en/api/pipelines/pixart.md +++ b/docs/source/en/api/pipelines/pixart.md @@ -35,6 +35,112 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) +## Inference with under 8GB GPU VRAM + +Run the [`PixArtAlphaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example. + +First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library: + +```bash +pip install -U bitsandbytes +``` + +Then load the text encoder in 8-bit: + +```python +from transformers import T5EncoderModel +from diffusers import PixArtAlphaPipeline +import torch + +text_encoder = T5EncoderModel.from_pretrained( + "PixArt-alpha/PixArt-XL-2-1024-MS", + subfolder="text_encoder", + load_in_8bit=True, + device_map="auto", + +) +pipe = PixArtAlphaPipeline.from_pretrained( + "PixArt-alpha/PixArt-XL-2-1024-MS", + text_encoder=text_encoder, + transformer=None, + device_map="auto" +) +``` + +Now, use the `pipe` to encode a prompt: + +```python +with torch.no_grad(): + prompt = "cute cat" + prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt) +``` + +Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up som GPU VRAM: + +```python +import gc + +def flush(): + gc.collect() + torch.cuda.empty_cache() + +del text_encoder +del pipe +flush() +``` + +Then compute the latents with the prompt embeddings as inputs: + +```python +pipe = PixArtAlphaPipeline.from_pretrained( + "PixArt-alpha/PixArt-XL-2-1024-MS", + text_encoder=None, + torch_dtype=torch.float16, +).to("cuda") + +latents = pipe( + negative_prompt=None, + prompt_embeds=prompt_embeds, + negative_prompt_embeds=negative_embeds, + prompt_attention_mask=prompt_attention_mask, + negative_prompt_attention_mask=negative_prompt_attention_mask, + num_images_per_prompt=1, + output_type="latent", +).images + +del pipe.transformer +flush() +``` + + + +Notice that while initializing `pipe`, you're setting `text_encoder` to `None` so that it's not loaded. + + + +Once the latents are computed, pass it off to the VAE to decode into a real image: + +```python +with torch.no_grad(): + image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0] +image = pipe.image_processor.postprocess(image, output_type="pil")[0] +image.save("cat.png") +``` + +By deleting components you aren't using and flushing the GPU VRAM, you should be able to run [`PixArtAlphaPipeline`] with under 8GB GPU VRAM. + +![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/8bits_cat.png) + +If you want a report of your memory-usage, run this [script](https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e). + + + +Text embeddings computed in 8-bit can impact the quality of the generated images because of the information loss in the representation space caused by the reduced precision. It's recommended to compare the outputs with and without 8-bit. + + + +While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could also specify `load_in_4bit` to bring your memory requirements down even further to under 7GB. + ## PixArtAlphaPipeline [[autodoc]] PixArtAlphaPipeline diff --git a/docs/source/en/api/schedulers/score_sde_vp.md b/docs/source/en/api/schedulers/score_sde_vp.md index 204cba877722..85da7e8ed539 100644 --- a/docs/source/en/api/schedulers/score_sde_vp.md +++ b/docs/source/en/api/schedulers/score_sde_vp.md @@ -25,4 +25,4 @@ The abstract from the paper is: ## ScoreSdeVpScheduler -[[autodoc]] schedulers.scheduling_sde_vp.ScoreSdeVpScheduler +[[autodoc]] schedulers.deprecated.scheduling_sde_vp.ScoreSdeVpScheduler diff --git a/docs/source/en/api/schedulers/stochastic_karras_ve.md b/docs/source/en/api/schedulers/stochastic_karras_ve.md index eb954d7e5e7b..1bfe4e52e514 100644 --- a/docs/source/en/api/schedulers/stochastic_karras_ve.md +++ b/docs/source/en/api/schedulers/stochastic_karras_ve.md @@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License. [[autodoc]] KarrasVeScheduler ## KarrasVeOutput -[[autodoc]] schedulers.scheduling_karras_ve.KarrasVeOutput +[[autodoc]] schedulers.deprecated.scheduling_karras_ve.KarrasVeOutput \ No newline at end of file diff --git a/docs/source/en/using-diffusers/unconditional_image_generation.md b/docs/source/en/using-diffusers/unconditional_image_generation.md index 1983f6981e8f..6c55c4edec08 100644 --- a/docs/source/en/using-diffusers/unconditional_image_generation.md +++ b/docs/source/en/using-diffusers/unconditional_image_generation.md @@ -14,54 +14,41 @@ specific language governing permissions and limitations under the License. [[open-in-colab]] -Unconditional image generation is a relatively straightforward task. The model only generates images - without any additional context like text or an image - resembling the training data it was trained on. +Unconditional image generation generates images that look like a random sample from the training data the model was trained on because the denoising process is not guided by any additional context like text or image. -The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. +To get started, use the [`DiffusionPipeline`] to load the [anton-l/ddpm-butterflies-128](https://huggingface.co/anton-l/ddpm-butterflies-128) checkpoint to generate images of butterflies. The [`DiffusionPipeline`] downloads and caches all the model components required to generate an image. -Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download. -You can use any of the ๐Ÿงจ Diffusers [checkpoints](https://huggingface.co/models?library=diffusers&sort=downloads) from the Hub (the checkpoint you'll use generates images of butterflies). +```py +from diffusers import DiffusionPipeline + +generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128").to("cuda") +image = generator().images[0] +image +``` -๐Ÿ’ก Want to train your own unconditional image generation model? Take a look at the training [guide](../training/unconditional_training) to learn how to generate your own images. +Want to generate images of something else? Take a look at the training [guide](../training/unconditional_training) to learn how to train a model to generate your own images. -In this guide, you'll use [`DiffusionPipeline`] for unconditional image generation with [DDPM](https://arxiv.org/abs/2006.11239): - -```python -from diffusers import DiffusionPipeline - -generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128", use_safetensors=True) -``` +The output image is a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object that can be saved: -The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. -Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. -You can move the generator object to a GPU, just like you would in PyTorch: - -```python -generator.to("cuda") +```py +image.save("generated_image.png") ``` -Now you can use the `generator` to generate an image: +You can also try experimenting with the `num_inference_steps` parameter, which controls the number of denoising steps. More denoising steps typically produce higher quality images, but it'll take longer to generate. Feel free to play around with this parameter to see how it affects the image quality. -```python -image = generator().images[0] +```py +image = generator(num_inference_steps=100).images[0] image ``` -The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object. - -You can save the image by calling: - -```python -image.save("generated_image.png") -``` - -Try out the Spaces below, and feel free to play around with the inference steps parameter to see how it affects the image quality! +Try out the Space below to generate an image of a butterfly!