Issue Description
Regional Prompting appears to require the entire model to be on a single device, which makes it unusable for anyone with limited VRAM:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Tested on NVIDIA, ZLUDA, and DirectML. Works normally when Model CPU offload is disabled.
Relevant log output
15:47:09-707881 DEBUG Pipeline switch: custom=regional_prompting_stable_diffusion
15:47:10-462680 DEBUG Pipeline switch: from=StableDiffusionPipeline to=RegionalPromptingStableDiffusionPipeline
components=['vae', 'text_encoder', 'tokenizer', 'unet', 'scheduler', 'safety_checker',
'feature_extractor', 'requires_safety_checker'] skipped=['image_encoder'] missing=[]
15:47:10-463681 DEBUG Setting model VAE: upcast=False
15:47:10-465684 DEBUG Setting model: enable VAE tiling
15:47:10-474681 DEBUG Setting model: enable model CPU offload
15:47:10-492681 DEBUG Regional: args={'prompt': 'blue sky BREAK\nbrunette hair BREAK\nbook shelf BREAK\nlamp on a desk BREAK\nwomen wearing a red dress and sitting on a sofa', 'rp_args': {'mode': 'rows',
'power': 1, 'div': '1,2,1,1;2,4,6'}}
15:47:10-493680 INFO Applying hypertile: unet=256
15:47:10-601377 INFO Base: class=RegionalPromptingStableDiffusionPipeline
15:47:10-798746 DEBUG Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
'solver_type': 'midpoint', 'lower_order_final': False, 'use_karras_sigmas': True,
'final_sigmas_type': 'zero', 'timestep_spacing': 'linspace', 'solver_order': 2}
15:47:10-932750 DEBUG Torch generator: device=cuda seeds=[2376105058]
15:47:10-934751 DEBUG Diffuser pipeline: RegionalPromptingStableDiffusionPipeline task=DiffusersTaskType.TEXT_2_IMAGE
batch=1/1x1 set={'prompt': 120, 'negative_prompt': 1, 'guidance_scale': 6,
'num_inference_steps': 20, 'eta': 1.0, 'output_type': 'latent', 'width': 512, 'height': 512,
'rp_args': {'mode': 'rows', 'power': 1, 'div': '1,2,1,1;2,4,6'}, 'parser': 'Fixed attention'}
0%|| 0/20 [00:00<?, ?it/s]15:47:59-585738 DEBUG Server: alive=True jobs=1 requests=640 uptime=163 memory=1.58/31.83 backend=Backend.DIFFUSERS
state=idle
0%|| 0/20 [00:42<?, ?it/s]
15:48:00-053990 ERROR Processing: args={'prompt': 'blue sky BREAK\nbrunette hair BREAK\nbook shelf BREAK\nlamp on a desk BREAK\nwomen wearing a red dress and sitting on a sofa', 'negative_prompt': [''],
'guidance_scale': 6, 'generator': [<torch._C.Generator object at 0x000001F1BCAB14D0>],
'num_inference_steps': 20, 'eta': 1.0, 'output_type': 'latent', 'width': 512, 'height': 512,
'rp_args': {'mode': 'rows', 'power': 1, 'div': '1,2,1,1;2,4,6'}} Expected all tensors to be on
the same device, but found at least two devices, cuda:0 and cpu!
15:48:00-056990 ERROR Processing: RuntimeError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ E:\sdmaster\modules\processing_diffusers.py:122 in process_diffusers │
│ │
│ 121 │ │ else: │
│ ❱ 122 │ │ │ output = shared.sd_model(**base_args) │
│ 123 │ │ if isinstance(output, dict): │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context │
│ │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │
│ │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:368 in __call │
│ │
│ 367 │ │ │
│ ❱ 368 │ │ output = StableDiffusionPipeline(**self.components)( │
│ 369 │ │ │ prompt=prompt, │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context │
│ │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │
│ │
│ E:\sdmaster\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:1006 in __call_ │
│ │
│ 1005 │ │ │ │ # predict the noise residual │
│ ❱ 1006 │ │ │ │ noise_pred = self.unet( │
│ 1007 │ │ │ │ │ latent_model_input, │
│ │
│ ... 12 frames hidden ... │
│ │
│ E:\sdmaster\venv\lib\site-packages\diffusers\models\attention.py:490 in forward │
│ │
│ 489 │ │ │ │
│ ❱ 490 │ │ │ attn_output = self.attn2( │
│ 491 │ │ │ │ norm_hidden_states, │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\nn\modules\module.py:1532 in _wrapped_call_impl │
│ │
│ 1531 │ │ else: │
│ ❱ 1532 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1533 │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\nn\modules\module.py:1541 in _call_impl │
│ │
│ 1540 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1541 │ │ │ return forward_call(*args, **kwargs) │
│ 1542 │
│ │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:271 in forwar │
│ │
│ 270 │ │ │ │ │ # TODO: add support for attn.scale when we move to Torch 2.1 │
│ ❱ 271 │ │ │ │ │ hidden_states = scaled_dot_product_attention( │
│ 272 │ │ │ │ │ │ self, │
│ │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:615 in scaled │
│ │
│ 614 │ attn_weight = query @ key.transpose(-2, -1) * scale_factor │
│ ❱ 615 │ attn_weight += attn_bias │
│ 616 │ attn_weight = torch.softmax(attn_weight, dim=-1) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
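The traceback bottoms out at `attn_weight += attn_bias` in the community pipeline's `scaled_dot_product_attention`, which suggests the bias tensor is being built on a different device than the query/key product once CPU offload starts shuffling modules around. A likely workaround (not a confirmed fix) is to move the bias onto the weights' device before the add. A minimal sketch of that pattern, using a plain-Python stand-in for `torch.Tensor` so it is self-contained — the `FakeTensor` class and `add_bias` helper are hypothetical illustrations, not code from the pipeline:

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that tracks which device it lives on."""

    def __init__(self, value, device):
        self.value = value
        self.device = device

    def to(self, device):
        # Mimics torch.Tensor.to(): returns a copy on the target device.
        return FakeTensor(self.value, device)

    def __add__(self, other):
        # Mimics torch's behavior: cross-device arithmetic raises.
        if self.device != other.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found "
                f"at least two devices, {self.device} and {other.device}!"
            )
        return FakeTensor(self.value + other.value, self.device)


def add_bias(attn_weight, attn_bias):
    # The workaround: align devices before the add, roughly what the
    # pipeline would presumably need to do around its line 615.
    if attn_bias.device != attn_weight.device:
        attn_bias = attn_bias.to(attn_weight.device)
    return attn_weight + attn_bias


weight = FakeTensor(1.0, "cuda:0")  # lives where the offloaded UNet ran
bias = FakeTensor(0.5, "cpu")       # built on CPU, triggering the error
result = add_bias(weight, bias)     # succeeds; result stays on cuda:0
```

With real tensors the same one-line `attn_bias = attn_bias.to(attn_weight.device)` before the in-place add should avoid the crash, though whether offloading then works end-to-end with this pipeline is untested here.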
Backend
Diffusers
UI
Standard
Branch
Master
Model
StableDiffusion 1.5
Acknowledgements
I have read the above and searched for existing issues
I confirm that this is classified correctly and it's not an extension issue
vladmandic changed the title from "[Issue]: Regional Prompting script fails when Model CPU offload is enabled" to "[Feature]: Regional Prompting script fails when Model CPU offload is enabled" on Aug 29, 2024