Issue Description
Regional Prompting appears to require the entire model to be on a single device, which makes it unusable for anyone with limited VRAM:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Tested on NVIDIA, ZLUDA, and DirectML. Works normally when Model CPU offload is disabled.
Relevant log output
15:47:09-707881 DEBUG Pipeline switch: custom=regional_prompting_stable_diffusion
15:47:10-462680 DEBUG Pipeline switch: from=StableDiffusionPipeline to=RegionalPromptingStableDiffusionPipeline
components=['vae', 'text_encoder', 'tokenizer', 'unet', 'scheduler', 'safety_checker',
'feature_extractor', 'requires_safety_checker'] skipped=['image_encoder'] missing=[]
15:47:10-463681 DEBUG Setting model VAE: upcast=False
15:47:10-465684 DEBUG Setting model: enable VAE tiling
15:47:10-474681 DEBUG Setting model: enable model CPU offload
15:47:10-492681 DEBUG Regional: args={'prompt': 'blue sky BREAK\nbrunette hair BREAK\nbook shelf BREAK\nlamp on a desk BREAK\nwomen wearing a red dress and sitting on a sofa', 'rp_args': {'mode': 'rows',
'power': 1, 'div': '1,2,1,1;2,4,6'}}
15:47:10-493680 INFO Applying hypertile: unet=256
15:47:10-601377 INFO Base: class=RegionalPromptingStableDiffusionPipeline
15:47:10-798746 DEBUG Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
'solver_type': 'midpoint', 'lower_order_final': False, 'use_karras_sigmas': True,
'final_sigmas_type': 'zero', 'timestep_spacing': 'linspace', 'solver_order': 2}
15:47:10-932750 DEBUG Torch generator: device=cuda seeds=[2376105058]
15:47:10-934751 DEBUG Diffuser pipeline: RegionalPromptingStableDiffusionPipeline task=DiffusersTaskType.TEXT_2_IMAGE
batch=1/1x1 set={'prompt': 120, 'negative_prompt': 1, 'guidance_scale': 6,
'num_inference_steps': 20, 'eta': 1.0, 'output_type': 'latent', 'width': 512, 'height': 512,
'rp_args': {'mode': 'rows', 'power': 1, 'div': '1,2,1,1;2,4,6'}, 'parser': 'Fixed attention'}
0%|| 0/20 [00:00<?, ?it/s]15:47:59-585738 DEBUG Server: alive=True jobs=1 requests=640 uptime=163 memory=1.58/31.83 backend=Backend.DIFFUSERS
state=idle
0%|| 0/20 [00:42<?, ?it/s]
15:48:00-053990 ERROR Processing: args={'prompt': 'blue sky BREAK\nbrunette hair BREAK\nbook shelf BREAK\nlamp on a desk BREAK\nwomen wearing a red dress and sitting on a sofa', 'negative_prompt': [''],
'guidance_scale': 6, 'generator': [<torch._C.Generator object at 0x000001F1BCAB14D0>],
'num_inference_steps': 20, 'eta': 1.0, 'output_type': 'latent', 'width': 512, 'height': 512,
'rp_args': {'mode': 'rows', 'power': 1, 'div': '1,2,1,1;2,4,6'}} Expected all tensors to be on
the same device, but found at least two devices, cuda:0 and cpu!
15:48:00-056990 ERROR Processing: RuntimeError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ E:\sdmaster\modules\processing_diffusers.py:122 in process_diffusers │
│ │
│ 121 │ │ else: │
│ ❱ 122 │ │ │ output = shared.sd_model(**base_args) │
│ 123 │ │ if isinstance(output, dict): │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context │
│ │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │
│ │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:368 in __call │
│ │
│ 367 │ │ │
│ ❱ 368 │ │ output = StableDiffusionPipeline(**self.components)( │
│ 369 │ │ │ prompt=prompt, │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context │
│ │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │
│ │
│ E:\sdmaster\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:1006 in __call_ │
│ │
│ 1005 │ │ │ │ # predict the noise residual │
│ ❱ 1006 │ │ │ │ noise_pred = self.unet( │
│ 1007 │ │ │ │ │ latent_model_input, │
│ │
│ ... 12 frames hidden ... │
│ │
│ E:\sdmaster\venv\lib\site-packages\diffusers\models\attention.py:490 in forward │
│ │
│ 489 │ │ │ │
│ ❱ 490 │ │ │ attn_output = self.attn2( │
│ 491 │ │ │ │ norm_hidden_states, │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\nn\modules\module.py:1532 in _wrapped_call_impl │
│ │
│ 1531 │ │ else: │
│ ❱ 1532 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1533 │
│ │
│ E:\sdmaster\venv\lib\site-packages\torch\nn\modules\module.py:1541 in _call_impl │
│ │
│ 1540 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1541 │ │ │ return forward_call(*args, **kwargs) │
│ 1542 │
│ │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:271 in forwar │
│ │
│ 270 │ │ │ │ │ # TODO: add support for attn.scale when we move to Torch 2.1 │
│ ❱ 271 │ │ │ │ │ hidden_states = scaled_dot_product_attention( │
│ 272 │ │ │ │ │ │ self, │
│ │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:615 in scaled │
│ │
│ 614 │ attn_weight = query @ key.transpose(-2, -1) * scale_factor │
│ ❱ 615 │ attn_weight += attn_bias │
│ 616 │ attn_weight = torch.softmax(attn_weight, dim=-1) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
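The traceback bottoms out at `attn_weight += attn_bias` in the community pipeline's `scaled_dot_product_attention`, which suggests the bias tensor is being built on a different device than the query/key product once CPU offload starts shuffling modules around. A likely workaround (not a confirmed fix) is to move the bias onto the weights' device before the add. A minimal sketch of that pattern, using a plain-Python stand-in for `torch.Tensor` so it is self-contained — the `FakeTensor` class and `add_bias` helper are hypothetical illustrations, not code from the pipeline:

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that tracks which device it lives on."""

    def __init__(self, value, device):
        self.value = value
        self.device = device

    def to(self, device):
        # Mimics torch.Tensor.to(): returns a copy on the target device.
        return FakeTensor(self.value, device)

    def __add__(self, other):
        # Mimics torch's behavior: cross-device arithmetic raises.
        if self.device != other.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found "
                f"at least two devices, {self.device} and {other.device}!"
            )
        return FakeTensor(self.value + other.value, self.device)


def add_bias(attn_weight, attn_bias):
    # The workaround: align devices before the add, roughly what the
    # pipeline would presumably need to do around its line 615.
    if attn_bias.device != attn_weight.device:
        attn_bias = attn_bias.to(attn_weight.device)
    return attn_weight + attn_bias


weight = FakeTensor(1.0, "cuda:0")  # lives where the offloaded UNet ran
bias = FakeTensor(0.5, "cpu")       # built on CPU, triggering the error
result = add_bias(weight, bias)     # succeeds; result stays on cuda:0
```

With real tensors the same one-line `attn_bias = attn_bias.to(attn_weight.device)` before the in-place add should avoid the crash, though whether offloading then works end-to-end with this pipeline is untested here.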
Backend
Diffusers
UI
Standard
Branch
Master
Model
StableDiffusion 1.5
Acknowledgements
I have read the above and searched for existing issues
I confirm that this is classified correctly and it's not an extension issue
vladmandic changed the title from "[Issue]: Regional Prompting script fails when Model CPU offload is enabled" to "[Feature]: Regional Prompting script fails when Model CPU offload is enabled" on Aug 29, 2024