
not run #44

Open
ayttop opened this issue Oct 27, 2024 · 30 comments

Comments

@ayttop commented Oct 27, 2024

It does not run on Colab T4.

from OmniGen import OmniGenPipeline
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import transformers
transformers.logging.set_verbosity_error()
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", device_map=device)

Text to Image

images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=768,
    width=512,
    guidance_scale=1,
    seed=0,
    separate_cfg_infer=True,
    num_inference_steps=1,
    num_images_per_prompt=1,
    use_kv_cache=True,
)
images[0].save("example_t2i.png")  # save output PIL Image



TypeError Traceback (most recent call last)
in <cell line: 8>()
6 transformers.logging.set_verbosity_error()
7 device = "cuda" if torch.cuda.is_available() else "cpu"
----> 8 pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", device_map=device)
9
10 # Text to Image

TypeError: OmniGenPipeline.from_pretrained() got an unexpected keyword argument 'device_map'

@ayttop (Author) commented Oct 27, 2024

It also does not run with accelerate/bitsandbytes:
from OmniGen import OmniGenPipeline
from accelerate import init_empty_weights
import bitsandbytes as bnb

# Initialize the model with empty weights to save memory
with init_empty_weights():
    pipe = OmniGenPipeline.from_pretrained(
        "Shitao/OmniGen-v1",
        device_map="auto",        # Automatically maps model layers to available devices
        torch_dtype=bnb.float16,  # Set data type for bitsandbytes
        load_in_4bit=True,        # Load model in 4-bit precision using bitsandbytes
    )

@staoxiao (Contributor)

Current code doesn't support quantization. We will consider this in the future.

@able2608

Apparently someone did try to implement quantization; however, it is still a WIP and might be somewhat fiddly to use. Check out this PR if you are interested: #29.
After downloading it you may need to tweak some files, as discussed in the PR thread, to get it to work. Also, Colab system RAM (yes, RAM, not VRAM) is capped at 12 GB for free-tier users, so the quantization process will be slow at best and will probably OOM outright for now. It nearly filled the 16 GB of RAM on my Windows 11 system and required extensive offloading to disk while quantizing. However, judging from the VRAM usage on my system, once quantization is done the model might fit in the T4's VRAM. You may want to wait for the code to be optimized further.
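For anyone attempting this on Colab, a quick way to check how much headroom is actually available before starting the quantization (a minimal sketch; psutil and torch come preinstalled on Colab):

import psutil
import torch

# Free system RAM -- this is what the quantization step described above is bound by
print(f"Free RAM: {psutil.virtual_memory().available / 1e9:.1f} GB")

# Free / total VRAM on the T4, if a GPU runtime is attached
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free / 1e9:.1f} / {total / 1e9:.1f} GB")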

@nitinmukesh commented Oct 28, 2024

@able2608 @staoxiao

It is working on low VRAM.

[screenshot: vlcsnap-2024-10-29-00h58m15s559]

Try this: https://www.youtube.com/watch?v=9ZXmXA2AJZ4

@ayttop (Author) commented Oct 28, 2024

2024-10-28 22:00:46.180633: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 22:00:46.455676: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 22:00:46.544035: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-28 22:00:47.043118: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 22:00:49.018393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
Fetching 10 files: 100% 10/10 [00:00<00:00, 124460.06it/s]

But it does not work on Colab T4.

@ayttop (Author) commented Oct 28, 2024

!git clone https://github.com/Manni1000/OmniGen.git

%cd OmniGen

!pip install -e .

!pip install gradio spaces

!apt install net-tools -y

!netstat -an | grep 7860

from google.colab import output

!python /content/OmniGen/app.py

@ayttop (Author) commented Oct 28, 2024

!pip install -r /content/OmniGen/requirements.txt

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.6.1 requires fsspec==2024.6.1, but you have fsspec 2024.5.0 which is incompatible.
torchaudio 2.1.1+cu121 requires torch==2.1.1, but you have torch 2.3.1+cu121 which is incompatible.
Successfully installed fsspec-2024.5.0 torch-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1

@ayttop (Author) commented Oct 28, 2024

2024-10-28 22:29:35.060112: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 22:29:35.093217: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 22:29:35.103173: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-28 22:29:35.126043: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 22:29:36.350897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
Fetching 10 files: 100% 10/10 [00:00<00:00, 50472.97it/s]
[screenshot: Screenshot 2024-10-28 153022]

@ayttop (Author) commented Oct 28, 2024

It does not run on the Colab T4 GPU.


@ayttop (Author) commented Oct 28, 2024

[screenshot: Screenshot 2024-10-28 153256]

@ayttop (Author) commented Oct 28, 2024

On Colab TPU:

!python /content/OmniGen/app.py
/usr/local/lib/python3.10/dist-packages/gradio/utils.py:980: UserWarning: Expected 11 arguments for function <function generate_image at 0x7d4a6ec5e290>, received 10.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/gradio/utils.py:984: UserWarning: Expected at least 11 arguments for function <function generate_image at 0x7d4a6ec5e290>, received 10.
warnings.warn(

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
/usr/local/lib/python3.10/dist-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
warnings.warn("Unexpected argument. Filling with None.")
Fetching 10 files: 100% 10/10 [00:00<00:00, 93832.30it/s]
Loading safetensors
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "/content/OmniGen/app.py", line 51, in generate_image
    output = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/content/OmniGen/OmniGen/pipeline.py", line 189, in __call__
    generator = torch.Generator(device=self.device).manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool
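For reference, torch.Generator.manual_seed only accepts an integer, so a bool reaching it is consistent with the Gradio warnings above about a mismatched argument count (11 expected, 10 received), which can shift a checkbox value into the seed slot. A minimal reproduction of just this error, outside the app:

import torch

gen = torch.Generator(device="cpu")
gen.manual_seed(0)     # fine: integer seed
gen.manual_seed(True)  # RuntimeError: manual_seed expected a long, but got bool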

@ayttop (Author) commented Oct 28, 2024

[screenshot: Screenshot 2024-10-28 160844]

@werruww commented Oct 29, 2024

Collecting cloud-tpu-client==0.10
Downloading cloud_tpu_client-0.10-py3-none-any.whl.metadata (1.2 kB)
Collecting torch==1.13.0
Downloading torch-1.13.0-cp310-cp310-manylinux1_x86_64.whl.metadata (23 kB)
Collecting torchvision==0.14.0
Downloading torchvision-0.14.0-cp310-cp310-manylinux1_x86_64.whl.metadata (11 kB)
Collecting torchtext==0.14.0
Downloading torchtext-0.14.0-cp310-cp310-manylinux1_x86_64.whl.metadata (6.9 kB)
ERROR: Could not find a version that satisfies the requirement torch_xla==1.13 (from versions: 2.1.0rc5, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0)
ERROR: No matching distribution found for torch_xla==1.13

@yuezewang (Collaborator)

(quoting the original post)
not run on colab t4
[...]
TypeError: OmniGenPipeline.from_pretrained() got an unexpected keyword argument 'device_map'

Hello, you should remove the device_map=device argument:

# The pipeline will detect a valid GPU device automatically
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # just remove ', device_map=device'
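For completeness, a minimal end-to-end run once the pipeline is loaded without device_map (generation parameters copied from the original post above):

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # no device_map

images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=768,
    width=512,
    guidance_scale=1,
    seed=0,
    separate_cfg_infer=True,
    num_inference_steps=1,
    num_images_per_prompt=1,
    use_kv_cache=True,
)
images[0].save("example_t2i.png")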

@werruww commented Oct 31, 2024

The problem is that I want to run it on a Colab T4, where system RAM is only 12 GB, so I want to either quantize it, save it, and then use it on the T4, or load it with accelerate and device_map=device.

@werruww commented Oct 31, 2024

@yuezewang

It does not run on the Colab T4 GPU.

Your session crashed after using all available RAM.

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("goodasdgood/OmniGen_quantization")

# Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

@werruww commented Oct 31, 2024

@yuezewang

Where is the path to the quantized model?

@Ordoumpozanis commented Nov 1, 2024

In order to run it and bypass the device error, I just exposed the device globally in pipeline.py:

# Define device globally (optional)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Device = {device}')

class OmniGenPipeline:
    def __init__(
        self,
        vae: AutoencoderKL,
        model: OmniGen,
        processor: OmniGenProcessor,
    ):
        self.vae = vae
        self.model = model
        self.processor = processor
        self.model.to(torch.bfloat16)
        self.model.eval()
        self.vae.eval()

        self.model_cpu_offload = False

Then replace every self.device with device, which is now global, and it will work.
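A slightly less invasive variant of the same idea (a sketch, assuming the rest of pipeline.py only reads the device through self.device) is to assign the global device onto the instance in __init__, so no find-and-replace is needed:

import torch

# Detect the device once, globally, as above
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class OmniGenPipeline:
    def __init__(self, vae, model, processor):
        self.vae = vae
        self.model = model
        self.processor = processor
        self.device = device  # existing self.device reads keep working unchanged
        self.model.to(torch.bfloat16)
        self.model.eval()
        self.vae.eval()
        self.model_cpu_offload = False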

@Qarqor5555555

Question: if quantization is added to the loading function, is the quantized model then not stored on the hard disk? Is this method different from converting a model to 4-bit, like Unsloth?

@Qarqor5555555

Question: does applying quantization in the loading function differ from the method of converting the model to 4-bit, like Unsloth?

@ayttop (Author) commented Nov 1, 2024

NameError: name 'is_torch_npu_available' is not defined. Did you mean: 'is_torch_xla_available'?

@ayttop (Author) commented Nov 1, 2024

from OmniGen import OmniGenPipeline
import torch

pipe = OmniGenPipeline.from_pretrained("C:/Users/m/Desktop/4/OmniGen-v1")

# Text to Image
images = pipe(
    prompt="car.",
    height=64,
    width=64,
    num_inference_steps=2,
    guidance_scale=2,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

NameError: name 'is_torch_npu_available' is not defined. Did you mean: 'is_torch_xla_available'?

@ronfromhp

(quoting @nitinmukesh) It is working on low VRAM ... try this https://www.youtube.com/watch?v=9ZXmXA2AJZ4

Wait, how are you getting it 30 times faster than mine?

[screenshot]

This is for the exact same prompt.

@staoxiao (Contributor) commented Nov 2, 2024

@ronfromhp, do you have a GPU? Running on CPU is very slow. You can try the latest code and refer to https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources for inference times.

@ronfromhp commented Nov 2, 2024

@staoxiao, I have an RTX 4050 laptop GPU with 6 GB of VRAM, so it must be running slowly because of that. But I tried the forked repo of the person I was replying to (#44 (comment)), and he seems to have a quantized model working that is roughly 50-100 times faster on my GPU.

@nitinmukesh

@ronfromhp

Can you confirm that my fork is working fine for you and that generation is fast? Other viewers of my channel have confirmed that it works well.

@ronfromhp

@nitinmukesh, up to a certain point it is fast, but it breaks down past that point, for example if I give it two input image prompts and ask for a 1080p output. Then it falls back to 280 sec/step. I'd describe it as a sigmoid curve: if you exceed a certain threshold, it becomes roughly 50 times slower.

@NormalMultiaccount

Has anyone gotten OmniGen to run on Colab?
