
Attempting to recreate Regional Prompting in Omnigen using only words… #107

Open
adamreading opened this issue Nov 8, 2024 · 4 comments

Comments

@adamreading

Because of image limitations here I will have to spread this over a couple of posts, but everyone is going on about the new Flux regional prompting. I was saying we don't need this with OmniGen, but actually I can't work out how to do it…

This image shows the examples given for Flux:
[image: regional-prompting-for-flux-is-out-v0-bflii35ja2zd1]

@adamreading
Author

I tried the first one in OmniGen with the prompt: "A surreal landscape split by a river: summer's lava-spewing volcano and lush forests on one side, winter's massive iceberg under a luminous moon on the other." (I also tried a load of variations on merge / blend / combine etc., to no avail.) They all come out looking like:
[image: IMG_9723]

@staoxiao
Contributor

staoxiao commented Nov 8, 2024

Hi, @adamreading, thanks for your attention to our work! I think it's difficult to express these regions' positions using text alone. Regional prompting can be viewed as a visual condition. You could enable OmniGen with regional prompting via simple fine-tuning: just input the mask as an image along with the text, and fine-tune the model on this type of data.
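The suggestion above (pass the region mask as an ordinary input image and fine-tune on prompt + mask + target triples) could be sketched roughly as follows. This is a minimal illustration, not code from the OmniGen repo: the JSONL field names (`instruction`, `input_images`, `output_image`) and the `<img><|image_1|></img>` placeholder are assumptions modeled on common instruction-tuning data layouts, and the mask is written as a plain PGM file so the example needs only the standard library.

```python
import json

# Build a simple two-region mask (left half vs. right half) and save it as a
# binary PGM image using only the standard library. In real fine-tuning data
# this mask image would encode the layout you want the model to follow.
WIDTH, HEIGHT = 512, 512

def make_split_mask(path, width=WIDTH, height=HEIGHT):
    """Left half black (0), right half white (255): one region each."""
    with open(path, "wb") as f:
        f.write(b"P5\n%d %d\n255\n" % (width, height))
        row = bytes([0] * (width // 2) + [255] * (width - width // 2))
        f.write(row * height)

make_split_mask("region_mask.pgm")

# Hypothetical fine-tuning record: the mask is referenced from the prompt as
# an input image, and the target is the desired composed picture. Field names
# and the image placeholder syntax are assumptions for illustration.
record = {
    "instruction": (
        "Following the region mask in <img><|image_1|></img>: in the white "
        "region, a summer volcano spewing lava over lush forests; in the "
        "black region, a massive iceberg under a luminous winter moon."
    ),
    "input_images": ["region_mask.pgm"],
    "output_image": "target_surreal_landscape.png",
}

with open("regional_finetune.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```

With enough such triples, fine-tuning would teach the model to treat the mask image as a spatial condition, which is exactly what text alone struggles to convey.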

@adamreading
Author


Thanks for responding. I have been testing all the versions that were created to work with ComfyUI. So far, even on an ultra GPU cloud server (an L40S with 48 GB VRAM and 32 GB RAM), I can still only get down to 32 seconds for text-to-image and 80-100 seconds for image-to-image, without adding any extra masks etc. To be usable in the main environment I would want it for, the comparable Flux workflows are running in 6-10 seconds. I'll keep an eye on developments, and I truly love the idea of what you have created, but it's got to be faster somehow lol…

@staoxiao
Contributor

staoxiao commented Nov 8, 2024

Thank you for your feedback! We will continue to optimize the model. I believe that with technological advancements, unified image generation models like OmniGen will become faster.
