Decent text-to-image generation results on CUB200 #131
Replies: 15 comments 72 replies
-
Nice, thanks for sharing; these might be the best results we've seen so far! I'm curious about the text length: did you simply set it to the maximum in your dataset, or to a maximum you cared about?
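To make the two options in the question concrete, here is a minimal sketch of picking a fixed text length (the value dalle-pytorch takes as `text_seq_len`) either from the dataset maximum or from a percentile of caption lengths, then truncating or padding each caption to it. The helper names are hypothetical. Whitespace splitting stands in for the real tokenizer, and pad id 0 is an assumption.

```python
import math


def choose_text_seq_len(captions, percentile=None):
    """Pick a fixed text length from caption token counts.

    Whitespace splitting is a stand-in for the real tokenizer (assumption).
    percentile=None returns the dataset maximum; e.g. 95 covers ~95% of captions.
    """
    lengths = sorted(len(c.split()) for c in captions)
    if percentile is None:
        return lengths[-1]
    # Nearest-rank percentile over the sorted lengths.
    rank = math.ceil(percentile / 100 * len(lengths))
    return lengths[max(rank, 1) - 1]


def pad_or_truncate(token_ids, seq_len, pad_id=0):
    """Truncate long captions and right-pad short ones to exactly seq_len ids."""
    ids = list(token_ids[:seq_len])
    return ids + [pad_id] * (seq_len - len(ids))
```

Using the maximum never cuts a caption but spends sequence positions on padding; a percentile shortens the sequence at the cost of truncating the longest captions.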
-
Incredible result! Definitely the first useful example posted that shows a clear ability to generalize outside the training set. New bar set! Thanks for sharing the loss graph; very useful. Did you find any learning rate that stood out as the clear winner? In my experience, for whatever reason, dalle-pytorch seemed to work well only occasionally in the 1e-4 to 5e-4 range. Sometimes it was a complete failure and sometimes it worked fine, but generally 3e-4 was the best learning rate. Now I'm revisiting the issue and I'm curious whether I got my math wrong or something... Also, you mention that it makes mistakes. Have you tried taking the top 32 of 512 generations reranked via CLIP, as in the OpenAI paper? I'm curious whether that gets rid of the deformities.
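The reranking step suggested above (generate many samples, keep the ones CLIP scores highest) reduces to sorting candidates by text-image similarity. Below is a minimal sketch, assuming the caption and image embeddings have already been produced by a CLIP text and image encoder (not shown) and are non-zero vectors; `rerank_top_k` is a hypothetical helper, not part of dalle-pytorch or CLIP.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rerank_top_k(text_emb, image_embs, k):
    """Return indices of the k images most similar to the caption, best first."""
    scores = [cosine(text_emb, img) for img in image_embs]
    order = sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

With 512 generations per caption, `rerank_top_k(text_emb, image_embs, 32)` would keep the 32 best-matching samples.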
-
Great step, thanks for sharing! We are currently thinking of replicating training on multiple nodes and multiple GPUs, using DeepSpeed as a future-proof way to split a very large network across GPUs once a simple data-parallel scheme is no longer sufficient, that is, once each model replica no longer fits on a single GPU. However, Horovod is also a good baseline for testing data-parallel mode with networks that are still small enough. Would you be interested in trying your code on a number of V100s, so that we (including @afiaka87 @lucidrains) could look at it together? We already have Horovod running on our machine (https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html, at Juelich Supercomputing Center, Germany), so it should be quite easy to reproduce.
-
@kobiso a quick question: which CUB200 dataset did you use? Was it the 2011 version with 11,788 images? Thanks in advance!
-
That's pretty cool!
-
Pretrained CLIP reranking
Results
-
Generate rest of image based on the given cropped image
Results
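Generating the rest of an image from a crop works because a DALL·E-style model emits the image as a grid of VAE codes in raster (row-major) order: encode the visible top rows, keep their codes as given, and sample only the remaining positions. The sketch below shows just that bookkeeping. `next_token` is a hypothetical stand-in for the transformer's sampling step (which would also see the text tokens), and the crop is assumed to cover whole rows of the token grid.

```python
def complete_image_tokens(prefix_tokens, h, w, next_token):
    """Fill an h x w token grid in raster order, keeping the given prefix.

    prefix_tokens: VAE codes of the visible top rows, row-major
                   (rows_kept * w tokens).
    next_token:    hypothetical callable mapping the tokens so far to the next
                   token id; stands in for the model's sampling step.
    """
    total = h * w
    if len(prefix_tokens) > total:
        raise ValueError("prefix is longer than the image token grid")
    tokens = list(prefix_tokens)
    while len(tokens) < total:
        tokens.append(next_token(tokens))  # only positions after the crop are sampled
    # Reshape the flat sequence back into h rows of w codes for the VAE decoder.
    return [tokens[r * w:(r + 1) * w] for r in range(h)]
```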
-
CUB200 trained model sharing
As you folks asked, I'm sharing the CUB200-trained models (sorry for the late sharing!). Hope you have fun with the models 🛩️
One more tip related to memory and performance
Training DALLE requires lots of VRAM, and we do not have as much of it as OpenAI does 😞
-
Thank you @kobiso!!! Very much appreciated.
On Sat, 3 Apr 2021, 11:22 pm ByungSoo Ko wrote:
@rom1504 I will check it next week! Thanks for letting me know :)
-
Attention type ('full', 'axial_row', 'axial_col', 'conv_like') works
Experimental setting
Computational cost
Training log
Results
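As a rough illustration of what the attention types compared above restrict, the sketch below builds a causal boolean mask over a sequence of text tokens followed by an h x w grid of image tokens. It is an approximation for intuition, not dalle-pytorch's exact implementation (the library's sparse patterns may differ in details such as dilation or how text is handled). Here `axial_row` lets an image token see earlier tokens in its own row, `axial_col` its own column, and `conv_like` a local `kernel_size` window; text keys stay visible to everything after them.

```python
ATTN_TYPES = ("full", "axial_row", "axial_col", "conv_like")


def attention_mask(attn_type, text_len, h, w, kernel_size=3):
    """Causal mask: mask[i][j] is True if query i may attend to key j.

    Assumed layout: text_len text tokens, then h*w image tokens in row-major
    order. Text keys are always visible (causally); image keys are filtered
    by the chosen pattern.
    """
    if attn_type not in ATTN_TYPES:
        raise ValueError(f"unknown attn_type: {attn_type}")
    n = text_len + h * w
    half = kernel_size // 2
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only the current and earlier positions
            if j < text_len or attn_type == "full":
                mask[i][j] = True
                continue
            # Here i >= j >= text_len, so both are image tokens.
            qr, qc = divmod(i - text_len, w)
            kr, kc = divmod(j - text_len, w)
            if attn_type == "axial_row":
                mask[i][j] = qr == kr
            elif attn_type == "axial_col":
                mask[i][j] = qc == kc
            else:  # conv_like: local kernel_size x kernel_size window
                mask[i][j] = abs(qr - kr) <= half and abs(qc - kc) <= half
    return mask
```

Counting the True entries in a row shows where the computational savings come from: with 'full' the number of visible keys grows with the whole sequence, while the row, column, and window patterns cap the image keys at roughly w, h, or kernel_size² per query.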
-
Hi @kobiso, thanks for the work! I tried the link you provided (https://github.com/kobiso/DALLE-reproduction) and it returns a 404 error. Any help would be appreciated!
-
@kobiso @lucidrains Thank you for your great work!
-
@kobiso Thank you for sharing!
-
@kobiso I have been trying to replicate your results. I'm wondering whether my configuration needs adjustment or whether I just need to train for a very long time. In your setup, after how many steps did you start seeing decent results during training?
-
DALLE on CUB200
Main results
Text to image generation and re-ranking by CLIP
Generate rest of image based on the given cropped image
Model spec
VAE
DALLE
Optimization
ReduceLROnPlateau
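For readers unfamiliar with the ReduceLROnPlateau schedule listed here, the rule is: track the best loss seen so far, and once it has failed to improve (by a small relative threshold) for more than `patience` consecutive checks, multiply the learning rate by `factor`. The class below is a simplified, dependency-free re-implementation for illustration only (mode='min', no cooldown); in PyTorch you would use `torch.optim.lr_scheduler.ReduceLROnPlateau` wrapped around an optimizer such as `torch.optim.AdamW` and call `scheduler.step(val_loss)`. The `factor=0.5` and `patience=2` values are illustrative assumptions, not the settings used in this post (PyTorch's defaults are `factor=0.1`, `patience=10`).

```python
import math


class PlateauLR:
    """Simplified ReduceLROnPlateau: mode='min', relative threshold, no cooldown."""

    def __init__(self, lr, factor=0.5, patience=2, threshold=1e-4, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.min_lr = min_lr
        self.best = math.inf
        self.num_bad = 0

    def step(self, loss):
        # Counts as an improvement only if it beats the best loss by the relative threshold.
        if loss < self.best * (1 - self.threshold):
            self.best = loss
            self.num_bad = 0
        else:
            self.num_bad += 1
        # Decay once the loss has stalled for more than `patience` consecutive steps.
        if self.num_bad > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad = 0
        return self.lr
```

Starting from 3e-4, the losses 1.0, 0.9, 0.9, 0.9, 0.9 keep the rate at 3e-4 for four steps and halve it to 1.5e-4 on the fifth, because the third consecutive non-improving step exceeds `patience=2`.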
(PR: Add adamw and lr decay #138)
Results
Generation during training
Generation by input text
Training graph
Opinion