new 32 layer model #396
Jack000 started this conversation in Show and tell
hey guys
Following up on my previous post, I trained a 32-layer dalle-pytorch model. Here's the link: https://github.com/Jack000/DALLE-pytorch/
I also put up the DeepSpeed checkpoint here: https://dall-3.com/models/dalle/
model settings:
I chose a size that would converge in a reasonable amount of time on 4x 3090s (i.e. months, not years), and this seemed to be about right.
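For reference, setting up a model this deep in dalle-pytorch looks roughly like the sketch below; the dim/heads/text settings are illustrative defaults from the upstream README, not necessarily this checkpoint's exact hyperparameters.

```python
import torch
from dalle_pytorch import DALLE, VQGanVAE

# pretrained VQGAN as the image tokenizer (dalle-pytorch also supports DiscreteVAE)
vae = VQGanVAE()

dalle = DALLE(
    dim = 1024,            # illustrative, not this checkpoint's exact value
    vae = vae,
    num_text_tokens = 10000,
    text_seq_len = 256,
    depth = 32,            # the "32 layer" in the title
    heads = 16,
    dim_head = 64,
    attn_dropout = 0.1,
    ff_dropout = 0.1
)

# one dummy training step, following the upstream README
text = torch.randint(0, 10000, (1, 256))
images = torch.randn(1, 3, 256, 256)

loss = dalle(text, images, return_loss = True)
loss.backward()
```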
For the learning rate, I started with what I thought was a large value (1e-3) and halved it manually whenever the loss appeared to plateau, ending at 2e-5 after about a month. At this point I can't see any change in the loss even with 0.999 EMA smoothing on the loss curve, so I'm calling it done, but I suspect it's still under-trained.
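To be concrete about the smoothing: an EMA with decay 0.999 just low-passes the noisy per-step loss so a plateau becomes visible. A minimal sketch (the loss stream and the halving rule here are illustrative):

```python
# synthetic stand-in for the raw per-step training loss
step_losses = [3.0 * 0.99995 ** s for s in range(200000)]

def ema_update(ema, value, decay=0.999):
    """Exponential moving average: low-passes the noisy per-step loss."""
    return value if ema is None else decay * ema + (1 - decay) * value

ema_loss = None
for step, loss in enumerate(step_losses):
    ema_loss = ema_update(ema_loss, loss)
    # when ema_loss stops decreasing, halve the learning rate by hand, e.g.
    # for group in optimizer.param_groups:
    #     group["lr"] *= 0.5
```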
For the dataset, I used a filtered version of LAION-400M plus some images I scraped myself, for a total of 60 million images. I filtered the data to keep mostly photographic images (CLIP filtering is pretty noisy) and to remove images with watermarks.
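That kind of filtering is usually done as zero-shot CLIP classification, something like the sketch below (the prompt set and threshold are illustrative, not the ones actually used for this dataset):

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# illustrative label prompts; real filtering needs more careful prompt sets
labels = ["a photograph", "a drawing or illustration", "an image with a watermark"]
text = clip.tokenize(labels).to(device)

def keep_image(path, threshold=0.6):
    """Keep an image only if CLIP scores it as photographic."""
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1).squeeze(0)
    return probs[0].item() > threshold  # index 0 = "a photograph"
```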
There are sample images in the GitHub repo, but here are some failure cases I noticed.
"a green pentagonal clock" / "a green clock shaped like a pentagon"
It seems to have trouble with shapes, but to be fair even OpenAI's DALL-E has trouble with this.
"barak obama appears at a press conference"
I think a larger transformer would help it memorize specific people/places.
"the birth of sentient ai"
If the prompt can be interpreted as an article title or news release, the model tends to render text inside the image. The data might need more aggressive filtering.
Overall I think you really need a larger transformer to get the generalization capabilities of OpenAI's DALL-E, but this model works pretty well for common object classes.
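If you want to try prompts like these yourself, sampling follows the upstream dalle-pytorch README, roughly as below; the tokenizer call and filter_thres value are a sketch, so check the repo for the exact invocation:

```python
from dalle_pytorch.tokenizer import tokenizer

# `dalle` is the model from the config sketch above, with trained weights loaded
texts = ["a green clock shaped like a pentagon"]
text_tokens = tokenizer.tokenize(texts, context_length=256).cuda()

# filter_thres keeps only the top fraction of logits at each sampling step
images = dalle.generate_images(text_tokens, filter_thres=0.9)  # (1, 3, 256, 256)
```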
some notes on training:
Anyway, I'll probably keep training this whenever my training rig is idle.
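If you want to pick up training from the DeepSpeed checkpoint linked above, resuming uses the standard DeepSpeed engine API; a sketch with placeholder paths and config:

```python
import deepspeed

# `dalle` as in the config sketch above; ds_config.json holds the usual
# DeepSpeed settings (batch size, fp16, ZeRO stage, ...)
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=dalle,
    model_parameters=dalle.parameters(),
    config="ds_config.json",  # placeholder path
)

# checkpoint directory downloaded from https://dall-3.com/models/dalle/
model_engine.load_checkpoint("./dalle_checkpoint")  # placeholder path

# ... continue the training loop with model_engine.backward(loss) / model_engine.step()
```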
Replies:
- very cool stuff!! I'm curious to see more examples, for example the ones from https://openai.com/blog/dall-e/