Generating Fonts/Writing from Text Tokens #339
-
16K VQGAN vs. 1024 VQGAN (f=16)
Early tests with the 1024 codebook pretrained on ImageNet suggest it's not very good at representing text. Here is an early comparison between the 16K-codebook VQGAN and the 1024-codebook VQGAN, both with the same patch size (f=16). Generally speaking, during training the 1024-sized codebook just can't seem to get past these results. I would do comparisons with the supposedly stronger Gumbel VQGAN (8k codebook, f=8), but I've had a lot of trouble getting that one to train, and the CompVis team has been sort of tough to get ahold of with regard to how to decode the thing.
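A minimal round-trip sketch for this kind of side-by-side check, assuming the VQGanVAE wrapper in dalle_pytorch.vae (get_codebook_indices / decode) and locally downloaded taming-transformers weights; the checkpoint, config and image paths are placeholders:

# Encode a text-bearing image with each pretrained VQGAN and decode it back,
# then compare how legible the reconstructed lettering is.
import torch
import torchvision.transforms as T
from PIL import Image
from dalle_pytorch.vae import VQGanVAE

preprocess = T.Compose([T.Resize(256), T.CenterCrop(256), T.ToTensor()])

@torch.no_grad()
def reconstruct(vae, pil_image):
    x = preprocess(pil_image).unsqueeze(0)   # (1, 3, 256, 256) in [0, 1]
    codes = vae.get_codebook_indices(x)      # (1, 256) token ids for an f=16 model
    return vae.decode(codes)                 # (1, 3, 256, 256) in [0, 1]

img = Image.open('meme_sample.png').convert('RGB')

vaes = {
    '16384': VQGanVAE(vqgan_model_path='vqgan.16384.ckpt', vqgan_config_path='vqgan.16384.yaml'),
    '1024':  VQGanVAE(vqgan_model_path='vqgan.1024.ckpt',  vqgan_config_path='vqgan.1024.yaml'),
}

for name, vae in vaes.items():
    recon = reconstruct(vae, img).squeeze(0).clamp(0, 1)
    T.ToPILImage()(recon).save(f'recon_{name}.png')  # eyeball the text in each reconstruction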
-
Code used: I'm using the new AugLy augmentation library to write the caption onto each image in dalle_pytorch/loader.py:
# in dalle_pytorch/loader.py
import PIL.Image
import torchvision.transforms as T
import augly.image as imaugs
import augly.text as textaugs
# ...
self.image_transform = T.Compose([
    T.Lambda(lambda img: img.convert("RGB") if img.mode != "RGB" else img),
    T.CenterCrop((192, 256)),  # `memeify` takes up the top 64 pixels.
])
# ...
substring = description[:20] + "\n" + description[20:40]  # wrap the caption onto two lines
pil_image = PIL.Image.open(image_file)
top_cut_image = self.image_transform(pil_image)
aug_image = imaugs.meme_format(top_cut_image, text=substring, opacity=1.0, caption_height=64)
image_tensor = T.Compose([
    # T.CenterCrop(256),  # not needed: 192 px of image + 64 px caption = 256
    T.ToTensor(),
])(aug_image)
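A quick standalone sanity check of the same augmentation (the image path and caption string are placeholders):

# Render one caption the meme_format way and confirm the result is 256x256.
import PIL.Image
import augly.image as imaugs
import torchvision.transforms as T

description = "the quick brown fox jumps over the lazy dog"
substring = description[:20] + "\n" + description[20:40]

img = PIL.Image.open('sample.jpg').convert('RGB')
img = T.CenterCrop((192, 256))(img)  # leave room for the 64-pixel caption band
meme = imaugs.meme_format(img, text=substring, opacity=1.0, caption_height=64)
meme.save('meme_preview.png')
print(T.ToTensor()(meme).shape)  # expect torch.Size([3, 256, 256])

With an f=16 VQGAN, that 64-pixel caption band should correspond to the top four rows of the 16x16 grid of image tokens, so the rendered text always occupies the same block of codes.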
-
I've been training a DALL-E with the goal of seeing whether or not a caption could be used to visualize the text itself in RGB pixels. I'm limited by my GPU, but early results are certainly interesting. I'm using the oft-ignored weights from OpenAI's dVAE under the assumption they would better represent text (since that is mentioned as an explicit goal in the DALL-E paper). These early results are promising, so I'm switching back to the pretrained VQGAN from CompVis to see whether it can represent letters graphically as well as the dVAE.
In this last example, it seems to be reusing codes found for text generation in the actual image itself, which is what I was hoping for.
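For reference, the overall setup is roughly the repo README recipe with OpenAI's discrete VAE plugged in; a sketch using dalle_pytorch's building blocks (the hyperparameters and tokenized caption below are illustrative, not the exact training configuration):

# OpenAI's pretrained dVAE supplying the image tokens for a small DALL-E.
import torch
from dalle_pytorch import OpenAIDiscreteVAE, DALLE

vae = OpenAIDiscreteVAE()  # the "oft-ignored" OpenAI dVAE weights
# vae = VQGanVAE(...)      # swap in the CompVis VQGAN wrapper for the comparison

dalle = DALLE(
    dim = 512,
    vae = vae,               # frozen VAE that tokenizes / detokenizes images
    num_text_tokens = 10000,
    text_seq_len = 256,
    depth = 4,
    heads = 8,
    dim_head = 64,
)

# Training step: caption token ids plus images from the meme-format loader.
text = torch.randint(0, 10000, (1, 256))
images = torch.randn(1, 3, 256, 256)
loss = dalle(text, images, return_loss = True)
loss.backward()

# After training, generate from a caption and inspect whether the rendered
# letters reuse the same image codes as the captions baked into the training set.
generated = dalle.generate_images(text, filter_thres = 0.9)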
@rom1504 @mehdidc @janEbert @robvanvolt @johnpaulbin @kobiso may be interested.