Reimage-GPT

Leveraging GPT-2 (or GPT-anything) to generate better prompts for Stable Diffusion 2.

How it works:

1. We sample a set of images from the COCO dataset and use them as our training data.
2. Each image is passed to a frozen Detectron2 model from Meta AI, which gives us a JSON describing the objects in the picture.
3. We use this JSON to generate a simplified string describing the image.
4. This string is passed to GPT-2 together with a system prompt that asks it to write an image-generation prompt and explains how the string is structured.
5. GPT-2's output is passed to a frozen Stable Diffusion 2 pipeline, which generates the output image.
6. We use an SSIM loss between the input image and the output image to fine-tune the weights of our GPT-2 instance.
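The step that turns the detection JSON into a simplified string could look roughly like the sketch below. The JSON schema (a list of objects with `label`, `score` and `box` fields), the 0.5 confidence threshold and the `"<count> <label>"` output format are all assumptions for illustration; the README does not specify the project's actual format, and `describe_detections` is a hypothetical helper.

```python
import json
from collections import Counter

# Hypothetical detection JSON (assumed schema): one entry per detected object,
# with a class label, a confidence score and a pixel box [x0, y0, x1, y1].
EXAMPLE_JSON = """
[
  {"label": "person", "score": 0.98, "box": [10, 20, 110, 300]},
  {"label": "person", "score": 0.91, "box": [150, 25, 240, 310]},
  {"label": "dog", "score": 0.87, "box": [260, 200, 380, 320]},
  {"label": "frisbee", "score": 0.32, "box": [300, 40, 330, 60]}
]
"""


def describe_detections(raw_json: str, min_score: float = 0.5) -> str:
    """Collapse a detection JSON into a short '<count> <label>' string.

    Detections below `min_score` are dropped; the remaining ones are counted
    per label and listed by descending count, ties broken alphabetically.
    """
    detections = json.loads(raw_json)
    counts = Counter(d["label"] for d in detections if d["score"] >= min_score)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ", ".join(f"{n} {label}" for label, n in ordered)


if __name__ == "__main__":
    print(describe_detections(EXAMPLE_JSON))  # 2 person, 1 dog
```

Dropping the boxes keeps the string short, at the cost of losing spatial layout; a richer format could add coarse positions such as "left" or "right".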
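Because GPT-2 is a plain language model with no separate system role, the "system prompt" in the GPT-2 step amounts to instruction text placed before the image description. A minimal sketch, assuming a description such as `2 person, 1 dog`; the wording of `SYSTEM_PROMPT` and the `build_prompt` helper are hypothetical, since the README does not give the actual prompt.

```python
# Hypothetical instruction text: the project's real system prompt is not
# given in the README, so this wording is an illustrative assumption.
SYSTEM_PROMPT = (
    "You write prompts for an image-generation model. "
    "You are given the objects detected in a photo, formatted as "
    "'<count> <label>' pairs separated by commas. "
    "Write one detailed prompt that would recreate the photo."
)


def build_prompt(description: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Prepend the instructions to the object description.

    The trailing 'Prompt:' cue invites the model to continue the text with
    an image-generation prompt.
    """
    return f"{system_prompt}\n\nObjects: {description}\nPrompt:"


if __name__ == "__main__":
    print(build_prompt("2 person, 1 dog"))
```

With Hugging Face Transformers, such a string could be fed to `pipeline("text-generation", model="gpt2")`, and the text generated after `Prompt:` would then serve as the Stable Diffusion prompt.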
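The SSIM loss compares the input and output images through luminance, contrast and structure statistics: SSIM(x, y) = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01·L)² and C2 = (0.03·L)² for data range L. A minimal NumPy sketch, assuming grayscale images scaled to [0, 1] and using one global window over the whole image instead of the sliding Gaussian window of standard SSIM (scikit-image's `skimage.metrics.structural_similarity` implements the windowed version). It only computes the loss value; it does not show how the loss is used to update GPT-2.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as a single window.

    Simplification: standard SSIM averages this statistic over local,
    typically Gaussian-weighted, windows; here the means, variances and
    covariance are taken over all pixels at once.
    """
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def ssim_loss(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """1 - SSIM: close to 0 for identical images, larger as similarity drops."""
    return 1.0 - global_ssim(x, y, data_range)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))  # stand-in grayscale image in [0, 1]
    noisy = np.clip(img + rng.normal(0, 0.2, img.shape), 0, 1)
    print(ssim_loss(img, img))    # close to 0
    print(ssim_loss(img, noisy))  # noticeably larger
```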

[Final Project for UCLA's COM SCI 263 - Natural Language Processing]
