Reimage-GPT

Leveraging GPT-2 (or any other GPT-style model) to generate better prompts for Stable Diffusion 2.

The pipeline works as follows:

  1. We sample a number of images from the COCO dataset and use them as our training data.
  2. We pass these images to a frozen version of Meta AI's Detectron2 model, which gives us a JSON describing the items in each picture.
  3. We use this JSON to generate a simplified string describing the image.
  4. We pass this string to GPT-2, together with a system prompt that explains how the string is structured and tells the model to come up with an image-generation prompt.
  5. The output of this step is passed into a frozen Stable Diffusion 2 pipeline to generate the output image.
  6. We use the SSIM loss between the input image and the output image to fine-tune the weights of our GPT-2 instance.
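
Steps 3 and 6 can be sketched roughly as below. This is only an illustration, not the project's actual code: the detection schema (a list of dicts with `label` and `score` keys), the `min_score` threshold, and the function names `describe_detections`, `global_ssim`, and `ssim_loss` are all assumptions made for the example. The SSIM here is also simplified to a single global window over flattened grayscale pixels, whereas SSIM is normally computed over local sliding windows and averaged.

```python
from collections import Counter


def describe_detections(detections, min_score=0.5):
    """Step 3 (sketch): turn detections into a string like '2 persons, 1 dog'.

    `detections` is a hypothetical, simplified schema: a list of dicts
    with "label" and "score" keys. It is not Detectron2's raw output.
    """
    counts = Counter(d["label"] for d in detections if d["score"] >= min_score)
    if not counts:
        return "no objects detected"
    parts = []
    for label, n in counts.most_common():
        # Naive pluralization, good enough for an illustration.
        parts.append(f"{n} {label}" + ("s" if n > 1 else ""))
    return ", ".join(parts)


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length sequences of pixel values.

    Simplification: statistics are taken over the whole image at once
    instead of over local sliding windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y, data_range=255.0):
    """Step 6 (sketch): a loss that is 0 when the two images are identical."""
    return 1.0 - global_ssim(x, y, data_range)


if __name__ == "__main__":
    detections = [
        {"label": "person", "score": 0.9},
        {"label": "dog", "score": 0.8},
        {"label": "person", "score": 0.7},
        {"label": "cat", "score": 0.2},  # dropped: below min_score
    ]
    print(describe_detections(detections))  # prints "2 persons, 1 dog"
    print(ssim_loss([0, 64, 128, 255], [0, 64, 128, 255]))
```

In this sketch the string from `describe_detections` would be what gets appended to the system prompt in step 4, and `ssim_loss` would be evaluated on the original COCO image and the Stable Diffusion output in step 6.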

[Final Project for UCLA's COM SCI 263 - Natural Language Processing]