Image-to-Image Translation with Conditional Adversarial Nets

This is a Tensorflow implementation of 'Image-to-Image Translation with Conditional Adversarial Nets' by Isola, et al.

This paper showed that conditional generative adversarial networks (cGANs) are broadly applicable in the domain of image-to-image translation. They created a cGAN architecture well-suited to these types of problems, and showed it working well on sketch-to-photo, day-to-night, grayscale-to-color, segmentation-to-photo tasks, and more.

While specific solutions tailored to many of these problems already existed, a generic framework that can handle all of them is clearly of great value.

Generator Architecture: U-Net

All image-to-image translation problems require the "high-level" features (what is it an image of?) to stay the same, and the surface-level features (how it looks) to change.

The need to understand and preserve the high-level essence of the image is well-suited to an encoder-decoder architecture. In this structure, the input image x is fed into the encoder, which shrinks the feature map and increases the number of channels at each step until the representation is compressed into a single vector. This vector encapsulates the essential high-level content of the image (e.g., this is a picture of a red flower). It is then fed into the decoder network, which creates an output image. The layer sizes of the decoder exactly mirror those of the encoder.
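
Below is a minimal sketch of such a plain encoder-decoder, written with the Keras API. The layer counts, filter sizes, and activations are illustrative assumptions, not the configuration in architecture.py:

```python
import tensorflow as tf

def build_encoder_decoder(img_size=256, input_channels=3, output_channels=3):
    """Illustrative encoder-decoder; not the exact network in this repo."""
    inputs = tf.keras.Input(shape=(img_size, img_size, input_channels))

    # Encoder: halve the spatial size and grow the channel count at each step,
    # until the image is compressed into a 1x1 "essence" vector.
    x = inputs
    filters = 64
    while x.shape[1] > 1:
        x = tf.keras.layers.Conv2D(filters, 4, strides=2, padding='same')(x)
        x = tf.keras.layers.LeakyReLU(0.2)(x)
        filters = min(filters * 2, 512)

    # Decoder: mirror the encoder, doubling the spatial size at each step.
    while x.shape[1] < img_size:
        filters = max(filters // 2, 64)
        x = tf.keras.layers.Conv2DTranspose(filters, 4, strides=2, padding='same')(x)
        x = tf.keras.layers.ReLU()(x)

    # Map back to the desired number of output channels, in [-1, 1].
    outputs = tf.keras.layers.Conv2D(output_channels, 1, activation='tanh')(x)
    return tf.keras.Model(inputs, outputs)
```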

However, a plain encoder-decoder has some drawbacks. Since all input image information must pass through a bottleneck, some low-level details of interest (e.g., the location of edges) can be lost. To solve this, the authors added "skip connections" between corresponding layers in the encoder and decoder. These connections simply concatenate the output of a given encoder layer to the output of the decoder layer of the same size.

The resulting architecture is called a "U-Net":

* This image is taken from the paper.
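
For reference, here is a sketch of how the skip connections change the network, again with illustrative layer sizes rather than the exact ones in architecture.py. The only difference from the plain encoder-decoder sketch above is that each encoder activation is saved and later concatenated onto the decoder activation of the same spatial size:

```python
import tensorflow as tf

def build_unet(img_size=256, input_channels=3, output_channels=3):
    """Illustrative U-Net generator; not the exact network in this repo."""
    inputs = tf.keras.Input(shape=(img_size, img_size, input_channels))

    # Encoder: downsample as before, but remember each activation.
    skips = []
    x = inputs
    filters = 64
    while x.shape[1] > 1:
        x = tf.keras.layers.Conv2D(filters, 4, strides=2, padding='same')(x)
        x = tf.keras.layers.LeakyReLU(0.2)(x)
        skips.append(x)
        filters = min(filters * 2, 512)

    # Decoder: upsample, then concatenate the encoder activation of matching size.
    for skip in reversed(skips[:-1]):  # the bottleneck itself needs no skip
        x = tf.keras.layers.Conv2DTranspose(int(skip.shape[-1]), 4, strides=2,
                                            padding='same')(x)
        x = tf.keras.layers.ReLU()(x)
        x = tf.keras.layers.Concatenate()([x, skip])

    # Final upsample back to the input resolution and output channel count.
    outputs = tf.keras.layers.Conv2DTranspose(output_channels, 4, strides=2,
                                              padding='same', activation='tanh')(x)
    return tf.keras.Model(inputs, outputs)
```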

Discriminator Architecture: PatchGAN

Generally, GANs use a discriminator as an adaptive loss function. This paper uses a discriminator network for one part of the generator's loss function, designed to capture surface-level features.

Since these "texture-like" features do not need to encapsulate the entire image's high-level contents, it is sufficient to evaluate smaller patches of the image. This is both more computationally efficient and allows the discriminator to evaluate arbitrarily large images. This "PatchGAN", as it is called, is fed 70x70 patches of an image, and outputs the probability of each being "real" (as opposed to generated).
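
A minimal sketch of such a patch discriminator is below. It follows the common pix2pix convention of building a fully convolutional network whose output is a grid of per-patch probabilities; with these (assumed) kernel sizes and strides, each output value has a 70x70 receptive field. The filter counts are assumptions, and the sketch scores a single image (y or G(x)), matching the D(y) / D(G(x)) notation used in the loss section below:

```python
import tensorflow as tf

def build_patch_discriminator(img_size=256, channels=3):
    """Illustrative PatchGAN discriminator; not the exact network in this repo."""
    image = tf.keras.Input(shape=(img_size, img_size, channels))  # y or G(x)

    # Stack of convolutions: each output unit ends up "seeing" only a
    # local 70x70 patch of the input image.
    h = image
    for filters in (64, 128, 256):
        h = tf.keras.layers.Conv2D(filters, 4, strides=2, padding='same')(h)
        h = tf.keras.layers.LeakyReLU(0.2)(h)
    h = tf.keras.layers.Conv2D(512, 4, strides=1, padding='same')(h)
    h = tf.keras.layers.LeakyReLU(0.2)(h)

    # One sigmoid score per patch: a grid of "real vs. generated"
    # probabilities rather than a single number for the whole image.
    patch_probs = tf.keras.layers.Conv2D(1, 4, strides=1, padding='same',
                                         activation='sigmoid')(h)
    return tf.keras.Model(image, patch_probs)
```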

Loss Functions

The loss function for training the generator uses both an L1 loss component (for high-level features) and a PatchGAN component (for style/texture features).

The PatchGAN loss trains the generator to maximize D(G(x)), the probability the discriminator assigns to a generated image patch being real. The L1 loss, on the other hand, trains the generator to produce output pixels as similar as possible to the corresponding pixels in the real image. The PatchGAN element is essential, because L1 loss alone tends to average over plausible outcomes and produce blurry images. We weight the relative influence of the L1 and GAN losses with the hyperparameters α and β.
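
As a sketch, the combined generator loss might look like the following. The function and argument names are hypothetical, the default weights are only illustrative, and which of α and β multiplies which term is an assumption:

```python
import tensorflow as tf

def generator_loss(d_fake_patches, generated, target, alpha=1.0, beta=100.0):
    """d_fake_patches: D(G(x)) patch probabilities; generated: G(x); target: y."""
    # GAN term: push every patch probability D(G(x)) toward 1 (i.e., fool D).
    gan_term = tf.reduce_mean(-tf.math.log(d_fake_patches + 1e-8))
    # L1 term: keep each output pixel close to the corresponding real pixel.
    l1_term = tf.reduce_mean(tf.abs(target - generated))
    return alpha * gan_term + beta * l1_term
```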

The discriminator is trained with a typical GAN discriminator loss, with which it attempts to maximize D(y), the probability output for real images, and minimize D(G(x)), the probability output for generated images.
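
A corresponding sketch of the discriminator loss, again with hypothetical names and written as a quantity to minimize:

```python
import tensorflow as tf

def discriminator_loss(d_real_patches, d_fake_patches):
    """d_real_patches: D(y); d_fake_patches: D(G(x)); both per-patch probabilities."""
    real_term = -tf.math.log(d_real_patches + 1e-8)         # want D(y) -> 1
    fake_term = -tf.math.log(1.0 - d_fake_patches + 1e-8)   # want D(G(x)) -> 0
    return tf.reduce_mean(real_term + fake_term)
```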

Train Your Own

Start Training

To train your own model using this code:

  1. Format your training data as a folder of [x|y] images, where each has the shape [img_size, 2 * img_size] (a sketch of building these images follows this list). By default, the code expects an img_size of 256, but this can be changed in the architecture.py file. Just be sure to keep the size a power of two, and be aware that larger images require a larger generator and longer training times.

  2. Download this code and navigate into the project directory.

  3. To start training, run python -m trainer.task --data-dir [PATH_TO_YOUR_DATA].
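
For step 1, a small helper like the one below can build the [x|y] images. It uses Pillow and hypothetical file paths; it is not part of this repo, just a sketch of the expected format (input x on the left half, target y on the right half):

```python
from PIL import Image

def make_pair_image(x_path, y_path, out_path, img_size=256):
    """Paste x and y side by side into one [img_size, 2 * img_size] image."""
    x = Image.open(x_path).resize((img_size, img_size))
    y = Image.open(y_path).resize((img_size, img_size))
    pair = Image.new('RGB', (2 * img_size, img_size))  # (width, height)
    pair.paste(x, (0, 0))          # left half: input image x
    pair.paste(y, (img_size, 0))   # right half: target image y
    pair.save(out_path)
```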

By default, checkpoints will be saved at regular intervals to the checkpoints folder. This output location can be configured with the --checkpoint-dir option.

Event files will be saved at regular intervals to the summary folder, so you can view the training progress in Tensorboard. This default location can be altered with the --summary-dir argument.
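For example, assuming the default summary folder, you can point Tensorboard at it by running tensorboard --logdir summary.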

Hyperparameters are set to reasonable defaults, but you can change them via command-line arguments (--num-epochs, --batch-size) and in architecture.py (img_size, input_channels, output_channels, max number of channels, dropout probs, and the α and β weights for the generator loss).

Continue Training

If you need to stop and restart training, but don't want to lose your progress, just run python -m trainer.task --continue-train True --checkpoint-dir [PATH_TO_CHECKPOINT_FILES].

Sample

To use your trained model to translate images, you will need a folder of input images x. Then, run python -m trainer.task --translate-image-dir [PATH_TO_INPUT_IMAGES] --sample-dir [WHERE_TO_SAVE_OUTPUT_IMAGES].

Acknowledgements
