TensorFlow implementation of WGAN-GP (Wasserstein GAN with Gradient Penalty).
Datasets
python train.py --DATASET=celeba --DATA_DIR=/path/to/celeba/
Other options include SELU activations and layer normalization. The discriminator does not use batch normalization, since normalizing across the batch conflicts with the gradient penalty, which is computed per sample; instance normalization or layer normalization can still be used without issue. The WGAN-GP authors suggest layer normalization if any normalization is used. Neither is used by default, but layer normalization can be enabled with:
python train.py --DATASET=celeba --DATA_DIR=/path/to/celeba/ --NORM=1
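To make the per-sample nature of the penalty concrete, here is a minimal NumPy sketch of the WGAN-GP gradient penalty. It uses a toy linear critic so the gradient is analytic (the repo computes it with TensorFlow autodiff on the real critic), and the `scale=10.0` default mirrors the `SCALE` flag visible in the checkpoint paths; everything else (names, shapes) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear critic D(x) = w . x; its gradient w.r.t. x is just w,
# so no autodiff is needed for this sketch.
w = rng.normal(size=4)

def critic_grad(x):
    # d/dx (w . x) = w, independent of x for a linear critic
    return np.broadcast_to(w, x.shape)

def gradient_penalty(real, fake, scale=10.0):
    # Interpolate each real/fake pair at a random point on the line
    # between them, as in the WGAN-GP paper.
    eps = rng.uniform(size=(real.shape[0], 1))
    interp = eps * real + (1.0 - eps) * fake
    grads = critic_grad(interp)                 # (batch, features)
    norms = np.sqrt((grads ** 2).sum(axis=1))   # per-sample gradient norm
    return scale * ((norms - 1.0) ** 2).mean()  # penalize ||grad|| != 1

real = rng.normal(size=(8, 4))
fake = rng.normal(size=(8, 4))
gp = gradient_penalty(real, fake)
```

Because the norm is taken per sample, any layer that couples samples within a batch (batch normalization) changes what this gradient means, while layer normalization, which acts on each sample independently, does not.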
Here are some non-cherry-picked results after 100,000 training steps with a batch size of 128. The first image uses layer normalization; the second does not.
To create an image like this, run createPhotos.py and point it at your checkpoint directory, like so:
python createPhotos.py checkpoints/DATASET_celeba/SCALE_10/NORM_False/SELU_False/
- Initial trials of SELU activations did not work; the model diverged fairly quickly.
- For some reason, I was getting terrible results using `tf.layers.conv2d` as opposed to `tf.contrib.layers.conv2d`, and I am still unsure as to why.
- The last layer of the discriminator is another convolution with stride 1, a kernel size of 4, and a depth of 1. I found this to work much better than the typical fully connected layer.
- Training with layer normalization seems more stable, although each step takes longer (~2 seconds per step at batch size 128 on a GTX 1080, versus ~1.5 seconds without it). However, it also seems to converge faster, which may offset the extra time.
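The final-convolution trick above can be sketched in NumPy. A "valid" 4x4 convolution with stride 1 and depth 1 over a 4x4 feature map has exactly one output position, so it collapses each sample to a single scalar score; the feature-map shape (4x4 with 64 channels) is an assumption here, not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)

def final_critic_layer(feats, kernel):
    # feats:  (batch, 4, 4, channels) -- the critic's last feature map
    # kernel: (4, 4, channels)        -- one 4x4 filter, output depth 1
    # With a 4x4 kernel on a 4x4 map there is a single valid position,
    # so the convolution reduces to one weighted sum per sample.
    return np.einsum('bhwc,hwc->b', feats, kernel)

feats = rng.normal(size=(2, 4, 4, 64))   # assumed final feature-map shape
kernel = rng.normal(size=(4, 4, 64))
scores = final_critic_layer(feats, kernel)   # one scalar per sample
```

Numerically this is the same linear map as a dense layer over the flattened feature map, so any training difference comes from initialization and regularization conventions of the conv path rather than extra expressiveness.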