This project is effectively abandoned. I achieved the results below after spending a lot of time tuning the model. Unfortunately I was not able to reproduce them, which means I was lucky to hit a good random seed once. I am also not interested in research in this direction right now, so the problem won't be fixed any time soon. C'est la vie.
When I first approached the semantic manipulation problem there was no solution like CycleGAN or the later work that followed. And even now, all of them produce artifacts.
- Use a generator architecture with built-in segmentation.
- Mix the original image with new patches through the segmentation mask.
- Train the whole network end-to-end.
- Use an L1 identity loss to constrain the generator and reduce unnecessary changes.
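The mask-based mixing in the steps above can be sketched as follows. This is a minimal NumPy sketch of the blending formula only, not the actual project code; the array names and shapes are assumptions:

```python
import numpy as np

def blend_with_mask(original, generated, mask):
    """Blend generator output into the original image through a soft
    segmentation mask in [0, 1]: masked regions take the generated
    patches, everything else passes through unchanged."""
    return mask * generated + (1.0 - mask) * original

# Toy example: 1-D "images", mask selects only the second half.
original = np.array([0.0, 0.0, 0.0, 0.0])
generated = np.array([1.0, 1.0, 1.0, 1.0])
mask = np.array([0.0, 0.0, 1.0, 1.0])

print(blend_with_mask(original, generated, mask))  # [0. 0. 1. 1.]
```

Because the mask is produced by the generator itself and the blend is differentiable, gradients flow through both the patches and the mask, which is what lets the whole network train end-to-end.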
I am using the CelebA dataset to train the model. There are two files you need to reproduce the results: `img_align_celeba.zip` and `list_attr_celeba.txt`.
You can download them from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html and put them into `{PROJECT_DIR}/data`.
After that, initialize the data and train the model by running:

```
sh init_data.sh
python train.py
```
- If a person is already smiling, it makes no changes at all.
- It handles some extreme head angles poorly.
- There are still artifacts.
Consider the following advice if you want to build this kind of model:
- Make sure your GAN model converges without applying the mask and the L1 loss.
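The L1 identity loss mentioned above can be sketched like this. This is a hedged NumPy sketch of the loss term only; in the real model it would be a PyTorch loss computed on batches, and the weight value is an assumption:

```python
import numpy as np

def l1_identity_loss(generated, original, weight=10.0):
    """Penalize the mean absolute difference between the generator
    output and the input image, pushing the generator to change the
    image as little as possible."""
    return weight * np.mean(np.abs(generated - original))

# Identical images incur zero penalty; changes are penalized linearly.
x = np.zeros((4, 4))
print(l1_identity_loss(x, x))        # 0.0
print(l1_identity_loss(x + 0.5, x))  # 5.0
```

Adding this term to the adversarial loss only after the plain GAN converges makes it easier to tell whether training failures come from the GAN itself or from the extra constraints.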
The code is inspired by pytorch-CycleGAN-and-pix2pix. The paper GANimation: Anatomically-aware Facial Animation from a Single Image (arXiv:1807.09251) describes a similar training scheme.