This notebook is a simple implementation of a Variational Autoencoder (VAE) using PyTorch. The VAE is a generative model that learns to encode and decode data.
The model is trained on the Anime Face dataset and the Cartoon Face dataset, both well-known public datasets.
Some samples of the dataset are shown below:
A VAE consists of two main components:
- Encoder: Maps the input data to a latent space.
- Decoder: Reconstructs the data from the latent space.
The encoder maps the input data $x$ to the parameters $\mu$ and $\log\sigma^2$ of the approximate posterior $q(z|x) = \mathcal{N}(\mu, \sigma^2 I)$.
The latent variable $z$ is sampled with the reparameterization trick: $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$.
The decoder maps the latent variable $z$ back to a reconstruction $\hat{x}$ of the input.
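The sampling step above can be sketched in a few lines of PyTorch (a minimal sketch; the function name is illustrative, not from the notebook):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # sigma = exp(0.5 * log(sigma^2))
    std = torch.exp(0.5 * logvar)
    # epsilon ~ N(0, I), same shape as sigma
    eps = torch.randn_like(std)
    # z = mu + sigma * epsilon -- differentiable w.r.t. mu and logvar
    return mu + std * eps
```

Because the randomness is isolated in $\epsilon$, gradients can flow through $\mu$ and $\log\sigma^2$ during training.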
The loss function of a VAE consists of two parts:
- Reconstruction Loss: Measures how well the decoder can reconstruct the input data from the latent variable.
- KL Divergence: Measures how close the learned latent distribution is to the prior distribution (usually a standard normal distribution).
The total loss is given by:

$$\mathcal{L} = -\mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] + D_{KL}\left(q(z|x) \,\|\, p(z)\right)$$

Where:
- $q(z|x)$ is the approximate posterior distribution (output of the encoder).
- $p(x|z)$ is the likelihood of the data given the latent variable (output of the decoder).
- $p(z)$ is the prior distribution (usually a standard normal distribution).
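The two terms of the loss can be sketched in PyTorch as follows. This is a minimal sketch assuming a Gaussian posterior with a standard-normal prior (so the KL term has a closed form) and pixel values in $[0, 1]$ (so binary cross-entropy serves as the reconstruction term); the function name is illustrative:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat: torch.Tensor, x: torch.Tensor,
             mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Reconstruction term: how well the decoder reproduces the input.
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL divergence between q(z|x) = N(mu, sigma^2) and p(z) = N(0, I):
    # -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld
```

Note that when $\mu = 0$ and $\log\sigma^2 = 0$ the KL term vanishes, since the posterior already equals the prior.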
Our model consists of two fully convolutional neural networks: one for the encoder and one for the decoder.
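A convolutional encoder/decoder pair could look like the sketch below. The 64x64 RGB input size, channel counts, and 128-dimensional latent space are assumptions for illustration, not necessarily the notebook's exact configuration:

```python
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    # Sketch only: 64x64 RGB inputs and latent_dim=128 are assumed.
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),    # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decoder(z), mu, logvar
```

Each stride-2 convolution halves the spatial resolution, and the transposed convolutions in the decoder mirror that progression back up to the input size.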
The generated images are shown below:
It can be seen that, using a VAE, the model learns to reconstruct the data from the latent space. Moreover, if we feed the decoder random noise drawn from the prior, we can generate entirely new images that do not exist in the dataset. This capability can be useful for data augmentation, among other applications.
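Generating new images amounts to sampling $z \sim \mathcal{N}(0, I)$ and passing it through the decoder. A self-contained sketch, using a small stand-in decoder (in practice this would be the trained VAE's decoder; the shapes are assumptions):

```python
import torch
import torch.nn as nn

# Stand-in decoder for illustration; a trained VAE decoder would be used instead.
latent_dim = 128
decoder = nn.Sequential(
    nn.Linear(latent_dim, 3 * 64 * 64),
    nn.Sigmoid(),
    nn.Unflatten(1, (3, 64, 64)),
)

# Sample latent vectors from the prior p(z) = N(0, I) and decode them
# into brand-new images that were never in the training set.
with torch.no_grad():
    z = torch.randn(16, latent_dim)
    new_images = decoder(z)

print(new_images.shape)  # torch.Size([16, 3, 64, 64])
```

Because the KL term pushes the posterior toward the prior during training, latent vectors drawn from $\mathcal{N}(0, I)$ tend to decode into plausible images.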