This repository summarizes the final project for the TUM Image and Video Compression Laboratory.
The DCU-VAE is proposed for the following reasons:
- **Dilated Kernels for Efficient Downsampling**

  Traditional convolutional kernels enlarge the receptive field by increasing the kernel size, which adds parameters. Dilated kernels instead expand the receptive field at a fixed parameter count, enabling efficient downsampling that preserves global structure while reducing computational cost.

- **Overcoming Error Accumulation in Sequential VAEs**

  Traditional VAEs decode sequentially, and reconstructing from heavily downsampled representations makes it difficult to recover high-frequency details. Furthermore, in hierarchical approaches where each layer's prior is conditioned on the output of the previous layer, an error in an early layer accumulates and degrades later stages. To address this, a U-Net-like architecture with skip connections is adopted, providing multi-scale priors and mitigating error propagation.
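The first point can be made concrete with a small arithmetic sketch (illustrative only, not code from this repository): a kernel of size `k` with dilation rate `d` covers an effective extent of `k + (k - 1)(d - 1)` pixels, so the receptive field grows with `d` while the weight count stays fixed.

```python
# Illustrative sketch (not repository code): a dilated convolution widens
# its receptive field without adding weights. A k-tap kernel with dilation
# d spans k + (k - 1) * (d - 1) input positions per axis.

def effective_kernel_size(k: int, d: int) -> int:
    """1-D effective extent of a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def params_per_filter(k: int) -> int:
    """Weights in one 2-D k x k filter (single in/out channel, no bias)."""
    return k * k

if __name__ == "__main__":
    for d in (1, 2, 4):
        span = effective_kernel_size(3, d)
        print(f"3x3 kernel, dilation {d}: covers {span}x{span} pixels, "
              f"{params_per_filter(3)} weights")
```

With dilation 4, a 3x3 kernel spans a 9x9 neighbourhood yet still holds only 9 weights per channel pair, which is the efficiency argument above.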
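The second point can be sketched with a toy NumPy encoder–decoder (hypothetical, with illustrative names and shapes; it is not the DCU-VAE itself): a purely sequential decoder reconstructs only from the coarsest feature, while a U-Net-style decoder fuses each encoder scale back in through a skip connection and so retains high-frequency detail.

```python
import numpy as np

# Hypothetical sketch of the multi-scale skip-connection idea: decoding
# only from the coarsest stage discards high frequencies, while fusing
# each encoder scale back in via a skip connection recovers much of them.

rng = np.random.default_rng(0)

def downsample(x):
    """2x2 average pooling."""
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def upsample(x):
    """Nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def encode(x, levels=3):
    """Keep the whole feature pyramid for the skip connections."""
    feats = [x]
    for _ in range(levels):
        feats.append(downsample(feats[-1]))
    return feats

def decode_sequential(feats):
    """Sequential decoder: each stage sees only the coarser stage below it."""
    y = feats[-1]
    for _ in range(len(feats) - 1):
        y = upsample(y)
    return y

def decode_with_skips(feats):
    """U-Net-style decoder: each stage also sees the matching encoder scale."""
    y = feats[-1]
    for skip in reversed(feats[:-1]):
        y = 0.5 * (upsample(y) + skip)  # fuse coarse path with skip feature
    return y

x = rng.standard_normal((16, 16))
feats = encode(x)
err_seq = np.abs(decode_sequential(feats) - x).max()
err_skip = np.abs(decode_with_skips(feats) - x).max()
print(f"max abs reconstruction error: sequential {err_seq:.3f}, "
      f"with skips {err_skip:.3f}")
```

The skip-connected decoder's reconstruction error is markedly lower, mirroring the argument that multi-scale priors limit how far an early-stage error can propagate.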
- Ballé, J., Minnen, D., Singh, S., Hwang, S. J., & Johnston, N. (2018). Variational image compression with a scale hyperprior. arXiv:1802.01436.
- Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv:1312.6114.
- Wang, Z., Zou, Y., & Liu, P. X. (2021). Hybrid dilation and attention residual U-Net for medical image segmentation. Computers in Biology and Medicine, 134, 104449.
- Train a Variational Autoencoder (VAE) to Generate Images. MathWorks Documentation.