
banner

Single Image Super-Resolution Using a Seven-Layer Efficient Sub-Pixel Convolutional Neural Network

Joshua Lo (Peer Mentor), Justin Ashbaugh, Jared Habermehl, Allen Tu, Addison Waller

Super-resolution is the process of recovering a high-resolution (HR) image or video from its low-resolution (LR) counterpart. It has applications in many fields, including autonomous vehicles, medical imaging, security, and entertainment.

diagram

Machine learning super-resolution uses a model trained on a dataset of images to predict additional pixels from an LR image input, essentially "filling in" the gaps between the pixels of an LR image to create an HR output. We refer to a recovered HR image as a super-resolved (SR) image. An SR image has more pixels than the LR image that it was created from, so it contains more information and will appear clearer due to its higher pixel density.

Our model uses a series of convolutional layers to extract, or learn, information from the LR image. Then, it combines the collected data to create the SR image. In technical terms, this is a seven-layer Efficient Sub-Pixel Convolutional Neural Network that takes an LR image input, extracts LR feature maps through a series of convolutional layers, then applies a sub-pixel convolution layer to assemble the LR feature maps into an HR image output. This project is written in TensorFlow and is based on Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [1].
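The sub-pixel convolution step can be illustrated in a single TensorFlow call: feature maps with r² channels are rearranged into an output that is r times larger in each spatial dimension. The shapes below are purely illustrative:

```python
import tensorflow as tf

# Sub-pixel convolution (pixel shuffle): rearrange r^2 feature channels
# into an r-times larger spatial grid. Here r = 3, so 9 channels of
# 64x64 features become a single-channel 192x192 output.
lr_features = tf.random.normal([1, 64, 64, 9])  # (batch, H, W, r^2)
hr_output = tf.nn.depth_to_space(lr_features, block_size=3)
print(hr_output.shape)  # (1, 192, 192, 1)
```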

Interactive Notebook and Demonstration Video

Demonstration Video

Learn about this project and use a trained model without any setup by visiting our Interactive Notebook, hosted on Google Colab. Watch our demonstration video on YouTube for a high-level walkthrough.

Project Setup

  1. Clone this repository.
  2. Run `conda env create -f environment.yml` from this directory to create the project environment. Activate it using `conda activate image-super-resolution`. Refer to Environment for a list of required packages.
  3. Follow the directions in `data/` to assemble the dataset.
  4. Refer to Training and Testing to train and test the model.

Environment

The following packages are required to use this project: `python=3.6`, `scipy`, `tqdm`, `matplotlib`, `jupyterlab`, `scikit-image`, `pillow=6.1`, `tensorflow=2.0`, `opencv`, `conda-forge::imgaug`.
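For reference, a plausible `environment.yml` matching this package list might look like the following. The file shipped with the repository is authoritative; this layout is an assumption:

```yaml
# Hypothetical environment.yml reflecting the package list above.
name: image-super-resolution
channels:
  - defaults
dependencies:
  - python=3.6
  - scipy
  - tqdm
  - matplotlib
  - jupyterlab
  - scikit-image
  - pillow=6.1
  - tensorflow=2.0
  - opencv
  - conda-forge::imgaug
```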

Data

This directory contains the datasets, the scripts to download and process them, and the Keras data generator. Its README contains extensive documentation and instructions for assembling the required datasets.

Model

architecture

The Efficient Sub-Pixel Convolutional Neural Network (ESPCN) model is a machine learning Single Image Super-Resolution (SISR) model that takes an LR image input, extracts LR feature maps through a series of convolutional layers, then uses a sub-pixel convolution layer to convert the LR feature maps into an HR image output. Refer to `model/` for documentation of our model and the Peak Signal to Noise Ratio (PSNR) function.
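As a concrete illustration, here is a minimal Keras sketch of a seven-layer ESPCN. The layer widths and kernel sizes are assumptions for illustration, not the repository's exact configuration; refer to `espcn_model` for the actual layers:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_espcn(upscale_factor=3, channels=1):
    # Hypothetical seven-layer ESPCN: six convolutions in LR space followed
    # by a sub-pixel convolution (pixel shuffle).
    inputs = layers.Input(shape=(None, None, channels))
    x = layers.Conv2D(64, 5, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    # The final convolution emits r^2 channels per LR pixel...
    x = layers.Conv2D(channels * upscale_factor ** 2, 3, padding="same")(x)
    # ...which the sub-pixel layer rearranges into the HR image.
    outputs = layers.Lambda(
        lambda t: tf.nn.depth_to_space(t, upscale_factor))(x)
    return models.Model(inputs, outputs)

model = build_espcn(upscale_factor=3)
model.summary()
```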

This directory contains the model, PSNR function, and any saved weights. Its README contains extensive documentation.
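The PSNR metric itself is simple to express. The sketch below shows the standard formulation in TensorFlow, assuming 8-bit pixel values; the actual function in `model/` may normalize inputs differently, and TensorFlow also provides a built-in `tf.image.psnr`:

```python
import tensorflow as tf

def psnr(hr, sr, max_val=255.0):
    # Peak Signal to Noise Ratio in decibels: 10 * log10(MAX^2 / MSE).
    # max_val=255.0 assumes 8-bit images; adjust if inputs are normalized.
    mse = tf.reduce_mean(
        tf.square(tf.cast(hr, tf.float32) - tf.cast(sr, tf.float32)))
    return 10.0 * tf.math.log(max_val ** 2 / mse) / tf.math.log(10.0)
```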

Training

Trains the seven-layer ESPCN model in Keras. The `DataGenerator` loads LR and HR images from the training dataset, which consists of images in the DIV2K dataset. Saves the trained weights at `model/weights/r[r]bs[batch_size]epochs[epochs]weights.h5` (e.g. the pre-trained weights are saved at `model/weights/r3bs10epochs100weights.h5`).

Includes the importable function `train`. Run this script from the console with `python training.py [upscale_factor] [batch_size] [epochs]`. `upscale_factor` is the ratio `r` by which the images will be super-resolved. `batch_size` is the number of images generated by the `DataGenerator` in each batch. `epochs` is the number of epochs for which the model will be trained before its weights are saved.
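To call `train` from Python instead of the console, a hypothetical invocation mirroring the CLI arguments above might look like this (the keyword names are assumptions based on the argument descriptions):

```python
from training import train

# Train with upscale factor r = 3, batches of 10 images, for 100 epochs.
# Keyword names are assumed from the CLI description above.
train(upscale_factor=3, batch_size=10, epochs=100)
```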

Shows the model summary and uses `train` to train the model in a Jupyter Notebook.

Testing

Benchmarks a trained model by calculating the average PSNR of 729 inputs. For each image, the `DataGenerator` generates an LR-HR image pair at the model's `upscale_factor` from the testing dataset, which consists of all images in the Classical SR dataset. Then, the LR image is super-resolved to create the SR image, and the `psnr` function compares it to the HR image. The average PSNR is calculated, and the PSNR of each image can optionally be shown.

Includes the importable function `benchmark`. Run this script from the console with `python testing.py [weights_filename] [upscale_factor] [off (optional)]`. `weights_filename` is the filename of trained weights saved in `model/weights`, without the directory path (e.g. `r3bs10epochs100weights.h5`). `upscale_factor` is the ratio `r` that the weights were trained to super-resolve inputs by. Include `off` to hide output for individual images; otherwise, the PSNR of each image and the running average PSNR are printed.
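Similarly, a hypothetical Python call to `benchmark` mirroring the CLI might look like this (the parameter names are assumptions based on the argument descriptions):

```python
from testing import benchmark

# Benchmark pre-trained weights at upscale factor r = 3, printing the
# PSNR of each image. Keyword names are assumed from the CLI description.
benchmark("r3bs10epochs100weights.h5", upscale_factor=3)
```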

Tests trained models in a Jupyter Notebook. Compiles the model, loads saved weights from `model/weights`, and initializes the testing `DataGenerator`. Displays the LR, SR, and HR versions of an individual image side by side for qualitative comparison, and displays the PSNR between the SR and HR images for quantitative comparison. Uses `benchmark` to calculate the average PSNR over the testing dataset.

Results

The following graphic was generated in `testing.ipynb` using the weights saved at `model/weights/r3bs10epochs100weights.h5`.

example

The SR image generated by our model is of higher quality than the LR image, both qualitatively and quantitatively. In our benchmarking, the PSNR between the SR and HR images is approximately 24-26 dB, which falls within or exceeds the acceptable range for wireless transmission quality loss (20 to 25 dB).

The results are decent, but the SR image is visibly not quite on par with the HR image, which leaves room for improvement:

  • Create a model with more layers in the hope of capturing more information during training. This is known as making the model deeper.
  • Add more images to the training dataset so the model learns from more varied examples.
  • Train for more epochs (i.e., train this model longer). More information can be captured this way, but training too long can overfit the model and degrade results.
  • Fine-tune the convolutional layers in `espcn_model` through experimentation to more effectively capture information from the LR images in the training dataset. Since we chose the `Conv2D` parameters as estimates, this is our most promising next step.

Improvements to the program would lead to even clearer SR images with higher PSNR scores.

References

  1. W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 1874-1883, doi: 10.1109/CVPR.2016.207. Available: https://ieeexplore.ieee.org/document/7780576.
  2. “Common Datasets for Image Super-Resolution,” CV Notes, 22-Sep-2019. [Online]. Available: https://cvnote.ddlee.cn/2019/09/22/image-super-resolution-datasets.
  3. “Peak signal-to-noise ratio,” Wikipedia, 26-Nov-2020. [Online]. Available: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio.
  4. “Super-resolution imaging,” Wikipedia, 02-Dec-2020. [Online]. Available: https://en.wikipedia.org/wiki/Super-resolution_imaging.