Computer vision

Input tensor shape

The DataLoader for vision models will often load JPEG images that are decoded on the fly and batched, resulting in input tensors with the shape:

[batch_size][height][width][n_channels]  =  e.g. [8][224][224][3]

This shape is denoted as NHWC and referred to as "channels last". Another convention is to use NCHW, referred to as "channels first".
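To make the two layouts concrete, here is a minimal sketch using NumPy. The array names are hypothetical, and a real pipeline would typically do the same permutation with its framework's own tensor type; this only illustrates how the axes move.

```python
import numpy as np

# Hypothetical batch: 8 RGB images of 224x224 pixels in NHWC ("channels last").
batch_nhwc = np.zeros((8, 224, 224, 3), dtype=np.float32)

# Move the channel axis from position 3 to position 1 to get NCHW ("channels first").
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

# And back again: NCHW -> NHWC.
roundtrip = np.transpose(batch_nchw, (0, 2, 3, 1))

print(batch_nhwc.shape)  # (8, 224, 224, 3)
print(batch_nchw.shape)  # (8, 3, 224, 224)
```

Note that `np.transpose` returns a view with permuted strides, so the data is not copied until something forces a contiguous layout.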

Convolutions

The most important operation in computer vision is the convolution.
It is a matrix multiply that respects spatial symmetry: the same small matrix of weights (the kernel) is applied at every position in the image.
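The idea of applying the same weights everywhere can be sketched as a naive loop. This is an illustrative single-channel version with stride 1 and no padding, not how a library implements it; the function name `conv2d_single_channel` is hypothetical. Like most deep learning frameworks, it does not flip the kernel (strictly speaking, a cross-correlation).

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Naive 2D convolution: stride 1, no padding, no kernel flip."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (i, j).
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 4x4 "image" holding the values 0..15 and a 3x3 kernel of ones.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
kernel = np.ones((3, 3), dtype=np.float32)

result = conv2d_single_channel(image, kernel)
print(result.shape)  # (2, 2)
print(result)        # [[45. 54.] [81. 90.]]
```

A real convolution layer also sums over input channels and produces several output channels, each with its own kernel.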

Dumoulin, V. & Visin, F. (2016). A guide to convolution arithmetic for deep learning.
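A central relation in convolution arithmetic is the output size along one spatial dimension: floor((input + 2 * padding - kernel) / stride) + 1. The helper below is a small sketch of that formula; the name `conv_output_size` is hypothetical and dilation is assumed to be 1.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension (assumes dilation of 1)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 3x3 kernel with padding 1 and stride 1 preserves the spatial size.
print(conv_output_size(224, kernel_size=3, stride=1, padding=1))  # 224

# A 7x7 kernel with stride 2 and padding 3 halves it.
print(conv_output_size(224, kernel_size=7, stride=2, padding=3))  # 112
```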

LeNet

LeCun, Y. et al. (1989). Backpropagation applied to handwritten zip code recognition.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition.
Used the MNIST dataset: handwritten digits 0-9
Used for Optical Character Recognition (OCR) on checks and mail in the 1990s!
This is arguably the first impactful application of deep learning.

AlexNet

Other important labeled image datasets are CIFAR-10 and CIFAR-100, which have 10 and 100 classes, respectively.
Deng, J. et al. (2009). ImageNet: A large-scale hierarchical image database.
- ImageNet-1k dataset: 1000 image classes with about 1000 examples each.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks.
- Became known as "AlexNet"
- A watershed moment for deep learning in computer vision

ResNet

ResNet v1 vs v2 (cv-tricks.com):

He, K. et al. (2015). Deep residual learning for image recognition.
Still an important MLPerf benchmark.

UNet

The UNet architecture:

An example of image segmentation with UNet:

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation.

Diffusion

TODO

Conclusion

TODO

Up next: Natural language
Previous: Introduction
