
Commit

Add lesson on computer vision
shwars committed May 17, 2022
1 parent 33315fd commit fa66a6b
Showing 16 changed files with 997 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ For a gentle introduction to *AI in the Cloud* topics you may consider taking th
<td><a href="https://docs.microsoft.com/learn/modules/intro-computer-vision-pytorch/?WT.mc_id=academic-57639-dmitryso">MS Learn</a></td>
<td><a href="https://docs.microsoft.com/learn/modules/intro-computer-vision-TensorFlow/?WT.mc_id=academic-57639-dmitryso">MS Learn</a></td>
<td>PAT</td></tr>
<tr><td>6</td><td>Intro to Computer Vision. OpenCV</td><td>Text<td colspan="2">Notebook</td><td></td></tr>
<tr><td>6</td><td>Intro to Computer Vision. OpenCV</td><td><a href="lessons/4-ComputerVision/06-IntroCV/README.md">Text</a><td colspan="2"><a href="lessons/4-ComputerVision/06-IntroCV/OpenCV.ipynb">Notebook</a></td><td><a href="lessons/4-ComputerVision/06-IntroCV/lab/README.md">Lab</a></td></tr>
<tr><td>7</td><td>Convolutional Neural Networks<br/>CNN Architectures</td><td><a href="lessons/4-ComputerVision/07-ConvNets/README.md">Text</a><br/><a href="lessons/4-ComputerVision/07-ConvNets/CNN_Architectures.md">Text</a></td><td><a href="lessons/4-ComputerVision/07-ConvNets/ConvNetsPyTorch.ipynb">PyTorch</a></td><td><a href="lessons/4-ComputerVision/07-ConvNets/ConvNetsTF.ipynb">TensorFlow</a></td><td><a href="lessons/4-ComputerVision/07-ConvNets/lab/README.md">Lab</a></td></tr>
<tr><td>8</td><td>Pre-trained Networks and Transfer Learning<br/>Training Tricks</td><td><a href="lessons/4-ComputerVision/08-TransferLearning/README.md">Text</a><br/><a href="lessons/4-ComputerVision/08-TransferLearning/TrainingTricks.md">Text</a></td><td><a href="lessons/4-ComputerVision/08-TransferLearning/TransferLearningPyTorch.ipynb">PyTorch</a></td><td><a href="lessons/4-ComputerVision/08-TransferLearning/TransferLearningTF.ipynb">TensorFlow</a><br/><a href="lessons/4-ComputerVision/08-TransferLearning/Dropout.ipynb">Dropout sample</a></td><td><a href="lessons/4-ComputerVision/08-TransferLearning/lab/README.md">Lab</a></td></tr>
<tr><td>9</td><td>Autoencoders and VAEs</td><td><a href="lessons/4-ComputerVision/09-Autoencoders/README.md">Text</a></td><td><a href="lessons/4-ComputerVision/09-Autoencoders/AutoEncodersPytorch.ipynb">PyTorch</td><td><a href="lessons/4-ComputerVision/09-Autoencoders/AutoencodersTF.ipynb">TensorFlow</a></td><td></td></tr>
2 changes: 0 additions & 2 deletions etc/CONTRIBUTING.md
@@ -17,9 +17,7 @@ or contact [[email protected]](mailto:[email protected]) with any addi

We are currently actively looking for contributions on the following topics:

- [ ] Write section + notebook on OpenCV / preprocessing images
- [ ] Write section on Deep Reinforcement Learning
- [ ] Translate Semantic Segmentation notebook to TensorFlow
- [ ] Improve section + notebook on Object Detection
- [ ] PyTorch Lightning (for [this section](https://github.com/microsoft/AI-For-Beginners/blob/main/3-NeuralNetworks/05-Frameworks/README.md))
- [ ] Write section + samples on Named Entity Recognition
774 changes: 774 additions & 0 deletions lessons/4-ComputerVision/06-IntroCV/OpenCV.ipynb

Large diffs are not rendered by default.

99 changes: 99 additions & 0 deletions lessons/4-ComputerVision/06-IntroCV/README.md
@@ -0,0 +1,99 @@
# Introduction to Computer Vision

[Computer Vision](https://en.wikipedia.org/wiki/Computer_vision) is a discipline whose aim is to allow computers to gain a high-level understanding of digital images. This is quite a broad definition, because *understanding* can mean many different things, including finding objects in a picture (**object detection**), understanding what is happening (**event detection**), describing a picture in text, or reconstructing a scene in 3D. There are also special tasks related to images of people: age/emotion estimation, face detection and identification, and 3D pose estimation, to name a few.

## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/106)

One of the simplest tasks of computer vision is **image classification**, which assigns a category to the whole image.

Computer vision is often considered to be a branch of AI. Nowadays, most computer vision tasks are solved using neural networks. Throughout this section, we will learn more about a special type of neural network designed for computer vision: [convolutional neural networks](../07-ConvNets/README.md).

However, before passing an image to a neural network, it often makes sense to use some algorithmic techniques to enhance the image.

There are several Python libraries available for image processing:
* **[imageio](https://imageio.readthedocs.io/en/stable/)** can be used for reading/writing different image formats. It also supports video manipulation with ffmpeg.
* **[Pillow](https://pillow.readthedocs.io/en/stable/index.html)** (the maintained fork of PIL, the Python Imaging Library) is a bit more powerful, and also supports some image manipulations, such as morphing, palette adjustments, etc.
* **[OpenCV](https://opencv.org/)** is a powerful image processing library written in C++, which has become the *de facto* standard for image processing. It has a convenient Python interface.
* **[dlib](http://dlib.net/)** is a C++ library that implements many machine learning algorithms, including some Computer Vision algorithms. It also has a Python interface, and can be used for challenging tasks such as face and facial landmark detection.

## OpenCV

[OpenCV](https://opencv.org/) is considered to be the *de facto* standard for image processing. It contains a lot of useful algorithms implemented in C++, and you can call OpenCV from Python as well.

A good place to learn OpenCV is [this Learn OpenCV course](https://learnopencv.com/getting-started-with-opencv/). In our curriculum, our goal is not to teach OpenCV, but to show you some examples of when and how it can be used.

### Loading Images

Images in Python can be conveniently represented by NumPy arrays. For example, a grayscale image with a size of 320x200 pixels (width x height) would be stored in a 200x320 array (rows first), and a color image of the same size would have a shape of 200x320x3 (for 3 color channels). To load an image, you can use the following code:

```python
import cv2
import matplotlib.pyplot as plt

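# cv2.imread returns a NumPy array in BGR channel order (or None if the file cannot be read)
im = cv2.imread('image.jpeg')
# Red and blue will appear swapped until the image is converted to RGB
plt.imshow(im)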
```

OpenCV stores color images in BGR (Blue-Green-Red) channel order, while most other Python tools use the more common RGB order. For the image to look right, you need to convert it to the RGB color space, either by reversing the order of the channels (the last axis) in the NumPy array, or by calling an OpenCV function:

```python
im = cv2.cvtColor(im,cv2.COLOR_BGR2RGB)
```
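
To make the array shapes and the channel order concrete, here is a small NumPy-only sketch that uses a synthetic image instead of a file (the 320x200 size and the variable names are just for illustration). Reversing the last axis with `[:, :, ::-1]` yields the same pixel values as `cv2.cvtColor(im, cv2.COLOR_BGR2RGB)`:

```python
import numpy as np

# A synthetic 320x200 (width x height) image; NumPy shape is (rows, columns, channels)
height, width = 200, 320
bgr = np.zeros((height, width, 3), dtype=np.uint8)
bgr[:, :, 0] = 255  # channel 0 is Blue in OpenCV's BGR order, so this image is pure blue

# Reverse the channel axis: BGR -> RGB (blue moves to the last position)
rgb = bgr[:, :, ::-1]

# A grayscale image of the same size has no channel axis at all
gray = np.zeros((height, width), dtype=np.uint8)

print(bgr.shape)           # (200, 320, 3)
print(rgb[0, 0].tolist())  # [0, 0, 255]
print(gray.shape)          # (200, 320)
```

Note that `[:, :, ::-1]` returns a view with a negative stride; if you pass it back into some OpenCV functions, wrapping it in `np.ascontiguousarray` may be needed.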

The same `cvtColor` function can be used to perform other color space transformations, e.g. to convert an image to grayscale (`cv2.COLOR_BGR2GRAY`) or to the HSV (Hue-Saturation-Value) color space (`cv2.COLOR_BGR2HSV`).
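
For the grayscale case, OpenCV's documentation describes the conversion as a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B. The NumPy sketch below (the `bgr_to_gray` name is hypothetical) reproduces that formula; OpenCV itself uses optimized fixed-point arithmetic, so individual pixels may differ by one gray level:

```python
import numpy as np

def bgr_to_gray(bgr):
    """Approximate cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) with plain NumPy."""
    b = bgr[..., 0].astype(np.float64)
    g = bgr[..., 1].astype(np.float64)
    r = bgr[..., 2].astype(np.float64)
    # Green contributes most to perceived brightness, blue the least
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.round(gray).astype(np.uint8)

red = np.zeros((2, 2, 3), dtype=np.uint8)
red[..., 2] = 255               # pure red in BGR order
print(bgr_to_gray(red)[0, 0])   # 76
```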

You can also use OpenCV to load video frame by frame; an example is given in the exercise [OpenCV Notebook](OpenCV.ipynb).

### Image Processing

Before feeding an image to a neural network, you may want to apply several pre-processing steps. OpenCV can do many things, including:
* **Resizing** the image using `im = cv2.resize(im, (320,200), interpolation=cv2.INTER_LANCZOS4)` (note that the target size is given as `(width, height)`)
* **Blurring** the image using `im = cv2.medianBlur(im,3)` or `im = cv2.GaussianBlur(im, (3,3), 0)`
* Changing the **brightness and contrast** of the image can be done by NumPy array manipulations, as described [here](https://stackoverflow.com/questions/39308030/how-do-i-increase-the-contrast-of-an-image-in-python-opencv).
* Instead of adjusting brightness/contrast, it is often better to use [thresholding](https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html) by calling the `cv2.threshold`/`cv2.adaptiveThreshold` functions, which separate bright regions of the image from dark ones (e.g. text from the page background).
* Applying different [transformations](https://docs.opencv.org/4.5.5/da/d6e/tutorial_py_geometric_transformations.html) to the image:
    - **[Affine transformations](https://docs.opencv.org/4.5.5/d4/d61/tutorial_warp_affine.html)** can be useful if you need to combine rotation, scaling and skewing, and you know the source and destination locations of three points in the image. Affine transformations keep parallel lines parallel.
    - **[Perspective transformations](https://medium.com/analytics-vidhya/opencv-perspective-transformation-9edffefb2143)** can be useful when you know the source and destination positions of 4 points in the image. For example, if you take a picture of a rectangular document with a smartphone camera at an angle, a perspective transformation lets you produce a rectangular image of the document itself.
* Understanding movement inside the image by using **[optical flow](https://docs.opencv.org/4.5.5/d4/dee/tutorial_optical_flow.html)**.
## Examples of using Computer Vision

In our [OpenCV Notebook](OpenCV.ipynb), we give some examples of when computer vision can be used to perform specific tasks:

* **Pre-processing a photograph of a Braille book**. We focus on how we can use thresholding, feature detection, perspective transformation and NumPy manipulations to separate individual Braille symbols for further classification by a neural network.

![Braille Image](data/braille.jpeg) | ![Braille Image Pre-processed](images/braille-result.png) | ![Braille Symbols](images/braille-symbols.png)
----|----|----

> *Image from [OpenCV.ipynb](OpenCV.ipynb)*

* **Detecting motion in video using frame difference**. If the camera is fixed, then consecutive frames from the camera should be very similar to each other. Since frames are represented as arrays, subtracting the arrays of two consecutive frames gives the pixel difference, which should be low for static frames and become higher once there is substantial motion in the image.

![Image of video frames and frame differences](images/frame-difference.png)

> *Image from [OpenCV.ipynb](OpenCV.ipynb)*
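
The frame-difference idea can be sketched with NumPy alone. In this hypothetical example, the frames are grayscale arrays and the threshold of 5 (mean absolute difference in gray levels) is an arbitrary value you would tune for your video. The frames are cast to a signed type first, because subtracting `uint8` arrays directly wraps around instead of going negative; `cv2.absdiff` avoids the same problem for `uint8` inputs:

```python
import numpy as np

def frame_difference(prev_frame, next_frame):
    """Mean absolute pixel difference between two frames of the same shape."""
    diff = np.abs(next_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff.mean()

def detect_motion(frames, threshold=5.0):
    """Return one True/False flag per pair of consecutive frames."""
    return [bool(frame_difference(a, b) > threshold) for a, b in zip(frames, frames[1:])]

# Synthetic example: two identical black frames, then a white square appears
still = np.zeros((100, 100), dtype=np.uint8)
moved = still.copy()
moved[40:60, 40:60] = 255
print(detect_motion([still, still, moved]))  # [False, True]
```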
* **Detecting motion using Optical Flow**. [Optical flow](https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.html) allows us to understand how individual pixels on video frames move. There are two types of optical flow:
    - **Dense Optical Flow** computes a vector field that shows, for each pixel, where it is moving.
    - **Sparse Optical Flow** is based on taking some distinctive features in the image (e.g. corners) and building their trajectories from frame to frame.

![Image of Optical Flow](images/optical.png)

> *Image from [OpenCV.ipynb](OpenCV.ipynb)*

Read more on optical flow [in this great tutorial](https://learnopencv.com/optical-flow-in-opencv/).
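
Once dense optical flow has been computed (for example with `cv2.calcOpticalFlowFarneback`, which returns an array of shape `(height, width, 2)` holding the horizontal and vertical displacement of each pixel), the vectors can be converted to angle and magnitude and summarized as a histogram of directions to find the dominant movement. The following NumPy sketch works on such a flow array; the function names, the 4-bin layout, and the magnitude/count thresholds are illustrative assumptions:

```python
import numpy as np

DIRECTIONS = ["right", "up", "left", "down"]

def direction_histogram(flow, n_bins=8, min_magnitude=1.0):
    """Count flow vectors per direction bin, ignoring near-zero movement.

    `flow` is an (H, W, 2) array such as the one returned by
    cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0).
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    # Image rows grow downwards, so negate dy to make "up" 90 degrees
    angle = np.degrees(np.arctan2(-dy, dx)) % 360
    moving = magnitude >= min_magnitude
    # Shift by half a bin so that bin 0 is centred on 0 degrees (rightwards)
    bin_width = 360 / n_bins
    bins = ((angle[moving] + bin_width / 2) // bin_width).astype(int) % n_bins
    return np.bincount(bins, minlength=n_bins)

def dominant_direction(flow, min_magnitude=1.0, min_count=10):
    """Return 'right'/'up'/'left'/'down', or None if too few pixels move."""
    hist = direction_histogram(flow, n_bins=4, min_magnitude=min_magnitude)
    if hist.max() < min_count:
        return None
    return DIRECTIONS[int(hist.argmax())]

# Synthetic flow: a 10x10 patch moving 3 pixels up (negative y in image coordinates)
flow = np.zeros((20, 20, 2), dtype=np.float32)
flow[5:15, 5:15, 1] = -3.0
print(direction_histogram(flow, n_bins=4).tolist())  # [0, 100, 0, 0]
print(dominant_direction(flow))                      # up
```

OpenCV's `cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)` computes magnitude and angle in one call, but without the sign flip used here, so in its output 90 degrees points down.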

## ✍️ Exercises: try OpenCV in Action

Let's do some experiments with OpenCV by exploring the [OpenCV Notebook](OpenCV.ipynb).

## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/206)

## [Assignment](lab/README.md)

In this lab, you will take a video with simple gestures, and your goal is to extract up/down/left/right movements using optical flow.

<img src="images/palm-movement.png" width="30%" alt="Palm Movement Frame"/>

## Takeaway

Sometimes, relatively complex tasks such as movement detection or fingertip detection can be solved purely with classical computer vision techniques, without neural networks. Thus, it is very helpful to know the basic techniques of computer vision, and what libraries like OpenCV can do.

Binary file not shown.
97 changes: 97 additions & 0 deletions lessons/4-ComputerVision/06-IntroCV/lab/MovementDetection.ipynb
@@ -0,0 +1,97 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Palm Movement Detection using Optical Flow\n",
"\n",
"This lab is part of [AI for Beginners Curriculum](http://aka.ms/ai-beginners).\n",
"\n",
"Consider [this video](palm-movement.mp4), in which a person's palm moves left/right/up/down against a static background.\n",
"\n",
"<img src=\"../images/palm-movement.png\" width=\"30%\" alt=\"Palm Movement Frame\"/>\n",
"\n",
"**Your goal** is to use Optical Flow to determine which parts of the video contain up/down/left/right movements.\n",
"\n",
"Start by getting video frames as described in the lecture:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, calculate dense optical flow between consecutive frames as described in the lecture, and convert the dense optical flow to polar coordinates:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Build a histogram of directions for each optical flow frame. A histogram shows how many vectors fall into each bin, and it should separate out different directions of movement in the frame.\n",
"\n",
"> You may also want to zero out all vectors whose magnitude is below a certain threshold. This will get rid of small extra movements in the video, such as eye and head movements.\n",
"\n",
"Plot the histograms for some of the frames."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the histograms, it should be fairly straightforward to determine the direction of movement. You need to select the bins that correspond to up/down/left/right directions and whose values are above a certain threshold."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! If you have done all steps above, you have completed the lab!"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
22 changes: 22 additions & 0 deletions lessons/4-ComputerVision/06-IntroCV/lab/README.md
@@ -0,0 +1,22 @@
# Detecting Movements using Optical Flow

Lab Assignment from [AI for Beginners Curriculum](https://aka.ms/ai-beginners).

## Task

Consider [this video](palm-movement.mp4), in which a person's palm moves left/right/up/down against a static background.

<img src="../images/palm-movement.png" width="30%" alt="Palm Movement Frame"/>

**Your goal** is to use Optical Flow to determine which parts of the video contain up/down/left/right movements.

**Stretch goal**: track the palm/finger movement itself using skin tone, as described [in this blog post](https://dev.to/amarlearning/finger-detection-and-tracking-using-opencv-and-python-586m) or [here](http://www.benmeline.com/finger-tracking-with-opencv-and-python/).


## Starting Notebook

Start the lab by opening [MovementDetection.ipynb](MovementDetection.ipynb).

## Takeaway

Sometimes, relatively complex tasks such as movement detection or fingertip detection can be solved purely by computer vision. Thus, it is very helpful to know what libraries like OpenCV can do.
Binary file not shown.
4 changes: 2 additions & 2 deletions lessons/4-ComputerVision/07-ConvNets/README.md
@@ -11,14 +11,14 @@ To extract patterns, we will use the notion of **convolutional filters**. As you
![Vertical Edge Filter](images/filter-vert.png) | ![Horizontal Edge Filter](images/filter-horiz.png)
----|----

> TODO image attribution
> Image by Dmitry Soshnikov
For example, if we apply 3x3 vertical edge and horizontal edge filters to the MNIST digits, we can get highlights (i.e. high values) where there are vertical and horizontal edges in our original image. Thus those two filters can be used to "look for" edges. Similarly, we can design different filters to look for other low-level patterns:

<img src="images/lmfilters.jpg" width="500" align="center"/>


> Image by the [Leung-Malik Filter Bank](https://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html)
> Image of [Leung-Malik Filter Bank](https://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html)
However, while we can design the filters to extract some patterns manually, we can also design the network in such a way that it will learn the patterns automatically. It is one of the main ideas behind the CNN.
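
Applying a 3x3 filter as described above can be sketched in a few lines of NumPy. The kernel values below are a common choice for vertical/horizontal edge detectors and are an assumption (the filters in the figure may use different numbers). Like the convolution layers in PyTorch and TensorFlow, the sketch slides the kernel without flipping it (cross-correlation) and uses no padding:

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

vertical_edge = np.array([[-1, 0, 1]] * 3, dtype=float)  # responds to left/right changes
horizontal_edge = vertical_edge.T                         # responds to top/bottom changes

# A 6x6 image that is dark on the left and bright on the right: one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0
print(apply_filter(image, vertical_edge)[0].tolist())      # [0.0, 3.0, 3.0, 0.0]
print(np.abs(apply_filter(image, horizontal_edge)).max())  # 0.0
```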

4 changes: 2 additions & 2 deletions lessons/4-ComputerVision/README.md
@@ -4,10 +4,10 @@

In this section we will learn about:

* Intro to Computer Vision and OpenCV: coming soon
* [Intro to Computer Vision and OpenCV](06-IntroCV/README.md)
* [Convolutional Neural Networks](07-ConvNets/README.md)
* [Pre-trained Networks and Transfer Learning](08-TransferLearning/README.md)
* [Autoencoders](09-Autoencoders/README.md)
* [Generative Adversarial Networks](10-GANs/README.md)
* Object Detection: coming soon
* [Object Detection](11-ObjectDetection/README.md)
* [Semantic Segmentation](12-Segmentation/README.md)
