Note: A lot of the notes had a significant number of formulas, and were done on paper. Those marked as TODO will likely not be updated. The 'Main Points' section however should include points across all lectures after the midterm.
- Optical axis - Z axis
- Applications of Computer Vision
- Image editing
- Object scanning for 3D printing
- Handwriting recognition
- Face detection and recognition
- Image search & retrieval
- People detection and tracking
- Navigation systems
- Medical imaging
- Biological imaging
- 720p = 1MP; 1080p = 2MP
- #f0f magenta; #ff0 yellow; #0ff cyan
- The human eye has varying sensitivity to RGB at varying wavelengths
- We are more sensitive to green
- Bayer pattern - checkerboard pattern of [G, R; B, G]
- 50% G, 25% R, 25% B
- IRGB(x) = ∫ CRGB(λ) S(x, λ) dλ
- ExposureRGB(x) = IRGB(x) * t
- Analog to digital conversion
- gain control - adjust overall sensitivity of responses
- white balance - adjust relative sensitivity of R, G, B
- Image noise - random variation of brightness or color information; usually electronic noise
- Image smoothing - averaging the intensities of RGBs to reduce noise
- Local average (1D) - Ismooth(x) = ¼ I(x + 1) + ½ I(x) + ¼ I(x - 1)
- Smoothing an edge will decrease the slope of steps, which blends the edge colors.
- Gaussian function: G(x) = (1/(√(2π) σ)) e^(-x²/(2σ²))
- Integrates to 1
- Often called normal distribution
- Can use it for smoothing weights
- The bigger the σ, the more smoothing there is
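A minimal numpy sketch of 1D Gaussian smoothing as described above (the 3σ kernel radius and the toy signal are illustrative choices):

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled 1D Gaussian, renormalized so the weights sum to 1."""
    if radius is None:
        radius = int(3 * sigma)               # +/- 3 sigma covers ~99.7% of the mass
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def smooth_1d(signal, sigma):
    """Smooth a 1D signal by convolving with a Gaussian kernel."""
    return np.convolve(signal, gaussian_kernel_1d(sigma), mode='same')

noisy = np.sin(np.linspace(0, 2 * np.pi, 100)) + 0.2 * np.random.randn(100)
smoothed = smooth_1d(noisy, sigma=2.0)        # larger sigma => more smoothing
```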
- Local difference: dI(x)/dx ≈ ½ I(x + 1) - ½ I(x - 1)
- Cross correlation: f(x) cross I(x) ≡ Σu f(u - x) I(u)
- Sliding a template across an image, and taking the inner product
- Shows how well a filter matches the image
- For the local average (1D)
- f(x) = ¼ when x = +1, -1
- f(x) = ½ when x = 0
- f(x) = 0 otherwise
- I is the image, f is the filter or kernel
- Convolution: f(x) * I(x) ≡ Σu f(x - u) I(u)
- Summing weighted, shifted versions of the function f
- Shows how much image intensity contributes to the filtered output
- Impulse function: δ(x) = 1 when x = 0, 0 otherwise
- δ(x) * I(x) = I(x)
- Convolving a filter with an impulse returns the filter itself, i.e. its impulse response
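A small numpy demonstration of the definitions above on a toy 1D image; for a symmetric kernel, convolution and cross-correlation coincide:

```python
import numpy as np

I = np.array([0., 0., 1., 4., 2., 0., 0.])    # toy 1D image
f = np.array([0.25, 0.5, 0.25])               # the local-average filter from above

# Convolution flips the filter before sliding; cross-correlation does not.
# Because this kernel is symmetric, the two results are identical.
conv = np.convolve(I, f, mode='same')
corr = np.correlate(I, f, mode='same')
assert np.allclose(conv, corr)

# Convolving with a (centered) impulse returns the image unchanged.
delta = np.array([0., 0., 0., 1., 0., 0., 0.])
assert np.allclose(np.convolve(I, delta, mode='same'), I)
```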
- Motivation for edge detection
- Find boundaries of objects
- Match features in images
- Edge detection based on threshold
- | I(x + 1) - I(x - 1) | > τ, where τ is the threshold
- Can also use partial derivatives to account for changes in both x and y
- Prewitt used gradients [1, 0, -1; 1, 0, -1; 1, 0, -1] and [-1, -1, -1; 0, 0, 0; 1, 1, 1]
- Sobel used gradients [1, 0, -1; 2, 0, -2; 1, 0, -1] and [-1, -2, -1; 0, 0, 0; 1, 2, 1]
- Then threshold the gradient magnitude: |∇I| > τ
- Canny edge detection
- Mark pixel as candidate edge if gradient magnitude is greater than some threshold τhigh
- Non-maximum suppression - compare gradient magnitude with neighbours nearest to the direction of that gradient. Eliminate edge pixels if they are not a local maximum
- Hysteresis thresholding - keep weaker candidates (above a lower threshold τlow) only if they connect to a strong edge pixel
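A minimal sketch of the gradient-magnitude thresholding step using Sobel derivatives (scipy's `ndimage.sobel`); Canny's non-maximum suppression and hysteresis stages are omitted here:

```python
import numpy as np
from scipy import ndimage

def simple_edges(image, tau):
    """Mark pixels whose gradient magnitude exceeds the threshold tau."""
    gx = ndimage.sobel(image, axis=1)   # horizontal derivative
    gy = ndimage.sobel(image, axis=0)   # vertical derivative
    magnitude = np.hypot(gx, gy)
    return magnitude > tau              # boolean candidate-edge map
```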
- Inliers - points explained by line model
- Outliers - points not explained by line model
- Reduce sensitivity to outliers by reducing penalty for large distances
- Eg distance rather than distance squared
- Eg constant penalty after certain distance
- Hough transform
- A line can be represented by a distance r ≥ 0 from the origin and an angle θ in [0, 2π); the line is perpendicular to the direction θ and crosses it at distance r from the origin
- For Hough transform, for every θ in [0, 2π), compute r for line in direction θ through point (xi, yi). Vote for this (r, θ), and return the (r, θ) with the maximum count
- To detect peaks, we can smooth the voting map to remove noise
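A minimal sketch of the Hough voting loop using the (r, θ) parameterization above, with r ≥ 0 and θ ∈ [0, 2π); the bin counts and r_max are illustrative:

```python
import numpy as np

def hough_line(points, n_theta=360, n_r=200, r_max=100.0):
    """Vote over (r, theta) bins; the strongest line is the peak bin.
    Smoothing the votes map before taking the argmax reduces noise."""
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    votes = np.zeros((n_r, n_theta), dtype=int)
    for x, y in points:
        r = x * np.cos(thetas) + y * np.sin(thetas)  # distance for each theta
        valid = (r >= 0) & (r < r_max)               # keep r inside the accumulator
        votes[(r[valid] / r_max * n_r).astype(int), np.where(valid)[0]] += 1
    r_bin, t_bin = np.unravel_index(votes.argmax(), votes.shape)
    return r_bin * r_max / n_r, thetas[t_bin]
```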
- RANSAC - RANdom SAmple Consensus - reduces sensitivity
- Pick two different points, and fit a line
- Good model if many other points lie near the line
- We can find consensus set by counting the number of remaining points that are within τ from the line
- Do this many times and find the best line model (highest consensus size)
- Take best model, find consensus, then refit using least square method
- Can be scaled up to n points, where n is minimum # of points needed for exact model fit
- Why minimum # of points?
- Trade off is that more points lead to greater chance of including an outlier
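A minimal RANSAC line-fit sketch following the steps above (the final least-squares refit on the consensus set is left out):

```python
import numpy as np

def ransac_line(points, n_iters=500, tau=1.0, seed=None):
    """points: N x 2 array. Returns (a, b, c) with ax + by + c = 0
    ((a, b) a unit normal) and the consensus-set size."""
    rng = np.random.default_rng(seed)
    best_line, best_count = None, -1
    for _ in range(n_iters):
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        d = q - p
        normal = np.array([-d[1], d[0]])     # perpendicular to the line through p, q
        if np.linalg.norm(normal) == 0:
            continue                         # degenerate sample (p == q)
        normal /= np.linalg.norm(normal)
        c = -normal @ p
        dist = np.abs(points @ normal + c)   # point-to-line distances
        count = int((dist < tau).sum())      # consensus set size
        if count > best_count:
            best_line, best_count = (normal[0], normal[1], c), count
    return best_line, best_count
```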
- Vanishing points
- Recall
- Convolution: f(x, y) * I(x, y) = Σu, v f(x - u, y - v) I(u, v)
- Zero-mean Gaussian kernel: G(x, y, σ) = (1/(2πσ²)) e^(-(x² + y²)/(2σ²))
- Gaussian scale space - family of images smoothed by Gaussians of increasing σ
- Heat equation (isotropic diffusion; the right-hand side is the Laplacian): ∂I(x, y, t)/∂t = ΔI(x, y, t) = ∂²I(x, y, t)/∂x² + ∂²I(x, y, t)/∂y²
- Also equivalent to div(∇I(x, y, t))
- Perona-Malik introduces a coefficient c(x, y, t), a monotonically decreasing function of the magnitude of the image gradient
- ∂I(x, y, t)/∂t = div(c(x, y, t) ∇I(x, y, t)) = c(x, y, t) ΔI(x, y, t) + ∇c(x, y, t) · ∇I(x, y, t)
- It is more efficient and faster to compute with coarser/smaller versions of an input image
- Gaussian pyramid - smoothing + subsampling
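A minimal sketch of a Gaussian pyramid (smooth with `scipy.ndimage.gaussian_filter`, then keep every second row and column):

```python
import numpy as np
from scipy import ndimage

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Each level is a smoothed, 2x-subsampled copy of the previous one."""
    pyramid = [image]
    for _ in range(levels - 1):
        smoothed = ndimage.gaussian_filter(pyramid[-1], sigma)
        pyramid.append(smoothed[::2, ::2])   # drop every other row/column
    return pyramid
```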
- Think of ∇I as 2 x 1 matrix (or a column vector)
- The structure tensor is the 2 x 2 matrix ∇I (∇I)ᵀ
- The eigenvalues & eigenvectors of matrix reveal invariant geometric properties
- Order the eigenvalues in descending order (λ1 ≥ λ2 ≥ 0)
- Then e1 is the direction aligned with the local gradient
- Special case is when λ2 = 0, I is constant along e2
- If both are around 0, then no dominant gradient direction locally
- To find corners, find a way to capture locations where I(x0, y0) is different from that of its neighbours
- Keep in mind
- We always 'smooth' a little bit (convolution with a kernel) so that ∇I is well defined on the sampled image
- Compare I(x0, y0) against neighbouring intensities I(x0 + Δx, y0 + Δy)
From lecture notes (not in class)
- Corner - region where two edges come together at sharp angle
- Intensities at a point are locally distinctive if the following "error" is large, relative to distance |(Δx, Δy)| to local neighbour
- Σ over (x, y) ∈ N(x0, y0) of gd(x - x0, y - y0) (I(x, y) - I(x + Δx, y + Δy))²
- Second moment matrix (M)
- M = Σ over (x, y) ∈ N of gd (∇I)(∇I)ᵀ = [Σ(∂I/∂x)², Σ(∂I/∂x)(∂I/∂y) \\ Σ(∂I/∂x)(∂I/∂y), Σ(∂I/∂y)²]
- Harris corners
- Computing eigenvalues involves solving a quadratic equation, which is slow
- We may note that a matrix with eigenvalues λ1 and λ2 has a determinant of λ1λ2 and a trace of λ1 + λ2
- With a small constant k, the Harris response λ1λ2 - k(λ1 + λ2)²:
- Is negative if one eigenvalue is 0
- Is small if both eigenvalues are small
- Trace and determinant can be used to find edges and corners
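A minimal sketch of the Harris response using the determinant/trace shortcut above (the σ and k values are typical choices, not prescribed):

```python
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.5, k=0.05):
    """det(M) - k * trace(M)^2 at every pixel, where M is the second
    moment matrix built from Gaussian-weighted sums of gradient products."""
    Ix = ndimage.sobel(image, axis=1)
    Iy = ndimage.sobel(image, axis=0)
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy**2     # lambda1 * lambda2
    trace = Sxx + Syy            # lambda1 + lambda2
    return det - k * trace**2    # large positive response => corner
```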
- Midterm material ends this week
- Normalized correlation - dcorrel(H1, H2) = Σi H'1(i) H'2(i) / √(Σi H'1(i)² Σi H'2(i)²), where H'k(i) = Hk(i) - mean(Hk)
- 1 - perfect match
- -1 - perfect mismatch
- 0 - no correlation
- Chi-square - dchi-square(H1, H2) = Σi (H1(i) - H2(i))² / (H1(i) + H2(i))
- 0 - perfect match
- ∞ - mismatch
- Bhattacharyya
- 0 - perfect match
- 1 - total mismatch
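A minimal numpy sketch of two of the histogram distances above (the `eps` guard against empty bins is an implementation detail, not part of the formula):

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """0 for identical histograms; grows without bound as they diverge."""
    return np.sum((h1 - h2)**2 / (h1 + h2 + eps))

def normalized_correlation(h1, h2):
    """Mean-subtracted correlation in [-1, 1]; 1 is a perfect match."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

h1 = np.array([4., 2., 1., 1.]) / 8.0   # toy normalized histograms
h2 = np.array([3., 3., 1., 1.]) / 8.0
print(chi_square(h1, h2), normalized_correlation(h1, h2))
```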
- Histogram equalization
- Histogram of oriented gradients
- Compute gradients
- Divide into bins by orientation
- Count number of entries in each bin
- Dense computation with chosen cell size and block size
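A minimal sketch of the per-cell orientation histogram at the core of HoG; here votes are weighted by gradient magnitude, a common variant of the plain count described above:

```python
import numpy as np
from scipy import ndimage

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of gradient orientations for one HoG cell."""
    gx = ndimage.sobel(cell, axis=1)
    gy = ndimage.sobel(cell, axis=0)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % np.pi                 # unsigned orientation in [0, pi)
    bins = (angle / np.pi * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())   # magnitude-weighted votes
    return hist
```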
- SIFT - scale invariant feature transform
- Feature detector
- Features are local and sparse; invariant to translation, rotation, scale
- Matches feature vectors using correlation
- Translation is easy to handle; most extraction methods are invariant to translation
- Rotation is harder; needs canonical orientation
- Uses orientation at peak of smoothed histogram
- Sometimes multiple peaks pass the threshold; in that case, generate a keypoint for each such orientation
- Scaling is even harder; uses scale-space theory
See Stanford's CS231 for more material
- Convolutional neural networks (CNN)
- Black box
- Lots of nodes computing weighted summation of inputs
- Weighted sums are passed through an activation function, which can create sparsity
- Sigmoid - f(z) = 1/(1 + e^(-z))
- Hyperbolic tan - f(z) = tanh(z) = (e^z - e^(-z))/(e^z + e^(-z))
- ReLU - f(z) = max(0, z)
- Softmax - given vector, transforms it so that each entry is in [0, 1] and all entries sum to 1
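A minimal numpy sketch of the activation functions listed above (the max-shift inside softmax is a standard numerical-stability trick):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())      # shift by the max for numerical stability
    return e / e.sum()           # entries in [0, 1], summing to 1

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z), np.tanh(z), relu(z), softmax(z))
```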
- Initial layer convolutional, later ones fully connected
- Propagation
- Forward - computing layer (l + 1)'s activations from layer l's activations, iterating layer by layer
- Backward - propagating gradients in reverse direction to minimize a loss function
Midterm next week until lecture 10
- CNNs
- Convolutional layers; fully connected layers (MLPs)
- Activation functions: tanh, sigmoid, ReLU
- Pooling: average, max, others
- Backward Propagation
- Goal is to estimate W and b to minimize a sum-of-squares error loss function
- Review properties of 2nd moment matrix
- RANSAC on midterm
- Image registration
- Given two similar images, for each (x0, y0), find (hx, hy) that minimizes the difference
- Assume that intensity variation is linear
- Windows need to be "appropriate"; smaller windows -> finer matches
- Deformation matrix
- [s, 0 \\ 0, s] - scales by s
- [cosθ, -sinθ \\ sinθ, cosθ] - CCW rotation by θ
Midterm
- TODO
- Beginning of 3D computer vision
- Left handed frame (Z goes into canvas vs out of canvas)
- To go from 3D to 2D, we can just drop the Z coordinate (orthographic projection)
- Perspective projection - takes depth into account; every image ray passes through a single center of projection
- Plane: aX + bY + cZ = d
- f/Z - scale factor in perspective projection
- Multiplying plane by f/Z, we can rewrite to ax + by + cf = fd/Z
- When Z → ∞, ax + by + cf = 0, the line at infinity
- Ground plane
- Y = -h
- y = fY/Z = -fh/Z (when Z → ∞, y → 0)
- aka 'horizon'
- TODO
- Majority of notes done on paper. Below reflects some bigger points.
- Rotation: [X(t) \ Y(t) \ Z(t)] = [cos(Ωt) & 0 & sin(Ωt) \ 0 & 1 & 0 \ -sin(Ωt) & 0 & cos(Ωt)] [X0 \ Y0 \ Z0]
- Rotation about y axis
- x(t) = f(X(t))/Z(t)
- y(t) = f(Y(t))/Z(t)
- Differentiate both at t = 0
- Vx = fΩ(X0² + Z0²)/Z0² = fΩ(1 + (x/f)²)
- Vy = fΩX0Y0/Z0² = Ωxy/f
- Rotation about x axis
- By symmetry, exchange roles of x and y
- Vx = Ωxy/f
- Vy = fΩ(1 + (y/f)²)
- Rotation about z axis
- Vx = -Ωy
- Vy = Ωx
- Discrete rotation
- [R] [x0 \ y0 \ z0] = [xr \ yr \ zr]
- R satisfies:
- det(R) = 1
- RRᵀ = RᵀR = I (3 x 3 identity)
- I.e. R is an orthogonal matrix: Rᵀ = R⁻¹
- Rotation matrix preserves angle between 2 vectors
- A rotation matrix has 3 eigenvalues; in general two are complex conjugates and the third is 1
- RP = λP
- If you choose the third eigenvalue and its associated eigenvector, then RP = P; P is the axis of rotation
- To simplify the cross product, define [a]ₓ so that [a]ₓ b = a x b, with [a]ₓ = [0 & -a3 & a2 \ a3 & 0 & -a1 \ -a2 & a1 & 0]
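A minimal sketch of the skew-symmetric matrix [a]ₓ and a check that it reproduces the cross product:

```python
import numpy as np

def skew(a):
    """[a]_x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
assert np.allclose(skew(a) @ b, np.cross(a, b))
```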
- To map 3D point to 4D point, [x \ y \ z] → [x \ y \ z \ 1]
- Translation [x + tx \ y + ty \ z + tz \ 1] = [1 & 0 & 0 & tx \ 0 & 1 & 0 & ty \ 0 & 0 & 1 & tz \ 0 & 0 & 0 & 1] [x \ y \ z \ 1]
- Scaling [σxX \ σyY \ σzZ \ 1] = [σx & 0 & 0 & 0 \ 0 & σy & 0 & 0 \ 0 & 0 & σz & 0 \ 0 & 0 & 0 & 1][x \ y \ z \ 1]
- Rotation appears in the top left 3x3, with all other elements as 0, and cell 4, 4 = 1.
- 3D point (X, Y, Z) represented as 4D point (X, Y, Z, 1)
- In 4 x 4 matrix
- Translation: t occupies rows 1 to 3 of column 4
- Rotation: top left 3x3 matrix
- Scaling: diagonal of top left 3x3 matrix
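A minimal sketch of the 4 x 4 homogeneous transforms summarized above (function names are illustrative):

```python
import numpy as np

def translation(t):
    T = np.eye(4)
    T[:3, 3] = t                                      # t in rows 1-3 of column 4
    return T

def scaling(sx, sy, sz):
    S = np.eye(4)
    S[0, 0], S[1, 1], S[2, 2] = sx, sy, sz            # diagonal of the top-left 3x3
    return S

def rotation_y(theta):
    R = np.eye(4)
    c, s = np.cos(theta), np.sin(theta)
    R[:3, :3] = [[c, 0, s], [0, 1, 0], [-s, 0, c]]    # top-left 3x3 block
    return R

p = np.array([1.0, 2.0, 3.0, 1.0])                    # (X, Y, Z, 1)
moved = translation([5, 0, 0]) @ rotation_y(np.pi / 2) @ p
```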
- TODO
- TODO
- Homography - invertible 3 x 3 matrix that maps pixels in one camera's image to pixels in another camera's image
- Case 1 - look at 3D scene plane
- Projection of location on 3D plane
- Case 2 - multiple cameras
- Case 3 - scene no longer planar, but camera undergoes pure rotation
Case 1 (plane)
- We assume that (s, t) are 2D coordinates on scene plane
- Bottom left hand corner of plane has coordinates (x0, y0, z0) in a world coordinate frame
- a and b are unit vectors along the plane's axes, in world coordinates
- a = (ax, ay, az)
- b = (bx, by, bz)
- Coordinates of point (s, t) on plane in 3D
- (x, y, z) = (x0, y0, z0) + s a + t b
- Component-wise: x = x0 + s ax + t bx, y = y0 + s ay + t by, z = z0 + s az + t bz
- To get pixel coordinates, pre-multiply by the 3 x 4 projection matrix P
Case 2 (stereo (2) cameras)
- Rectification - transform (warp) both cameras' images such that their axes are in alignment
- T is written with respect to camera 1's coordinate frame
- Eg T = (1, 0, 0) implies that it is aligned with camera 1's x axis
- Rrect is a 3 x 3 matrix which, when used to pre-multiply T, rotates it so that you end up with an axis system for camera 1 where the X axis is aligned with T
- To align camera 2, use Rrect R1 R2⁻¹
- To warp camera one - K1 Rrect K1⁻¹
- To warp camera two - K1 Rrect R1 R2⁻¹ K2⁻¹
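A minimal sketch of applying one of the 3 x 3 warps above to a pixel; `K1` and `R_rect` are assumed to be given as 3 x 3 numpy arrays:

```python
import numpy as np

def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H, then divide out
    the homogeneous scale to return to pixel coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# E.g. the rectifying warp for camera 1, built from the matrices above:
# H1 = K1 @ R_rect @ np.linalg.inv(K1)
```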
- 3D scene point X1 = (X1, Y1, Z1)ᵀ is the coordinate w.r.t. camera 1
- Line of sight from one camera's optical center to the second camera's center intersects the image planes at the two epipoles, e1, e2
- A unique 3D scene point defines an epipolar plane π
- The epipolar plane π intersects the 2 camera image planes at conjugate epipolar lines
- X1 (above), along with T1 = (Tx, Ty, Tz)ᵀ and (X1 - T1), lies in the epipolar plane
TODO
TODO
- Lambertian shading - surface whose reflected image obeys the equation:
- I(X) = N(X) · L
- Classical "shape from shading"
- I(x, y) = N(x, y) · L
- Too simplistic
- Most surfaces do not have Lambertian reflectance
- Cannot solve for single N without further constraints
- Hard to distinguish shading and shadows
- Scene colours depend on the illumination colour; computer and human vision typically normalize the image by assuming a single global colour bias
- Some non Lambertian surface materials
- Plastic
- Polished fruit
- Metal
- Wet surfaces
- Focal length is inversely related to the field of view: FOV = 2 arctan(sensor size / (2f))
- Mostly after lecture 11, registration onwards
- Should know about gaussian scale spaces, ransac, svd
- Histogram
- Bhattacharyya
- Histogram equalization (why? how?)
- HoG (procedure)
- SIFT (procedure, comparison with HoG)
- Lucas Kanade
- RGB and its complements cyan, magenta, yellow
- Camera models
- Pinhole, perspective, orthographic
- Optical axis, image/projection plane, image position, center of projection
- Motion
- Lateral, forward
- Vanishing points
- One, two, three point perspectives
- Rotation
- Pan, pitch, roll
- Matrices, velocity equations
- Homogeneous coordinates
- Camera extrinsics & intrinsics
- World coordinate to camera coordinate
- Name and formula of matrix P
- P = KR[I | -C]
- Known as the "finite projective camera"
- Name for points at infinity
- Direction vectors
- What affects points at infinity? (translation, rotation, scale)
- Least Squares, SVD
- Linear regression
- Pseudoinverse of A
- SVD
- Formula: A = UΣVᵀ
- Estimating P using least squares
- Minimum number of point pairs needed
- Frobenius norm
- Why is getting the eigenvector from SVD not always a good solution
- Homographies
- Definition
- Scene plane to image pixels
- Image rectification
- Homography between 2 images
- All N points
- 4 points only
- RANSAC
- Stereo & epipolar geometry
- Essential matrix
- Fundamental matrix
- Eight point algorithm
- Binocular correspondence
- Rectification
- Disparity function
- Lucas-Kanade approach
- Global approach
- Data cost
- Smoothness cost
- Lighting & material, photography
- Lambertian shading
- Examples of non-Lambertian surface materials