Lectures and Readings : Computer Vision : Fall 2023

Computer Vision (CMU 16-385)

This page contains lecture slides and recommended readings for the Fall 2023 offering of 16-385.

Lecture 1: Course Introduction

(Overview of computer vision)

Lecture 2: Image Filtering

(Image transformations, point image processing, linear shift-invariant image filtering, convolution, image gradients)

Basic reading:

Szeliski textbook, Section 3.2

Lecture 3: Image Pyramids and Frequency Domain

(Image downsampling, aliasing, Gaussian image pyramid, Laplacian image pyramid, Fourier series, frequency domain, Fourier transform, frequency-domain filtering, sampling)

Basic reading:

Szeliski textbook, Section 3.4, 3.5

Additional reading:

Burt and Adelson, "The Laplacian Pyramid as a Compact Image Code", IEEE Transactions on Communications 1983. (The original Laplacian pyramid paper.)
Hubel and Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex", The Journal of Physiology 1962. A foundational paper describing information processing in the visual system, including the different types of filtering it performs; Hubel and Wiesel won the Nobel Prize in Medicine in 1981 for the discoveries described in this paper.

Lecture 4: Hough Transform

(Finding boundaries, line fitting, line parameterization, Hough transform, Hough circles)

Basic reading:

Szeliski textbook, Section 7.4, A.2

Lecture 5: Detecting Corners

(Visualizing quadratics, Harris corner detector, multi-scale detection)

Basic reading:

Szeliski textbook, Section 7.1
The Singular Value Decomposition (from Numerical Linear Algebra by Trefethen and Bau). Note: The eigenvalues and eigenvectors of the covariance matrix (or any positive semidefinite matrix for that matter) are equivalent to its singular values and singular vectors.
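
The note above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the point data and variable names are hypothetical and not taken from the course materials. It builds a sample covariance matrix (symmetric and positive semidefinite by construction) and compares its eigendecomposition with its SVD. Vectors are compared up to sign, since both decompositions leave the sign of each vector undetermined.

```python
import numpy as np

# Hypothetical example data: 200 two-dimensional points with correlated coordinates.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

# Sample covariance matrix: symmetric and positive semidefinite by construction.
cov = np.cov(points, rowvar=False)

# Eigendecomposition for symmetric matrices; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending

# SVD returns singular values in descending order.
U, singvals, Vt = np.linalg.svd(cov)

# Eigenvalues and singular values should agree up to floating-point error.
values_match = np.allclose(eigvals, singvals)

# Vectors are only defined up to sign, so compare absolute inner products column by column.
alignment = np.abs(np.sum(eigvecs * U, axis=0))
vectors_match = np.allclose(alignment, 1.0)

print(values_match, vectors_match)
```

The sign-insensitive comparison assumes the eigenvalues are distinct, which holds for this example; with repeated eigenvalues the corresponding vectors are only determined up to a rotation within their shared subspace.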

Lecture 6: Feature Detectors and Descriptors

(Designing feature descriptors, MOPS descriptor, GIST descriptor, Histogram of Textons descriptor, HOG descriptor, SIFT)

Basic reading:

Szeliski textbook, Section 7.1

Lecture 7: 2D Transformations

(2D transformations, projective geometry, classification of 2D transformations, determining unknown 2D transformations)

Basic reading:

Szeliski textbook, Section 2.1

Additional reading:

Hartley and Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press 2004. A comprehensive treatment of all aspects of projective geometry relating to computer vision, and also a very useful reference for the second part of the class.
Richter-Gebert, "Perspectives on projective geometry", Springer 2011. A beautiful, thorough, and very accessible mathematics textbook on projective geometry (available online for free from CMU's library).

Lecture 8: Image Homographies

(Panoramas, image homographies, computing with homographies, direct linear transform (DLT), random sample consensus (RANSAC))

Basic reading:

Szeliski textbook, Section 2.1

Additional reading:

Hartley and Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press 2004. Chapters 2 and 4 in particular cover homography estimation in detail.

Lecture 9: Geometric Camera Models

(Pinhole camera, accidental pinholes, camera matrix)

Basic reading:

Szeliski textbook, Section 2.1

Additional reading:

Hartley and Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press 2004. Chapter 6 of this book has a very thorough treatment of camera models.
Torralba and Freeman, "Accidental Pinhole and Pinspeck Cameras", CVPR 2012.

Lecture 10: Geometric Camera Models (cont.)

(Review of camera matrix, perspective, other camera models, pose estimation)

Basic reading:

Szeliski textbook, Section 2.1
Hartley and Zisserman textbook, Chapter 6.

Lecture 11: Two-View Geometry

(Triangulation, epipolar geometry, essential matrix, fundamental matrix, 8-point algorithm)

Basic reading:

Szeliski textbook, Section 11.2.4, 11.3.1, 11.3.2, 11.3.3
Hartley and Zisserman textbook, Section 11.12.

Lecture 12: Stereo

(Revisiting triangulation, disparity, stereo rectification, stereo matching, improved stereo matching)

Basic reading:

Szeliski textbook, Section 12.1, 12.5
Hartley and Zisserman textbook, Section 11.12.

Lectures 13 & 14: Image Classification

(Introduction to learning-based vision, image classification, bag-of-words, K-means clustering, classification, K-nearest neighbors, naive Bayes, support vector machines)

Basic reading:

Szeliski textbook, Section 6.2

Lectures 15 & 16: Neural Networks

(Perceptron, neural networks, training perceptrons, gradient descent, backpropagation, stochastic gradient descent)

Basic reading (No standard textbooks yet!):

Lecture 17: Convolutional Neural Networks

(Some notes on optimization, convolutional neural networks, training ConvNets)

Basic reading (No standard textbooks yet!):

Lecture 18: Optical Flow

(Intro to vision for video, optical flow, constant flow, Horn-Schunck flow)

Basic reading:

Szeliski textbook, Section 8.4

Lectures 19 & 20: Alignment and Tracking

(Motion magnification using optical flow, image alignment, Lucas-Kanade alignment, Baker-Matthews alignment, inverse alignment, KLT tracking, mean-shift tracking, modern trackers)

Basic reading:

Szeliski textbook, Section 4.1.1, 5.3, 8.1

Lectures 21 & 22: Radiometry and Reflectance

(Appearance phenomena, measuring light and radiometry, reflectance and BRDF)

Basic reading:

Szeliski textbook, Section 2.2
Steven Gortler, Foundations of Computer Graphics, Chapter 21. This book has a great introduction to radiometry, reflectance, and their use for image formation.

Lecture 23: Photometric Stereo

(Notes about radiometry, the n-dot-l model, photometric stereo, uncalibrated photometric stereo, generalized bas-relief ambiguity, shape from shading)

Basic reading:

Szeliski textbook, Section 2.2
Steven Gortler, Foundations of Computer Graphics, Chapter 21. This book has a great introduction to radiometry, reflectance, and their use for image formation.

Lectures 24 & 25: Digital Photography

(Imaging sensor primer, color sensing in cameras, in-camera image processing pipeline, radiometric calibration)

Basic reading:

Szeliski textbook, Section 2.3
Michael Brown, "Understanding the In-Camera Image Processing Pipeline for Computer Vision," CVPR 2016. A very detailed discussion of issues relating to color photography and management; slides are available online.
Nine Degrees Below: an amazing resource for color photography, reproduction, and management.

Lecture 26: Wrap-up