What's Going On
motoole2 commented on slide_001 of Course Introduction ()

The website is using mathjax to display LaTeX, which is supposed to work in all browsers. It takes a few seconds on my own machine to display properly, so it may be a matter of just waiting a bit. If it doesn't display properly, you could also try refreshing or switching browsers.

$\int \text{test}^{test}$


motoole2 commented on slide_028 of Image Alignment ()

In assignment 6, we are tracking an object throughout a video sequence. Given the position of the object in the first frame, we assume that the object is approximately in the same location in the next frame and use this to initialize the image alignment procedure. That way, for every frame, we are just making small adjustments to $p$.

Another approach could be to (i) use a feature-based detector to warp an image, similar to assignment 2, and (ii) use this image alignment to refine the solution.


mjc commented on slide_001 of Course Introduction ()

Something I just realized is that LaTeX is actually supposed to load properly, I just assumed it never worked.


mjc commented on slide_028 of Image Alignment ()

We start with zero-initialized parameters in the assignment, so I am wondering how one would get a good initial guess in a real-world scenario?


Shdnfgsmeud commented on slide_001 of Course Introduction ()

Hi!


motoole2 commented on slide_047 of Introduction to neural networks ()

Earlier in this lecture, we introduced the perceptron, which worked by fitting a hyperplane through some N-dimensional space and labelling all points on one side +1 and all points on the other -1. The weights determined the orientation of the hyperplane. However, because this perceptron didn't have a bias, the hyperplane always had to pass through the origin.

The bias term provides a way to translate this hyperplane, which is a helpful property for perceptrons to make decisions (similar to when we discussed support vector machines at the end of lecture 14).
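To make this concrete, here is a minimal sketch (with made-up 1D data, not from the slides) showing that without a bias the decision boundary $\text{sign}(wx)$ is stuck at the origin, while adding a bias translates it:

```python
import numpy as np

# 1D points: class +1 for x > 2, class -1 for x < 2.
x = np.array([3.0, 4.0, 5.0, 1.0, 0.5, -1.0])
y = np.array([1, 1, 1, -1, -1, -1])

# Without a bias, the boundary sign(w*x) must pass through x = 0,
# so no choice of w separates these labels.
w = 1.0
pred_no_bias = np.sign(w * x)

# With a bias, sign(w*x + b) shifts the boundary to x = -b/w = 2.
b = -2.0
pred_with_bias = np.sign(w * x + b)

print(np.array_equal(pred_no_bias, y))   # False: boundary stuck at origin
print(np.array_equal(pred_with_bias, y)) # True: boundary translated to x = 2
```

The same effect holds in N dimensions: the bias shifts the hyperplane $w^T x + b = 0$ away from the origin.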


motoole2 commented on slide_072 of Introduction to neural networks ()

The gradient of a function points in the direction of steepest ascent. For example, in this slide, we could represent the landscape using a function $f(x,y)$. The gradient, given by $[\partial f/\partial x, \partial f/\partial y] = [u, v]$, represents a 2D direction that maximizes the change in the value of $f(x,y)$. In our case, we want to minimize the function, and choose to step in the direction opposite the gradient (i.e., in the steepest descent direction).
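As a small sketch (using an invented quadratic landscape, not one from the slides), repeatedly stepping opposite the gradient drives $f(x,y)$ toward its minimum:

```python
import numpy as np

def f(p):
    # A bowl-shaped landscape with its minimum at (1, -2).
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def grad_f(p):
    # Analytic gradient [df/dx, df/dy]: points toward steepest ascent.
    x, y = p
    return np.array([2.0 * (x - 1.0), 2.0 * (y + 2.0)])

p = np.array([5.0, 5.0])       # arbitrary starting point
lr = 0.1                       # step size (learning rate)
for _ in range(200):
    p = p - lr * grad_f(p)     # step opposite the gradient: steepest descent

print(np.round(p, 3))          # converges toward the minimum at (1, -2)
```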


motoole2 commented on slide_110 of Introduction to neural networks ()

In this case, we want to compute the partial derivative of the loss with respect to $w_3$ (which is not the same as $f_2$, to be clear). Therefore, the last partial derivative in the chain will be computed with respect to $w_3$.


motoole2 commented on slide_069 of Convolutional Neural Network ()

Pooling in general is designed to reduce the size of the data. The idea behind using the max pooling operation is to capture the most important features from the previous layer, which is the main argument over simply averaging. That said, average pooling does have its place.
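For a concrete comparison, here is a small sketch (with a made-up 4x4 feature map) of 2x2 max pooling versus average pooling:

```python
import numpy as np

# A 4x4 feature map with one strong activation per occupied 2x2 block.
fm = np.array([[9., 0., 0., 0.],
               [0., 0., 0., 0.],
               [0., 0., 0., 8.],
               [0., 0., 7., 0.]])

# 2x2 pooling with stride 2: reshape into blocks, then reduce each block.
blocks = fm.reshape(2, 2, 2, 2).swapaxes(1, 2)  # shape (2, 2, 2, 2)

max_pooled = blocks.max(axis=(2, 3))   # keeps the strongest response
avg_pooled = blocks.mean(axis=(2, 3))  # dilutes it across the block

print(max_pooled)  # [[9. 0.] [0. 8.]]
print(avg_pooled)  # [[2.25 0.  ] [0.   3.75]]
```

Note how max pooling preserves the peak activations (9, 8) while averaging attenuates them (2.25, 3.75), which is the argument for max pooling when the peaks carry the signal.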


tpbui commented on slide_069 of Convolutional Neural Network ()

Can you explain why "avg" is a poor choice?


tpbui commented on slide_110 of Introduction to neural networks ()

Why is the partial derivative of $a_3$ w.r.t. $w_3$ equal to $f_2$? If I missed anything, please let me know the slides I can refer back to. Thank you!


tpbui commented on slide_072 of Introduction to neural networks ()

What is the intuition behind moving in opposite direction of the gradient? From the next few slides, I understand that gradient means the difference in loss function per each change in one unit of weight. Since we want to minimize the loss function, we need to move in the opposite direction to cancel out the change.


tpbui commented on slide_047 of Introduction to neural networks ()

Can you explain what bias terms are and why we need them in a neural network?


motoole2 commented on slide_115 of Introduction to neural networks ()

$a_2$ does depend on $w_2$, and $a_3$ does depend on $w_3$! See this slide for example.

The reason that the calculation for $dL / dw_1$ does not include a $d / dw_2$ or $d / dw_3$ term is because the value of $w_2$ and $w_3$ does not depend on $w_1$. Only the other terms shown in this chain depend on the value of $w_1$.
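One way to convince yourself of this is a numerical check. The sketch below uses a hypothetical three-layer linear chain (not the exact network from the slides) and confirms that the chain-rule expression for $dL/dw_1$, which holds $w_2$ and $w_3$ fixed, matches a finite-difference estimate:

```python
# Hypothetical linear chain: a1 = w1*x, a2 = w2*a1, a3 = w3*a2, L = (a3 - t)^2.
x, t = 2.0, 1.0
w1, w2, w3 = 0.5, -1.5, 0.8

def loss(w1, w2, w3):
    a1 = w1 * x
    a2 = w2 * a1
    a3 = w3 * a2
    return (a3 - t) ** 2

# Chain rule: dL/dw1 = dL/da3 * da3/da2 * da2/da1 * da1/dw1.
# w2 and w3 are held fixed -- they do not depend on w1.
a3 = w3 * w2 * w1 * x
dL_dw1 = 2.0 * (a3 - t) * w3 * w2 * x

# Numerical check by central finite differences, perturbing only w1.
eps = 1e-6
numeric = (loss(w1 + eps, w2, w3) - loss(w1 - eps, w2, w3)) / (2 * eps)
print(abs(dL_dw1 - numeric) < 1e-6)  # True
```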


ThomasLKT commented on slide_115 of Introduction to neural networks ()

Why is it that when we calculate partial with respect to w1, we said a1 depends on w1, but a2 doesn’t depend on w2 and a3 doesn’t depend on w3?


motoole2 commented on slide_070 of Two-view Geometry ()

Recall that a rotation matrix $R$ is orthogonal (unitary), which means that its inverse is $R^T$. Thus, $x' = R(x-t) \rightarrow R^T x' = x-t \rightarrow x'^T R = (x-t)^T$.
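This is easy to verify numerically. The sketch below uses an arbitrary rotation about the z-axis and made-up values for $t$ and $x$:

```python
import numpy as np

# A sample 3D rotation (about the z-axis) and translation.
th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])
t = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])

# R is orthogonal: its inverse is its transpose.
print(np.allclose(R.T @ R, np.eye(3)))  # True

# x' = R(x - t)  =>  R^T x' = x - t  =>  x'^T R = (x - t)^T.
xp = R @ (x - t)
print(np.allclose(xp @ R, x - t))       # True
```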


panda commented on slide_070 of Two-view Geometry ()

How do you get (x'^T)R = (x-t)^T from the first equation?


motoole2 commented on slide_060 of Stereo ()

As explained in the slides that precede this one, there are four possible solutions that involve a combination of (i) one of rotation matrices $\mathbf{R}_1$ or $\mathbf{R}_2$ and (ii) a translation vector $\pm \mathbf{t}$. (Note that, if the determinant of the rotation matrix is somehow -1, then the matrix needs to be negated.)

Now, to determine which of the four possible solutions is correct, one would triangulate points in all four cases. As depicted in this slide, the correct configuration will produce points in front of both cameras (it is not sufficient to check that points are in front of one camera only).


rbustama commented on slide_060 of Stereo ()

Could you please reexplain how this may present some problems or trickiness moving forward? I remember it being mentioned in lecture, but upon reviewing the slides I have forgotten.


motoole2 commented on slide_083 of Geometric Camera Models ()

Absolutely! For example,

  • Radiometric calibration is used for high-dynamic range (HDR) imaging

  • Color calibration is done to do white balancing whenever you take a photo

  • Geometric calibration is required for computing geometry of scenes (e.g., stereo imaging).

  • Noise calibration is used to evaluate the imaging capabilities of sensors

  • Lens/aberration calibration is particularly important when computing panoramas, where distortion compensation plays an important role

We'll cover some of these topics later in the semester as well.


lululucyyyyyyy commented on slide_083 of Geometric Camera Models ()

Are these methods used in cameras today? I.e., cellphone cameras or SLR cameras?


motoole2 commented on slide_056 of Image Homographies ()

Assignment 2 discusses how to perform SVDs in practice; refer to the handout for details. In short though, we can make use of the following function: numpy.linalg.svd, which takes a matrix as input and outputs a matrix $\mathbf{U}$, a vector $\mathbf{S}$, and a matrix $\mathbf{Vh}$. The columns of matrix $\mathbf{U}$ represent the left singular vectors, the rows of matrix $\mathbf{Vh}$ represent the right singular vectors, and the elements of vector $\mathbf{S}$ represent the singular values. By convention, the singular values are ordered such that $S_{i} \geq S_{i+1}$, so the right singular vector associated with the smallest singular value is the last row of $\mathbf{Vh}$.
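As a small sketch (using a diagonal matrix whose singular values are known in advance):

```python
import numpy as np

# A small example matrix with known singular values 2, 1, and 0.5.
A = np.diag([2.0, 1.0, 0.5])

U, S, Vh = np.linalg.svd(A)
print(S)              # [2.  1.  0.5] -- sorted so that S[i] >= S[i+1]

# The right singular vector paired with the smallest singular value is
# the last ROW of Vh (not the last column), since Vh = V^T.
v_min = Vh[-1]
print(np.abs(v_min))  # [0. 0. 1.] up to sign
```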


adlibs commented on slide_056 of Image Homographies ()

How exactly do we compute SVD? And how do we identify the singular vector of the smallest singular value?


motoole2 commented on slide_053 of Detecting Corners ()

The window function serves to compute the (weighted) sum of pixel differences across some finite neighborhood. Note here that, technically, the values for $x$ and $y$ can span $-\infty$ to $\infty$. So we define a window that limits the size of the neighborhood of pixels that we will be summing over.

The Gaussian-weighted version provides more emphasis to pixels at the center of the neighborhood. It's up to you, however, to choose between a binary window, a Gaussian one, or a completely different window function.


tpbui commented on slide_053 of Detecting Corners ()

I am not getting what the window function does. And what is the difference for the output of the (1,0) and Gaussian functions?


motoole2 commented on slide_054 of Image Homographies ()

Ack, thanks for pointing this out. Please do download the PDF to properly view this slide HERE. (This happens from time to time in these uploaded slides, and I don't have a fix; not sure why this happens unfortunately.)


hiliang commented on slide_054 of Image Homographies ()

FYI this slide image doesn't show the annotations for each element nor the sum equation; it does show up correctly in the PDF though.


motoole2 commented on slide_080 of Detecting Corners ()

Step 3 performs a convolution with a Gaussian filter (with standard deviation $\sigma'$). Any given pixel in the image $S_{x^2}$ is therefore the weighted sum of pixels in a corresponding neighborhood in $I_{x^2}$. If you wanted to compute a straight sum (and not a weighted sum), you can replace the Gaussian filter with a box filter.
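A quick 1D sketch (with made-up squared-gradient values standing in for a row of $I_{x^2}$) illustrates the difference between the two filters:

```python
import numpy as np

# Squared-gradient values along one row (stand-in for I_{x^2}).
Ix2 = np.array([0., 1., 4., 9., 4., 1., 0.])

# Box filter of width 3: each output is the plain sum over a 3-pixel window.
box = np.ones(3)
S_box = np.convolve(Ix2, box, mode='same')

# A small discrete Gaussian weighting emphasizes the center pixel.
g = np.array([0.25, 0.5, 0.25])
S_gauss = np.convolve(Ix2, g, mode='same')

print(S_box[3])    # 14.0 + 3.0 = 4 + 9 + 4 = 17.0 (straight sum)
print(S_gauss[3])  # 0.25*4 + 0.5*9 + 0.25*4 = 6.5 (weighted sum)
```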


motoole2 commented on slide_043 of Detecting Corners ()

Exactly. These plots are just for illustrative purposes, but in theory you would have 25 points, at locations given by the 25 x- and y- derivatives computed within the 5x5 window.


motoole2 commented on slide_049 of Detecting Corners ()

This is a signal processing term for subtracting the mean. This way, the scatter plot is centered around 0.


adlibs commented on slide_080 of Detecting Corners ()

What does $G_{\sigma'}$ mean here? How is it related to computing the sums of the products of derivatives at each pixel?


adlibs commented on slide_043 of Detecting Corners ()

Are the points on the intensity chart obtained from applying a derivative filter and getting the resultant value for each pixel in the region?


adlibs commented on slide_049 of Detecting Corners ()

What does DC offset mean?


motoole2 commented on slide_058 of Image pyramids and frequency domain ()

This is the building block for any periodic signal. That is, any periodic signal can be expressed as some linear combination of terms of the form $A \sin(\omega x + \phi)$.

While the Fourier series itself is described as a combination of complex exponentials of the form $a e^{i\omega x}$, note that its real component, $\text{Re}(a e^{i\omega x})$, can be expressed as $A \sin(\omega x + \phi)$, where the values of $A$ and $\phi$ depend on the complex number $a$.
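The missing step can be made explicit by writing the complex coefficient in polar form, $a = A e^{i\theta}$:

$$\text{Re}(a e^{i\omega x}) = \text{Re}(A e^{i(\omega x + \theta)}) = A \cos(\omega x + \theta) = A \sin(\omega x + \theta + \pi/2),$$

so setting $\phi = \theta + \pi/2$ recovers the form $A \sin(\omega x + \phi)$.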


Is this the basic building block for any sinusoid function, or for just a Fourier series?


motoole2 commented on slide_078 of Image pyramids and frequency domain ()

For images, this would correspond to the average pixel value (sum up all pixels then divide by the number of pixels).
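Equivalently, the zero-frequency (DC) coefficient of the 2D DFT is the sum of all pixels, so dividing it by the pixel count gives the average pixel value. A small sketch (with a made-up 2x2 image):

```python
import numpy as np

# The DC bin F[0,0] of the 2D DFT equals the sum of all pixels.
img = np.array([[1.0, 2.0],
                [3.0, 6.0]])

F = np.fft.fft2(img)
dc_average = F[0, 0].real / img.size

print(dc_average)   # 3.0
print(img.mean())   # 3.0
```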


motoole2 commented on slide_048 of Image pyramids and frequency domain ()

The short answer is that we perform some type of interpolation, i.e., we insert new columns and rows with values determined by its neighbors. But this does need to be done somewhat carefully.

For example, suppose that your image consisted of discrete points representing a continuous sinusoid. If you want to upsample that image, one idea might be to (i) fit a continuous sinusoidal signal to the discrete points, and (ii) use a higher number of discrete samples to represent that same sinusoid. This makes a critical assumption that the signal was not aliased (the frequency of the sinusoid, in cycles across the window, is no larger than half the number of samples used to represent it).

When it comes to more general signals, the same idea applies. Provided that the frequency content of the original image is not too high, we can reconstruct that signal exactly by fitting linear combinations of sinusoids to the data.

Note that there are other ways to upsample the data, e.g., through linear interpolation. This, however, is not necessarily going to provide a perfect inverse to the downsampling operation.
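Here is a sketch of the first idea in 1D, using the FFT: a sinusoid sampled at $N = 16$ points is upsampled to $M = 32$ points by zero-padding its spectrum, which recovers the finer sampling exactly because the signal is not aliased:

```python
import numpy as np

# A bandlimited signal: a pure sinusoid at 3 cycles over N = 16 samples
# (well below the Nyquist limit of N/2 = 8 cycles).
N, M, f = 16, 32, 3
n = np.arange(N)
x = np.sin(2 * np.pi * f * n / N)

# Upsample by zero-padding the spectrum: keep the low-frequency bins,
# insert zeros for the new high frequencies, and rescale by M/N.
X = np.fft.fft(x)
Y = np.zeros(M, dtype=complex)
Y[:N // 2] = X[:N // 2]
Y[-(N // 2):] = X[-(N // 2):]
y = np.fft.ifft(Y).real * (M / N)

# The result matches the sinusoid sampled on the finer grid exactly.
m = np.arange(M)
print(np.allclose(y, np.sin(2 * np.pi * f * m / M)))  # True
```

(For signals with energy at the Nyquist bin itself, that bin needs extra care; here it is zero, so the simple copy suffices.)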


What is the signal average?


How does upsampling work here? In other words, what is the algorithm used to transform $f_2$ to $l_1$?


motoole2 commented on slide_088 of Image pyramids and frequency domain ()

Yes, this slide is showing the discrete inverse Fourier transform. (I could have made this more clear.) Regarding your second statement, yes, I would agree with that.


motoole2 commented on slide_039 of Image pyramids and frequency domain ()

If the images are sufficiently blurred, subsampling will not result in an additional loss of information. When it comes to reconstructing these images (as described here), this is why the subsampling operation is invertible. This may become more clear once we finish this lecture on Monday and discuss the Nyquist limit.


motoole2 commented on slide_103 of Image pyramids and frequency domain ()

Even though the images do not appear to be periodic, they can be made periodic by creating an infinitely-large mosaic composed of the image. That way, any finite non-periodic signal can be turned into an infinitely large periodic signal.

The example in the interactive demo described in your post decomposes an 8x8 image into a linear combination of 64 = 8*8 basis functions. There are many basis functions one can potentially use to represent an image. In this particular case, the interactive demo shows a set of sinusoidal basis functions used in the Discrete Cosine Transform, which is used by JPEG for compression. There's a similar set of 64 basis functions for the actual Fourier Transform. Also note that this is not limited to 8x8 images; an $M\times M$ image can be represented as a linear combination of $M^2$ basis functions.

Regarding how to apply the Fourier transform on an image, you got the right idea. However, slide 87 makes use of 1D Fourier transforms. For images, we would want a 2D Fourier transform that works with 2D functions. For an $N \times M$ image, it looks like this: $$F(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x,y)\, e^{-j2\pi (ux/N + vy/M)}$$
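A naive implementation of this double sum (with the usual $1/N$ and $1/M$ frequency normalization in the exponent, checked against numpy's built-in FFT) might look like:

```python
import numpy as np

def dft2_naive(f):
    # Direct evaluation of the 2D DFT:
    # F(u,v) = sum_x sum_y f(x,y) * exp(-j*2*pi*(u*x/N + v*y/M))
    N, M = f.shape
    F = np.zeros((N, M), dtype=complex)
    for u in range(N):
        for v in range(M):
            for x in range(N):
                for y in range(M):
                    F[u, v] += f[x, y] * np.exp(-2j * np.pi * (u * x / N + v * y / M))
    return F

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 4))

# The O(N^2 M^2) double sum agrees with the fast O(NM log NM) FFT.
print(np.allclose(dft2_naive(img), np.fft.fft2(img)))  # True
```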


mudilol commented on slide_001 of Course Introduction ()

Hi everyone! I just joined the class :)


wiggl commented on slide_001 of Course Introduction ()

hi


Isn't the equation on this slide the inverse Fourier transform? Also, is it correct to say that applying the Fourier transform to the spatial domain of a periodic signal will give you the frequency domain, while applying the inverse Fourier transform will give you the opposite?