The derivation you have here makes sense, but the first expression you have written does not seem to match up with the expression at the top of the slide, right? I only see the second and fourth terms you wrote in the original expression from the top of the slide.
Absolutely. The isotropic assumption means that I can rotate a surface about its normal, and it will produce the same BRDF response. In other words, $f(\theta_i, \phi_i, \theta_r, \phi_r)=f(\theta_i, \phi_i + a, \theta_r, \phi_r + a)$ for all values $a$. So by picking the value $a = -\phi_r$, I can reduce this function down to one of only 3 variables: $f(\theta_i, \phi_i, \theta_r, \phi_r)=f(\theta_i, \phi_i - \phi_r, \theta_r, 0)$ (since the fourth argument is always $0$).
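To make this concrete, here is a minimal sketch (with a hypothetical 3-argument BRDF lookup `f3`, which is my own illustration and not something from the slide):

```python
def brdf(theta_i, phi_i, theta_r, phi_r, f3):
    # Isotropy: rotating both azimuths by a = -phi_r leaves the BRDF value
    # unchanged, so only the azimuth difference phi_i - phi_r matters.
    return f3(theta_i, phi_i - phi_r, theta_r)  # fourth argument is always 0
```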
With respect to why the pseudoinverse is a closed-form solution to the least squares problem, please see this slide from earlier in the semester.
Next, let me redefine $\nabla I \frac{\partial W}{\partial p}$ a little. Instead of being a $1\times 6$ vector for a specific $x$ value, let me reinterpret this as a $N\times 6$ matrix, where every row corresponds to a different $x$. This might make the math a little more clear.
We then have a matrix $A = \nabla I \frac{\partial W}{\partial p}$, a vector $x = \Delta p$, and a vector $b = T(x) - I(W(x;p))$. Then, using our closed-form solution to the least squares problem, $\Delta p = (A^T A)^{-1} A^T b$, where $A^T A = H = (\nabla I \frac{\partial W}{\partial p})^T (\nabla I \frac{\partial W}{\partial p})$ and $A^T b = (\nabla I \frac{\partial W}{\partial p})^T (T(x) - I(W(x;p)))$.
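As a minimal numpy sketch of this update (assuming the $N\times 6$ matrix $A$ and length-$N$ error vector $b$ have already been assembled per the definitions above):

```python
import numpy as np

def lk_update(A, b):
    """Least squares update dp = (A^T A)^{-1} A^T b.
    A: N x 6 matrix (each row is grad I times dW/dp at one pixel x).
    b: length-N error vector T(x) - I(W(x; p))."""
    H = A.T @ A                         # 6 x 6 approximate Hessian
    return np.linalg.solve(H, A.T @ b)  # pseudoinverse solution to A dp = b
```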
The denominator of the first partial derivative and the numerator of the second partial derivative do in fact match up, because $x' = W(x;p)$. Note that the first partial derivative represents the derivative of a warped image (hence why we use $x'$ instead of $x$ here).
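Written out with $x' = W(x;p)$, the chain rule step is:

$\frac{\partial}{\partial p} I(W(x;p)) = \frac{\partial I}{\partial x'} \frac{\partial W}{\partial p} = \nabla I \frac{\partial W}{\partial p}$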
And just to clarify, $\bar{u}_{kl}$ is defined as the local average (see bottom of slide).
It is a little confusing---I agree. Let me try to clarify.
So first of all, let's consider taking the derivative with respect to $u_{k,l}$, and do this in two steps. First, let's expand the summation to only include terms that include $u_{k,l}$, since all other terms will be zeroed out anyways. We get the following:
$\frac{1}{4}(u_{k,l} - u_{k-1,l})^2 + \frac{1}{4}(u_{k,l} - u_{k+1,l})^2 + \frac{1}{4}(u_{k,l} - u_{k,l-1})^2 + \frac{1}{4}(u_{k,l} - u_{k,l+1})^2$
$\quad\quad + \, \lambda(I_x u_{k,l} + I_y v_{k,l} + I_t)^2$
Second, let's take the derivative with respect to $u_{k,l}$:
$\frac{1}{2}(u_{k,l} - u_{k-1,l}) + \frac{1}{2}(u_{k,l} - u_{k+1,l}) + \frac{1}{2}(u_{k,l} - u_{k,l-1}) + \frac{1}{2}(u_{k,l} - u_{k,l+1})$
$\quad\quad + \, 2\lambda(I_x u_{k,l} + I_y v_{k,l} + I_t) I_x$
Simplifying, we get our solution:
$2(u_{k,l} - \frac{1}{4}(u_{k-1,l} + u_{k+1,l} + u_{k,l-1} + u_{k,l+1})) + 2\lambda(I_x u_{k,l} + I_y v_{k,l} + I_t) I_x$
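If it helps, here is a minimal numpy sketch of this gradient (my own illustration; edge pixels are handled by replication, which the slide does not specify):

```python
import numpy as np

def energy_grad_u(u, v, Ix, Iy, It, lam):
    """d/du_{k,l} of the energy: 2(u - u_bar) + 2 lam (Ix u + Iy v + It) Ix."""
    up = np.pad(u, 1, mode='edge')                    # replicate borders
    u_bar = 0.25 * (up[:-2, 1:-1] + up[2:, 1:-1] +    # up/down neighbors
                    up[1:-1, :-2] + up[1:-1, 2:])     # left/right neighbors
    return 2 * (u - u_bar) + 2 * lam * (Ix * u + Iy * v + It) * Ix
```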
Sure! In the previous slide, we derived an overdetermined linear system $Ax = b$ between the flow vector, $x = [u,v]$, and our temporal gradients $b$. The equation here solves for the least squares solution by (1) multiplying both sides by $A^T$, and (2) inverting the $2\times 2$ matrix $A^T A$ to solve for $x$.
Why don't we want to use the SVD here? We absolutely could. However, this solution is a little more efficient than computing the SVD. We also use the form of $A^T A$ to explain the connection to the Harris corner detector a little later in this lecture.
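For concreteness, a minimal sketch of that solve (assuming window gradients `Ix`, `Iy`, `It` given as arrays, and $b = -I_t$ from the brightness constancy constraint $I_x u + I_y v + I_t = 0$):

```python
import numpy as np

def lk_flow(Ix, Iy, It):
    """Solve for [u, v] in the least squares sense over one window."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # N x 2
    b = -It.ravel()                                 # N
    AtA = A.T @ A                                   # the 2 x 2 matrix
    return np.linalg.solve(AtA, A.T @ b)            # [u, v]
```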
Could you explain more why the isotropic assumption means we only need the 3 inputs listed at the bottom of the slide, as compared to before when we needed all 4?
Could you explain the logic going from the expression at the top of the slide to the equations on the bottom? Not sure I understand why the top expression is optimized when the bottom equations are true.
I'm confused how the second term on the first line (with the $\Delta p$) expands into the second term on the second line denoted by chain rule. Specifically, it seems that the denominator of the first partial derivative and the numerator of the second partial derivative do not match up, so I'm unsure how this is a use of the chain rule.
I don't get the relevance of $\bar{u}_{kl}$ here, and how is $2(u_{kl} - \bar{u}_{kl})$ the derivative of the expanded square terms above? For the first part of the expression above, I'm confused why it's not $4u_{kl}$, since first, we seem to be adding two $u_{ij}^2$ terms together, and second, I don't know what happened to the $-2(u_{i+1,j} + u_{i,j+1})$ term.
Could you explain more where this matrix equation came from and why we don't want to use the standard SVD?
Radiant flux measures light in terms of Watts. To compute flux, we consider all photons incident on a finite patch (area > 0) and coming from a finite wedge of directions (solid angle > 0). For example, the wedge can represent the hemisphere of possible lighting directions falling onto a patch.
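As a concrete formula (standard radiometry notation, not necessarily matching the slide), the flux integrates incident radiance over both the patch and the wedge:

$\Phi = \int_A \int_\Omega L(x, \omega) \cos\theta \, d\omega \, dA$

where $A$ is the patch, $\Omega$ is the wedge of directions, and $\theta$ is the angle between $\omega$ and the surface normal.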
I am a little confused about how radiant flux relates to solid angle W. Thank you!
Note that the final line here should be $\theta \leftarrow \theta - \eta \frac{dL}{d\theta}$.
Sure, let's see if we can clear this up.
The x-axis represents the space of possible random observations. In this case, the variable $x$ can take on one of 10 possible values (corresponding to the 10 bins shown here). While the distribution shown here is discrete, it can just as easily be a continuous density function.
The goal in this slide is to generate samples according to the specified PDF. This involves:
computing the cumulative distribution function (CDF), where the k-th bin is the sum of the first k bins of the PDF; and
evaluating the inverse CDF, $\text{CDF}^{-1}(y)$, at uniformly random values $y \in [0,1]$. The resulting samples will have a distribution equal to the original PDF.
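Here is a minimal numpy sketch of this procedure (the 10-bin PDF values below are made up for illustration):

```python
import numpy as np

pdf = np.array([0.05, 0.1, 0.2, 0.15, 0.1, 0.05, 0.1, 0.1, 0.1, 0.05])
cdf = np.cumsum(pdf)                 # k-th entry: sum of the first k bins

# Inverse CDF sampling: draw y ~ Uniform[0, 1], return the first bin
# whose CDF value reaches y.
y = np.random.default_rng(0).uniform(size=100000)
samples = np.searchsorted(cdf, y)

# np.bincount(samples) / y.size should closely match `pdf`.
```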
I am a bit confused about where the CDF comes from and what the horizontal axis represents here. Thank you!
$\Delta p$ is not a parameter of function $I$ (which is just a 2D array of pixel values)! It is a parameter to the parametric function $W$ though.
Are we saying function $I$ is non-parametric in terms of $\Delta p$ here? If so, why is it the case? Isn't $\Delta p$ a parameter of the function?
It could be either inside or outside of the summation (numerically identical), but I agree it would be clearer if it were on the outside.
Why is $H^{-1}$ inside the summation for expression of $\Delta p$?
On the third line, we incorrectly state that
$\Delta p = \sum H^{-1} \left(\nabla T \frac{\partial W}{\partial p}\right)^T (T(x) - I(W(x; p)))$
It should be
$\Delta p = H^{-1} \sum \left(\nabla T \frac{\partial W}{\partial p} \right)^T (T(W(x;0)) - I(W(x; p)))$
If we're setting the right camera $O'$ as the origin then $O$ is at -b. Then the point $X$ would be at $-b + X$ or $X-b$ as you said. However, using this coordinate system, we have that the point $x'$ is located at $-x'$ since it is also to the left of the origin $O'$. Then there is a mismatch between the signs of the numerators. What's wrong with my logic here?
The goal of step 1 is to find a rotation matrix $R$ such that the image planes of both cameras are parallel; as a result, we only need to apply this rotation matrix $R$ to one image.
The goal of steps 2 and 3 is to find a second rotation matrix $R_{\text{rect}}$ to reorient the common image plane such that it is parallel to the stereo baseline / translation vector, making the epipolar lines horizontal.
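If it's useful, here is one standard construction of $R_{\text{rect}}$ from the translation vector (a sketch of the usual recipe; the exact convention on the slide may differ):

```python
import numpy as np

def rect_rotation(t):
    """Build R_rect whose rows form a basis with the x-axis along the baseline t.
    Assumes the baseline is not parallel to the optical (z) axis."""
    e1 = t / np.linalg.norm(t)          # new x-axis: along the baseline
    e2 = np.cross([0.0, 0.0, 1.0], e1)
    e2 /= np.linalg.norm(e2)            # new y-axis: orthogonal to e1 and old z
    e3 = np.cross(e1, e2)               # new z-axis completes the basis
    return np.vstack([e1, e2, e3])
```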
You're absolutely right! There could very well be multiple instances of similar features on the epipolar line. This is a fundamental issue with stereo imaging, where there's no guarantee we can compute perfect correspondences (see this slide for examples on where stereo fails).
At the end of this lecture, we discussed a few strategies to improve correspondences. For example, we could use structured lighting to provide unique correspondences and get around this problem. An alternative solution is to enforce smoothness in the recovered depth map.
To be clear, I meant to say "if $F$ were rank 3"---and not "if $F$ were not rank 2".
If $F$ is rank 3 and given that $F$ is a $3\times 3$ matrix, the null space contains precisely one element (according to the rank-nullity theorem): the vector $0$. That is, the only vector $e$ where $e^T F = 0$ is $e = 0$.
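Applying the rank-nullity theorem to $F^T$ (since $e^T F = 0$ is equivalent to $F^T e = 0$):

$\dim \mathcal{N}(F^T) = 3 - \text{rank}(F^T) = 3 - 3 = 0$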
To be fair, I could have been more clear. So let's try deriving this again from scratch!
Let's revisit slide 70. First, we derived the rigid motion equation $x' = R (x-t)$, which can also be rewritten as $x'^T R = (x-t)^T$. Second, because $t$, $x-t$, and $x$ are all coplanar, we also have $(x-t)^T(t\times x) = 0$. Combining these two equations together gives us:
$x'^T (R[t_{\times}]) x = x'^T E x = 0$
where
$E = R[t_{\times}]$
Now note that rotating three coplanar vectors (e.g., $t$, $x-t$, and $x$) produces another three coplanar vectors ($Rt$, $R(x-t)$, and $Rx$). Therefore, we can rewrite our coplanarity equation as follows: $(R(x-t))^T((Rt) \times (Rx)) = 0$. Combining this with the rigid motion equation produces the following:
$(R(x-t))^T((Rt) \times (Rx)) = 0$
$\rightarrow (RR^T x')^T((Rt) \times (Rx)) = 0$
$\rightarrow x'^T[(Rt)_{\times}]Rx = 0$
$\rightarrow x'^T[t'_{\times}]Rx = 0$
where
$t' = Rt$ and $E = [t'_{\times}]R$
To answer your second question, because $R[t_{\times}] = [t'_{\times}]R$ where $t \neq t'$ in general, this slide and slide 73 are using different versions of $t$.
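If you want to convince yourself numerically that $R[t_{\times}] = [(Rt)_{\times}]R$ for a proper rotation, here is a quick sketch:

```python
import numpy as np

def cross_matrix(t):
    """Skew-symmetric [t_x] such that cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                                 # force det(R) = +1
t = rng.standard_normal(3)

print(np.allclose(R @ cross_matrix(t), cross_matrix(R @ t) @ R))  # True
```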
Why would we rotate the right image (Image 2) by $R$ (and not the left image)? Isn't the relationship that $x' = R(x-t)$, where $x'$ is from Image 2? If we consider Image 1 to be in the "normal" orientation, then Image 2 is already a rotated version of Image 1. I'm confused why we would then want to rotate Image 2 again by $R$.
I'm rewatching this lecture so this question might be addressed later in the lecture, but is using a window and directly comparing pixel values really a good way of matching points in the images? I was thinking that there might be multiple instances of similar features which happen to be on the same line, and it would be difficult to distinguish which is the correct one. For example, even in the images on this slide, the line passes through the top right corner of the top right window, and since this and the actual correct point are both window corners they might have similar pixel intensity distributions.
Could you elaborate a little more on the first statement "if $F$ were not rank 2, then this would mean that there is no non-trivial point $e$ where $e^T F = 0$"? Why is this true? Is it because of the rank-nullity theorem?
Sorry, I'm still confused. If we substitute $t' = R^T t$ into $E = R[t']$, wouldn't we just get $E = t$ since $R$ is unitary and so $RR^T$ will be the identity matrix?
I'm also still confused about how, even if we could decompose the matrix in two different ways as you stated, we end up with the statement $R[t_{\times}] = [t_{\times}]R$ -- unless the previous slide I was referencing (slide 73, I believe) had an error and should have said $R[t'_{\times}]$. Basically I'm asking if slide 73 and this slide are referencing the same $[t_{\times}]$ matrix, or if slide 73 was referencing $[t'_{\times}]$.
If $F$ were not rank 2, then this would mean that there is no non-trivial point $e$ where $e^T F = 0$. In other words, there would be no epipole! And therefore $F$ would not be a valid fundamental matrix (based on our understanding of epipolar geometry).
Note that when running the 8-point algorithm with more than 8 points, we will in general get a fundamental matrix $F'$ that has rank 3 (due to noise in the correspondences). The process of setting the smallest singular value to 0 produces the closest rank 2 (valid) matrix $F$, where "closest" means that the Euclidean or the Frobenius distance between the two matrices $F$ and $F'$ is minimized (see this slide for more details).
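A minimal sketch of that projection step (assuming `F_prime` holds the rank-3 estimate from the 8-point algorithm):

```python
import numpy as np

def closest_rank2(F_prime):
    """Project onto the closest rank-2 matrix in Frobenius norm."""
    U, S, Vt = np.linalg.svd(F_prime)
    S[2] = 0.0                     # zero out the smallest singular value
    return U @ np.diag(S) @ Vt
```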
Although there are 9 unknowns, the fundamental matrix $F$ (and the essential matrix $E$) are only unique up to a scale factor. To see this, I can scale the values of $F$ by an arbitrary scalar $\alpha \neq 0$, and show that if
$(x'^T F x) = 0$
then
$x'^T (\alpha F) x = \alpha (x'^T F x) = 0$
In other words, this constraint on epipolar geometry is unaffected by the scalar $\alpha$. (Remember that $x'$ and $x$ are still homogeneous coordinates too, and their scale also has no effect on the 2D points that they represent.)
So while there are 9 unknowns, there are really only 8 degrees of freedom. And we need 8 correspondences to solve for the fundamental matrix.
You have to consider all terms with respect to all values $i$ and $j$ (i.e., given $N$ values for $i$ and $j$, the equation in the slide has a total of $4N$ smoothness terms, or $4$ for every value of $i$ and $j$). The first equation that I wrote here contains only the terms that include $u_{k,l}$. The second and fourth terms appear for $i = k$ and $j = l$. The first term appears for $i = k-1$ and $j = l$. The third term appears for $i = k$ and $j = l-1$.