The derivation you have here makes sense, but the first expression you have written does not seem to match up with the expression at the top of the slide, right? I only see the second and fourth terms you wrote in the original expression from the top of the slide.
Absolutely. The isotropic assumption means that I can rotate a surface about its normal, and it will produce the same BRDF response. In other words, $f(\theta_i, \phi_i, \theta_r, \phi_r)=f(\theta_i, \phi_i + a, \theta_r, \phi_r + a)$ for all values $a$. So by picking the value $a = -\phi_r$, I can reduce this function down to one of only 3 variables: $f(\theta_i, \phi_i, \theta_r, \phi_r)=f(\theta_i, \phi_i - \phi_r, \theta_r, 0)$ (since the fourth argument is always $0$).
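To make this concrete, here is a minimal sketch (with a hypothetical 3-argument BRDF lookup `f3`, which is my own illustration and not something from the slide):

```python
def brdf(theta_i, phi_i, theta_r, phi_r, f3):
    # Isotropy: rotating both azimuths by a = -phi_r leaves the BRDF value
    # unchanged, so only the azimuth difference phi_i - phi_r matters.
    return f3(theta_i, phi_i - phi_r, theta_r)  # fourth argument is always 0
```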
With respect to why the pseudoinverse is a closed-form solution to the least squares problem, please see this slide from earlier in the semester.
Next, let me redefine $\nabla I \frac{\partial W}{\partial p}$ a little. Instead of being a $1\times 6$ vector for a specific $x$ value, let me reinterpret this as a $N\times 6$ matrix, where every row corresponds to a different $x$. This might make the math a little more clear.
We then have a matrix $A = \nabla I \frac{\partial W}{\partial p}$, a vector $x = \Delta p$, and a vector $b = T(x) - I(W(x;p))$. Then, using our closed-form solution to the least squares problem, $\Delta p = (A^T A)^{-1} A^T b$, where $A^T A = H = (\nabla I \frac{\partial W}{\partial p})^T (\nabla I \frac{\partial W}{\partial p})$ and $A^T b = (\nabla I \frac{\partial W}{\partial p})^T (T(x) - I(W(x;p)))$.
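As a minimal numpy sketch of this update (assuming the $N\times 6$ matrix $A$ and length-$N$ error vector $b$ have already been assembled per the definitions above):

```python
import numpy as np

def lk_update(A, b):
    """Least squares update dp = (A^T A)^{-1} A^T b.
    A: N x 6 matrix (each row is grad I times dW/dp at one pixel x).
    b: length-N error vector T(x) - I(W(x; p))."""
    H = A.T @ A                         # 6 x 6 approximate Hessian
    return np.linalg.solve(H, A.T @ b)  # pseudoinverse solution to A dp = b
```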
The denominator of the first partial derivative and the numerator of the second partial derivative do in fact match up, because $x' = W(x;p)$. Note that the first partial derivative represents the derivative of a warped image (hence why we use $x'$ instead of $x$ here).
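Written out with $x' = W(x;p)$, the chain rule step is:

$\frac{\partial}{\partial p} I(W(x;p)) = \frac{\partial I}{\partial x'} \frac{\partial W}{\partial p} = \nabla I \frac{\partial W}{\partial p}$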
And just to clarify, $\bar{u}_{kl}$ is defined as the local average (see bottom of slide).
It is a little confusing---I agree. Let me try to clarify.
So first of all, let's consider taking the derivative with respect to $u_{k,l}$, and do this in two steps. First, let's expand the summation to only include terms that include $u_{k,l}$, since all other terms will be zeroed out anyways. We get the following:
$\frac{1}{4}(u_{k,l} - u_{k-1,l})^2 + \frac{1}{4}(u_{k,l} - u_{k+1,l})^2 + \frac{1}{4}(u_{k,l} - u_{k,l-1})^2 + \frac{1}{4}(u_{k,l} - u_{k,l+1})^2$
$\quad\quad + \, \lambda(I_x u_{k,l} + I_y v_{k,l} + I_t)^2$
Second, let's take the derivative with respect to $u_{k,l}$:
$\frac{1}{2}(u_{k,l} - u_{k-1,l}) + \frac{1}{2}(u_{k,l} - u_{k+1,l}) + \frac{1}{2}(u_{k,l} - u_{k,l-1}) + \frac{1}{2}(u_{k,l} - u_{k,l+1})$
$\quad\quad + \, 2\lambda(I_x u_{k,l} + I_y v_{k,l} + I_t) I_x$
Simplifying, we get our solution:
$2(u_{k,l} - \frac{1}{4}(u_{k-1,l} + u_{k+1,l} + u_{k,l-1} + u_{k,l+1})) + 2\lambda(I_x u_{k,l} + I_y v_{k,l} + I_t) I_x$
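If it helps, here is a minimal numpy sketch of this gradient (my own illustration; edge pixels are handled by replication, which the slide does not specify):

```python
import numpy as np

def energy_grad_u(u, v, Ix, Iy, It, lam):
    """d/du_{k,l} of the energy: 2(u - u_bar) + 2 lam (Ix u + Iy v + It) Ix."""
    up = np.pad(u, 1, mode='edge')                    # replicate borders
    u_bar = 0.25 * (up[:-2, 1:-1] + up[2:, 1:-1] +    # up/down neighbors
                    up[1:-1, :-2] + up[1:-1, 2:])     # left/right neighbors
    return 2 * (u - u_bar) + 2 * lam * (Ix * u + Iy * v + It) * Ix
```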
Sure! In the previous slide, we derived an overdetermined linear system $Ax = b$ between the flow vector, $x = [u,v]$, and our temporal gradients $b$. The equation here solves for the least squares solution by (1) multiplying both sides by $A^T$, and (2) inverting the $2\times 2$ matrix $A^T A$ to solve for $x$.
Why don't we want to use the SVD here? We absolutely could. However, this solution is a little more efficient than computing the SVD. We also use the form of $A^T A$ to explain the connection to the Harris corner detector a little later in this lecture.
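For concreteness, a minimal sketch of that solve (assuming window gradients `Ix`, `Iy`, `It` given as arrays, and $b = -I_t$ from the brightness constancy constraint $I_x u + I_y v + I_t = 0$):

```python
import numpy as np

def lk_flow(Ix, Iy, It):
    """Solve for [u, v] in the least squares sense over one window."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # N x 2
    b = -It.ravel()                                 # N
    AtA = A.T @ A                                   # the 2 x 2 matrix
    return np.linalg.solve(AtA, A.T @ b)            # [u, v]
```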
Could you explain more why the isotropic assumption means we only need the 3 inputs listed at the bottom of the slide, as compared to before when we needed all 4?
Could you explain the logic going from the expression at the top of the slide to the equations on the bottom? Not sure I understand why the top expression is optimized when the bottom equations are true.
I'm confused how the second term on the first line (with the $\Delta p$) expands into the second term on the second line denoted by chain rule. Specifically, it seems that the denominator of the first partial derivative and the numerator of the second partial derivative do not match up, so I'm unsure how this is a use of the chain rule.
I don't get the relevance of $\bar{u}_{kl}$ here, and how is $2(u_{kl} - \bar{u}_{kl})$ the derivative of the expanded square terms above? For the first part of the expression above, I'm confused why it's not $4u_{kl}$, since first, we seem to be adding two $u_{ij}^2$ terms together, and second, I don't know what happened to the $-2(u_{i+1,j} + u_{i,j+1})$ term.
Could you explain more where this matrix equation came from and why we don't want to use the standard SVD?
Radiant flux measures light in terms of Watts. To compute flux, we consider all photons incident on a finite patch (area > 0) and coming from a finite wedge of directions (solid angle > 0). For example, the wedge can represent the hemisphere of possible lighting directions falling onto a patch.
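As a concrete formula (standard radiometry notation, not necessarily matching the slide), the flux integrates incident radiance over both the patch and the wedge:

$\Phi = \int_A \int_\Omega L(x, \omega) \cos\theta \, d\omega \, dA$

where $A$ is the patch, $\Omega$ is the wedge of directions, and $\theta$ is the angle between $\omega$ and the surface normal.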
I am a little confused about how radiant flux relates to solid angle W. Thank you!
Note that the final line here should be $\theta \leftarrow \theta - \eta \frac{dL}{d\theta}$.
Sure, let's see if we can clear this up.
The x-axis represents the space of possible random observations. In this case, the variable $x$ can take on one of 10 possible values (corresponding to the 10 bins shown here). While the distribution shown here is discrete, it can just as easily be a continuous density function.
The goal in this slide is to generate samples according to the specified PDF. This involves:
computing the cumulative distribution function (CDF), where the k-th bin is the sum of the first k bins of the PDF; and
evaluating the inverse CDF, $\text{CDF}^{-1}(y)$, at uniformly random values $y \in [0,1]$. The resulting samples will have a distribution equal to the original PDF.
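Here is a minimal numpy sketch of this procedure (the 10-bin PDF values below are made up for illustration):

```python
import numpy as np

pdf = np.array([0.05, 0.1, 0.2, 0.15, 0.1, 0.05, 0.1, 0.1, 0.1, 0.05])
cdf = np.cumsum(pdf)                 # k-th entry: sum of the first k bins

# Inverse CDF sampling: draw y ~ Uniform[0, 1], return the first bin
# whose CDF value reaches y.
y = np.random.default_rng(0).uniform(size=100000)
samples = np.searchsorted(cdf, y)

# np.bincount(samples) / y.size should closely match `pdf`.
```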
I am a bit confused about where the CDF comes from and what the horizontal axis represents here. Thank you!
$\Delta p$ is not a parameter of function $I$ (which is just a 2D array of pixel values)! It is a parameter to the parametric function $W$ though.
Are we saying function $I$ is non-parametric in terms of $\Delta p$ here? If so, why is it the case? Isn't $\Delta p$ a parameter of the function?
It could be either inside or outside of the summation (numerically identical), but I agree it would be clearer if it were on the outside.
Why is $H^{-1}$ inside the summation for expression of $\Delta p$?
On the third line, we incorrectly state that
$\Delta p = \sum H^{-1} \left(\nabla T \frac{\partial W}{\partial p}\right)^T (T(x) - I(W(x; p)))$
It should be
$\Delta p = H^{-1} \sum \left(\nabla T \frac{\partial W}{\partial p} \right)^T (T(W(x;0)) - I(W(x; p)))$
If we're setting the right camera $O'$ as the origin then $O$ is at -b. Then the point $X$ would be at $-b + X$ or $X-b$ as you said. However, using this coordinate system, we have that the point $x'$ is located at $-x'$ since it is also to the left of the origin $O'$. Then there is a mismatch between the signs of the numerators. What's wrong with my logic here?
The goal of step 1 is to find a rotation matrix $R$ such that the image planes of both cameras are parallel; as a result, we only need to apply this rotation matrix $R$ to one image.
The goal of steps 2 and 3 is to find a second rotation matrix $R_{\text{rect}}$ to reorient the common image plane such that it is parallel to the stereo baseline / translation vector, making the epipolar lines horizontal.
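If it's useful, here is one standard construction of $R_{\text{rect}}$ from the translation vector (a sketch of the usual recipe; the exact convention on the slide may differ):

```python
import numpy as np

def rect_rotation(t):
    """Build R_rect whose rows form a basis with the x-axis along the baseline t.
    Assumes the baseline is not parallel to the optical (z) axis."""
    e1 = t / np.linalg.norm(t)          # new x-axis: along the baseline
    e2 = np.cross([0.0, 0.0, 1.0], e1)
    e2 /= np.linalg.norm(e2)            # new y-axis: orthogonal to e1 and old z
    e3 = np.cross(e1, e2)               # new z-axis completes the basis
    return np.vstack([e1, e2, e3])
```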
You're absolutely right! There could very well be multiple instances of similar features on the epipolar line. This is a fundamental issue with stereo imaging, where there's no guarantee we can compute perfect correspondences (see this slide for examples on where stereo fails).
At the end of this lecture, we discussed a few strategies to improve correspondences. For example, we could use structured lighting to provide unique correspondences and get around this problem. An alternative solution is to enforce smoothness in the recovered depth map.
To be clear, I meant to say "if $F$ were rank 3"---and not "if $F$ were not rank 2".
If $F$ is rank 3 and given that $F$ is a $3\times 3$ matrix, the null space contains precisely one element (according to the rank-nullity theorem): the vector $0$. That is, the only vector $e$ where $e^T F = 0$ is $e = 0$.
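Applying the rank-nullity theorem to $F^T$ (since $e^T F = 0$ is equivalent to $F^T e = 0$):

$\dim \mathcal{N}(F^T) = 3 - \text{rank}(F^T) = 3 - 3 = 0$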
To be fair, I could have been more clear. So let's try deriving this again from scratch!
Let's revisit slide 70. First, we derived the rigid motion equation $x' = R (x-t)$, which can also be rewritten as $x'^T R = (x-t)^T$. Second, because $t$, $x-t$, and $x$ are all coplanar, we also have $(x-t)^T(t\times x) = 0$. Combining these two equations together gives us:
$x'^T (R[t_{\times}]) x = x'^T E x = 0$
where
$E = R[t_{\times}]$
Now note that rotating three coplanar vectors (e.g., $t$, $x-t$, and $x$) produces another three coplanar vectors ($Rt$, $R(x-t)$, and $Rx$). Therefore, we can rewrite our coplanarity equation as follows: $(R(x-t))^T((Rt) \times (Rx)) = 0$. Combining this with the rigid motion equation produces the following:
$(R(x-t))^T((Rt) \times (Rx)) = 0$
$\rightarrow (RR^T x')^T((Rt) \times (Rx)) = 0$
$\rightarrow x'^T[(Rt)_{\times}]Rx = 0$
$\rightarrow x'^T[t'_{\times}]Rx = 0$
where
$t' = Rt$ and $E = [t'_{\times}]R$
To answer your second question, because $R[t_{\times}] = [t'_{\times}]R$ where $t \neq t'$ in general, this slide and slide 73 are using different versions of $t$.
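If you want to convince yourself numerically that $R[t_{\times}] = [(Rt)_{\times}]R$ for a proper rotation, here is a quick sketch:

```python
import numpy as np

def cross_matrix(t):
    """Skew-symmetric [t_x] such that cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                                 # force det(R) = +1
t = rng.standard_normal(3)

print(np.allclose(R @ cross_matrix(t), cross_matrix(R @ t) @ R))  # True
```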
Why would we rotate the right image (Image 2) by $R$ (and not the left image)? Isn't the relationship that $x' = R(x-t)$, where $x'$ is from Image 2? If we consider Image 1 to be in the "normal" orientation, then Image 2 is already a rotated version of Image 1. I'm confused why we would then want to rotate Image 2 again by $R$.
I'm rewatching this lecture so this question might be addressed later in the lecture, but is using a window and directly comparing pixel values really a good way of matching points in the images? I was thinking that there might be multiple instances of similar features which happen to be on the same line, and it would be difficult to distinguish which is the correct one. For example, even in the images on this slide, the line passes through the top right corner of the top right window, and since this and the actual correct point are both window corners they might have similar pixel intensity distributions.
Could you elaborate a little more on the first statement "if $F$ were not rank 2, then this would mean that there is no non-trivial point $e$ where $e^T F = 0$"? Why is this true? Is it because of the rank-nullity theorem?
Sorry, I'm still confused. If we substitute $t' = R^T t$ into $E = R[t']$, wouldn't we just get $E = t$ since $R$ is unitary and so $RR^T$ will be the identity matrix?
I'm also still confused about how, even if we could decompose the matrix in two different ways as you stated, we end up with the statement $R[t_{\times}] = [t_{\times}]R$ -- unless the previous slide I was referencing (slide 73, I believe) had an error and should have said $R[t'_{\times}]$. Basically I'm asking if slide 73 and this slide are referencing the same $[t_{\times}]$ matrix, or if slide 73 was referencing $[t'_{\times}]$.
If $F$ were not rank 2, then this would mean that there is no non-trivial point $e$ where $e^T F = 0$. In other words, there would be no epipole! And therefore $F$ would not be a valid fundamental matrix (based on our understanding of epipolar geometry).
Note that when running the 8-point algorithm with more than 8 points, we will in general get a fundamental matrix $F'$ that has rank 3 (due to noise in the correspondences). The process of setting the smallest singular value to 0 produces the closest rank 2 (valid) matrix $F$, where "closest" means that the Euclidean or the Frobenius distance between the two matrices $F$ and $F'$ is minimized (see this slide for more details).
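A minimal sketch of that projection step (assuming `F_prime` holds the rank-3 estimate from the 8-point algorithm):

```python
import numpy as np

def closest_rank2(F_prime):
    """Project onto the closest rank-2 matrix in Frobenius norm."""
    U, S, Vt = np.linalg.svd(F_prime)
    S[2] = 0.0                     # zero out the smallest singular value
    return U @ np.diag(S) @ Vt
```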
Although there are 9 unknowns, the fundamental matrix $F$ (and the essential matrix $E$) are only unique up to a scale factor. To see this, I can scale the values of $F$ by an arbitrary scalar $\alpha \neq 0$, and show that if
$(x'^T F x) = 0$
then
$x'^T (\alpha F) x = \alpha (x'^T F x) = 0$
In other words, this constraint on epipolar geometry is unaffected by the scalar $\alpha$. (Remember that $x'$ and $x$ are still homogeneous coordinates too, and their scale also has no effect on the 2D points that they represent.)
So while there are 9 unknowns, there are really only 8 degrees of freedom. And we need 8 correspondences to solve for the fundamental matrix.
You have to consider all terms with respect to all values $i$ and $j$ (i.e., given $N$ values for $i$ and $j$, the equation in the slide has a total of $4N$ smoothness terms, or $4$ for every value of $i$ and $j$). The first equation that I wrote here contains only the terms that include $u_{k,l}$. The second and fourth terms appear for $i = k$ and $j = l$. The first term appears for $i = k-1$ and $j = l$. The third term appears for $i = k$ and $j = l-1$.