Sorry, I'm a bit confused about the notation change: in this case, is the matrix $A$ the affine transform (or some other linear transform)? Is $b$ the set of measured points? Is $x$ the error?
Also, if we don't compute the inverse of the matrix, how do we solve for $x$? Do we convert to orthonormal form using Gram-Schmidt and use the properties of an orthonormal matrix? Is this actually faster?
mpotoole
The answers to some of your questions are explained in this slide. There, $x$ is a 6-element vector containing the six matrix elements of an affine transform, $A$ is a matrix whose entries depend on the x- and y-coordinates of your points $x$, and $b$ is a vector containing the x- and y-coordinates of your corresponding points $x'$.
In short, the values of both $A$ and $b$ depend on your data, and you're solving for $x$, which represents your affine transform.
As for solving this system of equations, we compute the so-called pseudoinverse, shown on the line at the bottom of this slide. It is a generalized inverse that handles non-square matrices. It still involves inverting a matrix, but the point is that you're not taking the inverse of the matrix $A$ directly.
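For what it's worth, here is a minimal NumPy sketch of what this looks like in practice. The point values and the $[a, b, c, d, e, f]$ parameter ordering are my own assumptions for illustration, not necessarily the slide's:

```python
import numpy as np

# Toy correspondences (made-up numbers): points (x, y) and their
# transformed versions (x', y'), here related by a translation of (1, 2).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pts_prime = pts + np.array([1.0, 2.0])

# Each correspondence contributes two rows to A and two entries to b:
#   [x y 1 0 0 0] . [a b c d e f]^T = x'
#   [0 0 0 x y 1] . [a b c d e f]^T = y'
A = np.zeros((2 * len(pts), 6))
b = np.zeros(2 * len(pts))
for i, ((x, y), (xp, yp)) in enumerate(zip(pts, pts_prime)):
    A[2 * i] = [x, y, 1, 0, 0, 0]
    A[2 * i + 1] = [0, 0, 0, x, y, 1]
    b[2 * i], b[2 * i + 1] = xp, yp

# Least-squares solve; mathematically x = (A^T A)^{-1} A^T b, but lstsq
# uses a more numerically stable factorization than forming A^T A explicitly.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x.reshape(2, 3))  # recovers roughly [[1, 0, 1], [0, 1, 2]]
```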
gmoyer
Is it somehow less bad to compute the inverse of $A^TA$?
Samleo
I believe there is a theorem which states that $A^T A$ is guaranteed to be invertible; $A$ is not guaranteed to be invertible (e.g., it can be non-square). I'm not sure if the inverse of $A^T A$ comes out of the SVD automatically or something.
mpotoole
$A^T A$ is invertible if and only if $A$ is full rank.
We typically work with the pseudoinverse when working with an overdetermined system, i.e., when $A$ has more rows than columns. When the matrix $A$ is square (and invertible), the pseudoinverse and inverse end up giving identical results.
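A quick NumPy sanity check of that last point (made-up matrix, just for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # square and full rank, hence invertible
# For a square invertible matrix, the pseudoinverse equals the inverse.
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))  # True
```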
gmoyer
But if $A$ is full rank, then isn't $A$ invertible?
Or can $A$ be full rank even if it's not square?
mpotoole
$A$ can be full rank and not square. When the matrix $A$ has more rows than columns, full rank simply refers to the columns being linearly independent. In this case, $A^T A$ becomes a square matrix of full rank. Similarly, when the matrix $A$ has more columns than rows, full rank means that the rows are linearly independent.
So in short, any square matrix is invertible if and only if it is full rank. A non-square matrix, however, is not invertible.
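A tiny example to make this concrete (arbitrary numbers):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # 3x2: more rows than columns, columns independent
print(np.linalg.matrix_rank(A))        # 2 -> full (column) rank, yet A has no inverse
print(np.linalg.matrix_rank(A.T @ A))  # 2 -> A^T A is 2x2 and full rank, so invertible
```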
gmoyer
Oh okay thank you for clarifying!
gmoyer
I'm having a hard time recreating the expansion of the error function. I am getting:
$$x^TA^TAx - x^TA^Tb - b^TAx + ||b||^2$$
How does the $b^TAx$ term get grouped with the one before it to produce $-2x^TA^Tb$? Isn't $b^TAx$ the transpose of $x^TA^Tb$?
mpotoole
All of the terms listed in your expansion of the error function are scalars. And the transpose of a scalar produces the same scalar, i.e., $s^T = s$. This means that we can rewrite the expression as follows:
$x^TA^TAx - x^TA^Tb - (b^TAx)^T + ||b||^2$
And you can then simplify to get the equation written on the slide.
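Spelling that out, the scalar-transpose trick collapses the two middle terms:
$$\|Ax - b\|^2 = x^T A^T A x - 2\,x^T A^T b + \|b\|^2$$
Setting the gradient with respect to $x$ to zero then recovers the pseudoinverse solution discussed earlier in the thread:
$$\nabla_x \|Ax - b\|^2 = 2 A^T A x - 2 A^T b = 0 \quad\Longrightarrow\quad x = (A^T A)^{-1} A^T b$$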
gmoyer
Oh, never mind! I think it's okay because both of those expressions evaluate to a scalar, so they are equal (transposing a scalar gives you back that scalar).