QR decomposition
The least-squares problem
One of the most important applications of the QR decomposition is the least-squares solution of a set of overdetermined equations, that is, a set of $m$ linear equations with $n$ unknowns, where $m > n$. The least-squares problem to be solved is the minimisation of $\|Ax - b\|_2$, where $\|\cdot\|_2$ is the standard 2-norm, and where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. In this case, the problem will often have no exact solution, and thus it is necessary to consider $Ax$ and $b$ as approximately equal, and to minimise the distance between them by minimising the loss function $\|Ax - b\|_2$.
To solve this least-squares problem, we need to consider the subspace of all vectors in $\mathbb{R}^m$ that are formed from linear combinations of the columns of $A$. This is known as the column space of $A$, and is denoted $\text{Col}\,A$. Given that any linear combination of the columns of $A$ will lie in this space, we can say that $Ax$ will also lie in $\text{Col}\,A$ for any $x$.
Now consider the projection $\hat{b}$ of $b$ into the column space of $A$ (i.e. $\hat{b}$ is the closest point in $\text{Col}\,A$ to $b$), see the diagram below. Because $\hat{b}$ is in the column space of $A$, we know that there is a vector $\hat{x}$ that satisfies

$$A\hat{x} = \hat{b}.$$
Since $\hat{b}$ is the closest point to $b$ in the column space of $A$, we can therefore say that $\hat{x}$ is the least-squares solution.
We can show that the vector $b - A\hat{x}$ is orthogonal to $\text{Col}\,A$ and therefore also orthogonal to each column of $A$, so we have $a_j^T (b - A\hat{x}) = 0$ for each column $a_j$ of $A$. Putting these equations together, we can write

$$A^T (b - A\hat{x}) = 0,$$
or, rearranged slightly, we can find the least-squares solution via the solution of the normal equations

$$A^T A \hat{x} = A^T b.$$
The QR decomposition divides $A$ into an orthogonal matrix $Q$ and an upper triangular matrix $R$, such that $A = QR$. Most importantly for the least-squares problem, the columns of $Q$ also form an orthonormal basis for $\text{Col}\,A$, and therefore $\hat{b} = QQ^Tb$.
Given this decomposition, it can be shown that the least-squares solution of $Ax = b$ is given by

$$\hat{x} = R^{-1} Q^T b.$$
To prove this, we let $\hat{x} = R^{-1} Q^T b$ and make the following substitutions:

$$A\hat{x} = QR\hat{x} = QRR^{-1}Q^Tb = QQ^Tb = \hat{b}.$$
Therefore $A\hat{x} = \hat{b}$, which proves that $\hat{x}$ is the least-squares solution for $Ax = b$.
Finally, we note that the inverse $R^{-1}$ should not be calculated directly; instead, $\hat{x}$ should be found by solving the triangular system

$$R\hat{x} = Q^T b$$

by back substitution.
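To make this concrete, here is a minimal NumPy/SciPy sketch (the random overdetermined system is purely illustrative) that computes the least-squares solution via the reduced QR decomposition and back substitution:

```python
import numpy as np
from scipy.linalg import solve_triangular

# An illustrative overdetermined system: 100 equations, 3 unknowns
rng = np.random.default_rng(42)
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

# Reduced (economy) QR: Q is 100x3 with orthonormal columns, R is 3x3
Q, R = np.linalg.qr(A)

# Solve R x = Q^T b by back substitution rather than forming R^{-1}
x = solve_triangular(R, Q.T @ b)

# Check against NumPy's built-in least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```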
Constructing the QR decomposition
QR decompositions are normally computed via Householder reflections, Givens rotations or the Gram-Schmidt process. For a brief summary of the first two methods, it is useful to consider a simple reflection or rotation of a 2D vector. For example, the matrix

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
is a rotation matrix that, when applied to a vector $x$, results in $y = Qx$, where $y$ is $x$ rotated counterclockwise through the angle $\theta$. $Q$ is also orthogonal, since $Q^TQ = I$.
Similarly, a reflection matrix can be constructed as

$$Q = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix},$$
which, when applied to a vector $x$, results in $y = Qx$, where $y$ is $x$ reflected across the line defined by the angle $\theta/2$.
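The following short NumPy snippet (purely illustrative, with an arbitrary choice of $\theta$) demonstrates both matrices and verifies their orthogonality:

```python
import numpy as np

theta = np.pi / 4

# Rotation matrix: rotates a vector counterclockwise through theta
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

# Reflection matrix: reflects a vector across the line at angle theta / 2
ref = np.array([[np.cos(theta),  np.sin(theta)],
                [np.sin(theta), -np.cos(theta)]])

x = np.array([1.0, 0.0])
print(rot @ x)  # [0.707..., 0.707...]: x rotated by 45 degrees
print(ref @ x)  # [0.707..., 0.707...]: x reflected across the 22.5 degree line

# Both are orthogonal: Q^T Q = I
assert np.allclose(rot.T @ rot, np.eye(2))
assert np.allclose(ref.T @ ref, np.eye(2))
```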
Rotations and reflections are often useful because they can be chosen so as to introduce zeros in the vector they are applied to. Given an $m \times n$ matrix $A$, a series of Householder reflections can be applied to reduce $A$ to an upper triangular matrix $R$:

$$H_n \cdots H_2 H_1 A = R.$$
By setting $Q = H_1 H_2 \cdots H_n$, we can show that $A = QR$, and that $Q$ is an orthogonal matrix whose columns form an orthonormal basis for the column space of $A$.
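As an illustration of the idea, here is a teaching sketch of Householder QR (it assumes no column of the working matrix is already zero, and is not how production libraries such as LAPACK implement it):

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections (a teaching sketch, not LAPACK-grade)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        # Construct the Householder vector that zeros R[k+1:, k]
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply the reflection H = I - 2 v v^T to the trailing submatrix,
        # and accumulate Q = H_1 H_2 ... H_n from the right
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(5))
assert np.allclose(np.tril(R, -1), 0)
```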
Similarly, a Givens rotation can be used to zero a single component of $A$, so that a series of rotations can be used to construct the upper triangular matrix $R$:

$$G_t \cdots G_2 G_1 A = R,$$
so that $A = QR$, where $Q = G_1^T G_2^T \cdots G_t^T$. For both the Householder and Givens methods, it is often useful not to construct the full matrix $Q$, but instead to keep $Q$ factored as an implicit product of either the $H_i$ or the $G_i$. Fast algorithms exist to apply these factored forms to another vector.
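As a minimal sketch of a single Givens step (the matrix and the pair of rows chosen here are arbitrary, purely for illustration):

```python
import numpy as np

def givens(a, b):
    """Return c, s such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

# Zero out A[2, 0] by rotating rows 1 and 2; all other rows are untouched
A = np.array([[6.0, 5.0],
              [5.0, 1.0],
              [4.0, 3.0]])
c, s = givens(A[1, 0], A[2, 0])
G = np.array([[c, s], [-s, c]])
A[[1, 2], :] = G @ A[[1, 2], :]
print(A)  # A[2, 0] is now zero
```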
The final method to construct a QR decomposition is the Gram-Schmidt process, which constructs an orthogonal or orthonormal basis for a given subspace defined by the span of the set of vectors $a_1, a_2, \ldots, a_n$. If these vectors are the columns of the matrix $A$, then the Gram-Schmidt process can be used to directly construct the orthonormal basis $q_1, q_2, \ldots, q_n$ of the column space of $A$, given by the columns of $Q$, such that $A = QR$, where $R$ is an upper triangular matrix. The matrix $R$ can be calculated using $R = Q^T A$. Note that classical Gram-Schmidt exhibits poor numerical qualities; a modified version of the algorithm therefore exists, which is described in the Golub and Van Loan Matrix Computations textbook listed below.
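A sketch of the modified variant is below (it assumes $A$ has full column rank; see Golub and Van Loan for the careful treatment):

```python
import numpy as np

def mgs_qr(A):
    """QR via modified Gram-Schmidt (a sketch; assumes full column rank)."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        # Orthogonalise the remaining columns against q_k immediately,
        # rather than against the original columns (the "modified" step)
        R[k, k+1:] = Q[:, k] @ A[:, k+1:]
        A[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
    return Q, R

A = np.random.default_rng(1).standard_normal((6, 4))
Q, R = mgs_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(4))
```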
In terms of computational work, the Householder method takes approximately $2n^2(m - n/3)$ flops to compute the decomposition in factored form, and another $4(m^2n - mn^2 + n^3/3)$ flops to form the full matrix $Q$, whereas the Gram-Schmidt method is more efficient at $2mn^2$ flops. However, Householder is normally preferred in practice since, even with the modified algorithm, the numerical properties of Gram-Schmidt are still poor in comparison with both Householder and Givens (i.e. the final orthogonality of $Q$ is not ideal), so Gram-Schmidt is only useful when the columns of $A$ are already fairly independent. Using Givens rotations, the matrix $R$ can be found in approximately $3n^2(m - n/3)$ flops, and the factored form of the decomposition can be found in the same amount of time; the full matrix $Q$ is not normally calculated via Givens rotations. Givens rotations are most useful when there are only a few non-zeros in $A$, and are more easily parallelised than Householder.
Other Reading
The discussion in this section relied on concepts such as orthogonal and orthonormal vectors, vector spaces and subspaces, and basis vectors. It is well worth investigating these topics further in:
Linear Algebra and its Applications by David C. Lay. Chapters 4 & 6.
Additional reading on the QR decomposition can be found in:
Linear Algebra and its Applications by David C. Lay. Chapter 6.4.
Golub, G. H. & Van Loan, C. F. Matrix Computations, 3rd Ed. (Johns Hopkins University Press, 1996). Chapter 5.2
Problems
Model fitting
For this exercise we will be using some data on Oxford's weather, which is hosted by Saad Jbabdi from the Wellcome Centre for Integrative NeuroImaging (FMRIB), and can be obtained here.
We wish to fit a quadratic model of the form $y = ax^2 + bx + c$ to the hours of sunlight observed in Oxford (7th column in OxfordWeather.txt) versus the month (2nd column). The dataset in question has $N$ data points, so our model gives us a set of $N$ equations for the 3 unknowns $a$, $b$ and $c$ that are overdetermined; that is, for each data point $(x_i, y_i)$ for $i = 1, \ldots, N$ we have:

$$y_i = a x_i^2 + b x_i + c.$$
Use a QR decomposition to find the least-squares solution to these equations (you can check it using `np.linalg.lstsq` if you like), and therefore fit the model to the data.
Plot the model and the data side by side to qualitatively evaluate the fit.
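If you get stuck, one possible starting skeleton is below (it assumes OxfordWeather.txt has been downloaded locally and is a plain whitespace-separated numeric table; the column indices follow the problem statement):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("OxfordWeather.txt")
month = data[:, 1]  # 2nd column: month
sun = data[:, 6]    # 7th column: hours of sunlight

# Design matrix for the quadratic model y = a x^2 + b x + c
X = np.stack([month**2, month, np.ones_like(month)], axis=1)

# Least squares via QR (back substitution on R would exploit its structure;
# np.linalg.solve is used here for brevity)
Q, R = np.linalg.qr(X)
coeffs = np.linalg.solve(R, Q.T @ sun)

# Plot the fitted model against the data
xs = np.linspace(month.min(), month.max(), 100)
plt.scatter(month, sun, s=5, label="data")
plt.plot(xs, np.polyval(coeffs, xs), "r", label="quadratic fit")
plt.xlabel("month")
plt.ylabel("hours of sunlight")
plt.legend()
plt.show()
```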