This material has been adapted from material by Martin Robinson from the "Scientific Computing" module of the SABS R³ Center for Doctoral Training.

Creative Commons License
This course material was developed as part of UNIVERSE-HPC, which is funded through the SPF ExCALIBUR programme under grant number EP/W035731/1


QR decomposition

The least-squares problem

One of the most important applications of the $QR$ decomposition is the least squares solution of a set of overdetermined equations, that is, a set of $m$ linear equations with $n$ unknowns, where $m \ge n$. The least squares problem to be solved is the minimisation of $||Ax - b||_2$, where $||x||_2 = \sqrt{x_1^2 + x_2^2 + ... + x_m^2}$ is the standard 2-norm, and where $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$. In this case, the problem $Ax = b$ will often have no exact solution, so it is necessary to treat $Ax$ and $b$ as approximately equal, and to minimise the distance between them by minimising the loss function $||Ax - b||_2$.

To solve this least squares problem, we need to consider the subspace of all vectors in $\mathbb{R}^{m}$ that are formed from linear combinations of the columns of $A$. This is known as the column space of $A$, and is denoted $\text{Col } A$. Given that any linear combination of the columns of $A$ lies in this space, we can say that $Ax$ also lies in $\text{Col } A$ for any $x$.

Now consider the projection of $b$ onto the column space of $A$, giving a new vector $\hat{b}$ (i.e. $\hat{b}$ is the closest point in $\text{Col } A$ to $b$); see the diagram below. Because $\hat{b}$ is in the column space of $A$, we know that there exists a vector $\hat{x}$ satisfying

$$A \hat{x} = \hat{b}$$

Since $\hat{b}$ is the closest point to $b$ in the column space of $A$, we can therefore say that $\hat{x}$ is the least-squares solution.

[Figure: the least squares problem]

We can show that the vector $b - \hat{b} = b - A \hat{x}$ is orthogonal to $\text{Col } A$ and therefore also orthogonal to each column of $A$, so we have $a_j^T (b - A \hat{x}) = 0$ for each column $a_j$ of $A$. Putting these $n$ equations together, we can write

$$A^T (b - A \hat{x}) = 0$$

or, rearranged slightly, we can find the least-squares solution $\hat{x}$ via the solution of the equation

$$A^T A \hat{x} = A^T b$$
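For example, a minimal sketch of solving these equations directly with NumPy (the matrix and right-hand side below are made up purely for illustration), checked against NumPy's built-in least-squares routine:

```python
import numpy as np

# A small overdetermined system: 4 equations, 2 unknowns (arbitrary values).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with NumPy's built-in least-squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_hat, x_ref)
```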

The $QR$ decomposition factors $A$ into the product $A = QR$ of an orthogonal matrix $Q$ and an upper triangular matrix $R$. Most importantly for the least-squares problem, the columns of $Q$ form an orthonormal basis for $\text{Col } A$, and therefore $\hat{b} = Q Q^T b$.

Given this decomposition, it can be shown that the least squares solution of $Ax = b$ is given by

$$\hat{x} = R^{-1} Q^T b$$

To prove this, we let $\hat{x} = R^{-1} Q^T b$ and make the following substitutions:

$$A\hat{x} = QR \hat{x} = QRR^{-1}Q^T b = Q Q^T b = \hat{b}$$

Therefore $A\hat{x} = \hat{b}$, which proves that $\hat{x}$ is the least-squares solution for $Ax = b$.

Finally, we note that the inverse $R^{-1}$ should not be calculated directly; instead, $\hat{x}$ should be found by solving

$$R \hat{x} = Q^T b$$
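As a minimal sketch of the whole procedure (using a randomly generated system for illustration, and assuming SciPy is available for the triangular solve):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Arbitrary overdetermined system for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

# Thin (reduced) QR: Q is 10x3 with orthonormal columns, R is 3x3 upper triangular.
Q, R = np.linalg.qr(A)

# Solve R x = Q^T b by back-substitution rather than forming R^{-1}.
x_hat = solve_triangular(R, Q.T @ b)

# Check against NumPy's least-squares routine.
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))
```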

Constructing the QR decomposition

$QR$ decompositions are normally computed via Householder reflections, Givens rotations or the Gram-Schmidt process. For a brief summary of the first two methods, it is useful to consider a simple $2 \times 2$ reflection or rotation of a 2D vector. For example, the matrix

$$Q = \left(\begin{matrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{matrix}\right)$$

is a rotation matrix that, when applied to a vector $x$, gives $y = Qx$, where $y$ is $x$ rotated clockwise through the angle $\theta$. $Q$ is also orthogonal, since $QQ^T = I$.

Similarly, a $2 \times 2$ reflection matrix can be constructed as

$$Q = \left(\begin{matrix} \cos(\theta) & \sin(\theta) \\ \sin(\theta) & -\cos(\theta) \end{matrix}\right)$$

which, when applied to a vector $x$, gives $y = Qx$, where $y$ is $x$ reflected across the line defined by $\text{span}((\cos(\theta/2), \sin(\theta/2))^T)$.
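As a small sketch (the vector is arbitrary), here is a rotation chosen so that $y = Qx$ has a zero second component, which anticipates how rotations are used below to introduce zeros:

```python
import numpy as np

x = np.array([3.0, 4.0])  # arbitrary example vector

# Choose theta so that the rotation maps x onto the first axis,
# i.e. so that the second component of Q @ x becomes zero.
theta = np.arctan2(x[1], x[0])
Q = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

print(Q @ x)    # approximately [5, 0]: the norm of x ends up in the first entry
print(Q @ Q.T)  # approximately the identity: Q is orthogonal
```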

Rotations and reflections are often useful because they can be chosen to introduce zeros into the vector they are applied to. Given an $m \times n$ matrix $A$, a series of $n$ Householder reflections can be applied to reduce $A$ to an upper triangular matrix $R$:

$$H_n ... H_2 H_1 A = R$$

By setting $Q = H_1 H_2 ... H_n$, we can show that $A = QR$, and that $Q$ is an orthogonal matrix whose first $n$ columns form an orthonormal basis for the column space of $A$.
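A minimal sketch of this idea is given below; it is an illustrative, unoptimised implementation that forms each $H_k$ explicitly rather than keeping the factored form used in practice:

```python
import numpy as np

def householder_qr(A):
    """Reduce A to upper triangular R by n Householder reflections,
    accumulating Q = H_1 H_2 ... H_n. Illustrative only, not optimised."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k].copy()
        # Choose the Householder vector v so that the reflection maps x onto a
        # multiple of the first basis vector, zeroing column k below the diagonal.
        sign = 1.0 if x[0] >= 0 else -1.0
        v = x
        v[0] += sign * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue  # column is already zero below the diagonal
        v /= norm_v
        H = np.eye(m)
        H[k:, k:] -= 2.0 * np.outer(v, v)
        R = H @ R
        Q = Q @ H  # each H is symmetric and orthogonal
    return Q, R

# Quick check on an arbitrary matrix.
A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```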

Similarly, a Givens rotation can be used to zero a single component of $A$, so that a series of rotations can be used to construct the upper triangular matrix $R$:

$$G_j ... G_2 G_1 A = R$$

so that $Q = G_1 G_2 ... G_j$ and $A = QR$. For both the Householder and Givens methods, it is often useful not to construct the full matrix $Q$, but instead to keep $Q$ factored as an implicit product of either $H_1 H_2 ... H_n$ or $G_1 G_2 ... G_j$. Fast algorithms exist for applying these factored forms to another vector.
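As a small sketch (the matrix is arbitrary), a single Givens rotation can be constructed to zero one chosen entry while mixing only two rows of the matrix:

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

# Arbitrary 3x2 example matrix.
A = np.array([[6.0, 5.0],
              [5.0, 1.0],
              [0.0, 4.0]])

# Build a Givens rotation acting on rows 0 and 1 that zeros the (1, 0) entry.
c, s = givens(A[0, 0], A[1, 0])
G = np.eye(3)
G[0, 0] = G[1, 1] = c
G[0, 1] = s
G[1, 0] = -s

print(G @ A)  # the (1, 0) entry is now zero; only rows 0 and 1 have changed
```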

The final method for constructing a $QR$ decomposition is the Gram-Schmidt process, which constructs an orthogonal or orthonormal basis for a given subspace defined by the span of the set of vectors $x_1, x_2, ..., x_n$. If these $n$ vectors are the columns of the $m \times n$ matrix $A$, then the Gram-Schmidt process can be used to directly construct $Q$, an orthonormal basis for the column space of $A$, such that $A = QR$ where $R$ is an upper triangular matrix. The matrix $R$ can be calculated using $R = Q^T A$. Note that the classical Gram-Schmidt process exhibits poor numerical qualities, so a modified version of the algorithm is used instead; it is described in the Golub and Van Loan Matrix Computations textbook listed below.
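A minimal sketch of the modified Gram-Schmidt version (illustrative only, and assuming $A$ has full column rank so that no column norm is zero):

```python
import numpy as np

def modified_gram_schmidt_qr(A):
    """Compute a thin QR factorisation column by column using modified Gram-Schmidt."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        # Immediately remove the q_k component from all remaining columns.
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * Q[:, k]
    return Q, R

# Quick check on an arbitrary matrix.
A = np.random.default_rng(2).standard_normal((6, 3))
Q, R = modified_gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
```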

In terms of computational work, the Householder method takes $2n^2(m - n/3)$ flops to compute $Q$ in factored form, and another $2n^2(m - n/3)$ flops to form the full matrix $Q$, whereas the Gram-Schmidt method is more efficient at $2mn^2$ flops. However, Householder is normally preferred in practice because, even with the modified algorithm, the numerical properties of Gram-Schmidt remain poor in comparison with both Householder and Givens (i.e. the final orthogonality of $Q$ is not ideal), so Gram-Schmidt is only useful when the columns of $A$ are already fairly independent. Using Givens rotations, the matrix $R$ can be found in $2n^2(m - n/3)$ flops, or the factored form of the $QR$ decomposition can be found in the same amount of time; the full matrix $Q$ is not normally calculated via Givens rotations. Givens rotations are most useful when $A$ has only a few non-zeros, and they are more easily parallelised than Householder reflections.

Other Reading

The discussion in this section relied on concepts such as orthogonal and orthonormal vector pairs, vector spaces, subspaces and basis vectors. It is well worth investigating these topics further in:

  • Linear algebra and its applications by David C. Lay. Chapters 4 & 6.

Additional reading on the $QR$ decomposition can be found in:

  • Matrix Computations by Gene H. Golub and Charles F. Van Loan.

Software

Problems

Model fitting

For this exercise we will be using some data on Oxford's weather, which is hosted by Saad Jbabdi from the Wellcome Centre for Integrative NeuroImaging (FMRIB) and can be obtained here.

We wish to fit a quadratic model of the form $y = a x^2 + b x + c$ to the hours of sunlight observed in Oxford (7th column in OxfordWeather.txt) versus the month (2nd column). The dataset in question has $m > 3$ data points, so our model gives us an overdetermined set of $m$ equations for the 3 unknowns $a$, $b$ and $c$; that is, for each data point $(y_i, x_i)$ for $i = 1..m$ we have:

$$y_i = a x_i^2 + b x_i + c$$

Use a $QR$ decomposition to find the least-squares solution to these equations (you can check it using np.linalg.lstsq if you like), and therefore fit the model to the data. Plot the model and the data side by side to qualitatively evaluate the fit.
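One possible starting point is sketched below; it assumes the file has been downloaded as OxfordWeather.txt and that np.loadtxt can parse it, with the column indices following the description above:

```python
import numpy as np

# Load the data (assumed layout: 2nd column = month, 7th column = sunshine hours).
data = np.loadtxt("OxfordWeather.txt")
month = data[:, 1]
sun_hours = data[:, 6]

# Design matrix for the model y = a x^2 + b x + c: one row [x_i^2, x_i, 1] per data point.
X = np.column_stack([month**2, month, np.ones_like(month)])

# Thin QR factorisation, then solve R [a, b, c]^T = Q^T y
# (np.linalg.solve is used here for brevity; a triangular solver would also work).
Q, R = np.linalg.qr(X)
a, b, c = np.linalg.solve(R, Q.T @ sun_hours)

# Check against NumPy's least-squares routine, then plot the fitted model against the data.
print(np.allclose([a, b, c], np.linalg.lstsq(X, sun_hours, rcond=None)[0]))
```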