This material has been adapted from material by Martin Robinson from the "Scientific Computing" module of the SABS R³ Center for Doctoral Training.

Creative Commons License
This course material was developed as part of UNIVERSE-HPC, which is funded through the SPF ExCALIBUR programme under grant number EP/W035731/1


Cholesky decomposition

Symmetric positive definite matrices are a very special type of matrix that often arises in practice. From a computational point of view, this class of matrix is very attractive because it is possible to decompose a symmetric positive definite matrix $A$ very efficiently into a single lower triangular matrix $G$ so that $A = GG^T$.

A matrix $A$ is positive definite if $x^T A x > 0$ for any nonzero $x \in \mathbb{R}^n$. This statement by itself is not terribly intuitive, so let's also look at an example of a $2 \times 2$ matrix

$$A = \left(\begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix}\right)$$

If $A$ is symmetric positive definite (SPD) then

$$\begin{aligned} x &= (1, 0)^T \Rightarrow x^T A x = a_{11} > 0 \\ x &= (0, 1)^T \Rightarrow x^T A x = a_{22} > 0 \\ x &= (1, 1)^T \Rightarrow x^T A x = a_{11} + 2a_{12} + a_{22} > 0 \\ x &= (1,-1)^T \Rightarrow x^T A x = a_{11} - 2a_{12} + a_{22} > 0 \end{aligned}$$

The first two equations show that the diagonal entries of $A$ must be positive, and combining the last two equations implies $|a_{12}| \le (a_{11} + a_{22}) / 2$, that is, the matrix has much of its "mass" on the diagonal (note: this is not the same as the matrix being diagonally dominant, where $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$). These two observations for our $2 \times 2$ matrix also apply to a general $n \times n$ SPD matrix. One of the very nice consequences of this "weighty" diagonal for SPD matrices is that it precludes the need for pivoting.

It can be shown that if $A$ is an SPD matrix, then the $LDL^T$ decomposition exists and $D = \text{diag}(d_1, ..., d_n)$ has positive diagonal entries. Therefore, it is straightforward to see that $LDL^T = GG^T$, where $G = L\,\text{diag}(\sqrt{d_1}, ..., \sqrt{d_n})$. The decomposition $A = GG^T$ is known as the Cholesky decomposition and can be constructed efficiently in $n^3/3$ flops. There are a number of algorithms to construct this decomposition; both the Wikipedia entry and Chapter 4.2 of the Matrix Computations textbook by Golub and Van Loan give a number of different variants.
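As a quick illustration, the following minimal Python sketch (the small matrix here is an arbitrary illustrative example, not taken from the course) forms an SPD matrix and checks that NumPy's Cholesky routine returns a lower triangular $G$ with $A = GG^T$:

```python
import numpy as np

# Build an SPD matrix as B B^T (any nonsingular B will do; this one is illustrative)
B = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
A = B @ B.T

G = np.linalg.cholesky(A)          # lower triangular Cholesky factor
print(np.allclose(A, G @ G.T))     # True: A is recovered from G G^T
```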

Note that an $LDL^T$ decomposition can also be used to calculate a Cholesky decomposition, and this can be a more efficient approach since (a) the SPD structure means that we can neglect pivoting in the $LDL^T$ decomposition, and (b) the $LDL^T$ decomposition does not require taking the square root of the diagonal elements.
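As a sketch of this alternative route, SciPy's scipy.linalg.ldl can be used to build $G = L\,\text{diag}(\sqrt{d_1}, ..., \sqrt{d_n})$. Note that SciPy's routine may permute rows for general symmetric matrices, so the result only coincides with the triangular Cholesky factor when no pivoting occurs; the matrix below is an arbitrary illustrative example:

```python
import numpy as np
from scipy.linalg import ldl

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])     # a small SPD matrix

L, D, perm = ldl(A, lower=True)     # A = L D L^T, with D diagonal for SPD input
G = L @ np.sqrt(D)                  # G = L diag(sqrt(d_1), ..., sqrt(d_n))
print(np.allclose(A, G @ G.T))                 # True
print(np.allclose(G, np.linalg.cholesky(A)))   # True when no pivoting occurred
```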

Other Reading

Software

Problems

Sampling random fields

Imagine that we wanted to sample an array of values $x_i$, for $i = 1...n$, where each value is sampled from an independent normal distribution with standard deviation $\sigma$

$$x_i \sim \mathcal{N}(0, \sigma)$$

This could be achieved, for example, by sampling from a normal distribution with unit standard deviation, a function that typically exists in any computer language, then multiplying by $\sigma$

$$x_i = \sigma \eta$$

where $\eta \sim \mathcal{N}(0, 1)$
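In Python this scaling trick might look like the following minimal sketch (using NumPy; the value of $\sigma$ and the number of samples are placeholders):

```python
import numpy as np

rng = np.random.default_rng()
sigma = 2.0                          # illustrative standard deviation
eta = rng.standard_normal(1000)      # eta ~ N(0, 1)
x = sigma * eta                      # each x_i ~ N(0, sigma)
```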

Now imagine that instead of an independent normal distribution you wish to sample $\mathbf{x} = [x_1, x_2, ..., x_n]$ from a multivariate normal distribution with some covariance matrix $\Sigma$

$$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \Sigma)$$

We can achieve this in practice by using the Cholesky decomposition. A covariance matrix is a symmetric positive semidefinite matrix (i.e. $x^T \Sigma x \ge 0$), and therefore can be decomposed into $\Sigma = LL^T$. We can then draw a sample from $\mathcal{N}(\mathbf{0}, \Sigma)$ by scaling an independently generated random vector by $L$

$$\mathbf{x} = L \mathbf{\eta}$$

where each element of the vector $\mathbf{\eta}$ is $\eta_i \sim \mathcal{N}(0, 1)$.

Write Python code to randomly sample an n-dimensional vector $\mathbf{x}$ from

  1. an independent normal distribution with variance $\sigma_1^2$.

  2. a multivariate normal distribution using a covariance matrix $\Sigma_{ij} = \sigma_1^2 \exp\left(-(i-j)^2 / \sigma_2^2\right)$. Try different values for the magnitude $\sigma_1$ and lengthscale $\sigma_2$ parameters and look at their effect on the sampled $\mathbf{x}$. Hint: while the expression for $\Sigma$ is guaranteed to be positive definite for all values of $\sigma_1$ and $\sigma_2$, numerical round-off can mean that the Cholesky decomposition fails. To guarantee a positive definite $\Sigma$, try adding a small amount (e.g. 1e-5) to the diagonal of $\Sigma$. This is equivalent to adding a very small amount of independent normal noise to $\mathbf{x}$. A possible sketch of both parts is given below.
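One possible sketch of the two sampling approaches (not a reference solution; the values of $n$, $\sigma_1$ and $\sigma_2$ are illustrative):

```python
import numpy as np

n = 100
sigma1 = 1.0     # magnitude (illustrative)
sigma2 = 10.0    # lengthscale (illustrative)
rng = np.random.default_rng()

# 1. independent samples, each with variance sigma1^2
x_independent = sigma1 * rng.standard_normal(n)

# 2. correlated samples via the Cholesky factor of Sigma
i = np.arange(n)
Sigma = sigma1**2 * np.exp(-(i[:, None] - i[None, :])**2 / sigma2**2)
Sigma += 1e-5 * np.eye(n)             # small jitter keeps Sigma numerically SPD
L = np.linalg.cholesky(Sigma)         # Sigma = L L^T, L lower triangular
x_correlated = L @ rng.standard_normal(n)
```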

Likelihood

Now imagine that we have a vector of measurements $\mathbf{x}$, and we assume that a suitable model for these measurements is that they are generated from a zero-mean, multivariate normal distribution, i.e.

$$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \Sigma)$$

We assume that the covariance matrix is of the following form, with two parameters $\mathbf{\theta} = (\sigma_1, \sigma_2)$.

$$\Sigma_{ij} = \sigma_1^2 \exp\left(-(i-j)^2 / \sigma_2^2\right)$$

We can write down the likelihood of the covariance parameters $\mathbf{\theta}$, given a dataset $\mathbf{x}$, by using the probability density function (PDF) of a zero-mean multivariate normal distribution

$$P(\mathbf{x} | \mathbf{\theta}) = (2 \pi)^{-\frac{n}{2}} \det(\Sigma)^{-\frac{1}{2}} \exp\left( -\frac{1}{2}\mathbf{x}^T \Sigma^{-1} \mathbf{x}\right)$$

Typically we work with the log of the likelihood for numerical reasons, which is

$$\mathcal{L} = -\frac{1}{2}\left(\log(|\Sigma|) + \mathbf{x}^T \Sigma^{-1} \mathbf{x} + n \log(2\pi)\right)$$

Generate a simulated dataset $\mathbf{x}$ using your code for question (2) with "true" parameters $\mathbf{\theta}^t = (\sigma^t_1, \sigma^t_2)$. Then calculate the log-likelihood, using the Cholesky decomposition to efficiently calculate the log determinant and the inverse of the covariance matrix. Vary $\mathbf{\theta}$ and satisfy yourself that the maximum of the likelihood occurs at your "true" parameters. In practice, when you don't know the true parameters, you could use an optimisation algorithm to automatically determine the most likely model parameters that give rise to your data.
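A possible sketch of this calculation (not a reference solution; the "true" parameter values, the jitter and the comparison point are illustrative) uses scipy.linalg.cho_factor and cho_solve so that neither the determinant nor the inverse of $\Sigma$ is formed explicitly:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(theta, x):
    """Log-likelihood of theta = (sigma1, sigma2) for data x, via Cholesky."""
    sigma1, sigma2 = theta
    n = len(x)
    i = np.arange(n)
    Sigma = sigma1**2 * np.exp(-(i[:, None] - i[None, :])**2 / sigma2**2)
    Sigma += 1e-5 * np.eye(n)                    # jitter, as in the earlier hint
    c, low = cho_factor(Sigma, lower=True)
    log_det = 2.0 * np.sum(np.log(np.diag(c)))   # log|Sigma| from the factor's diagonal
    quad = x @ cho_solve((c, low), x)            # x^T Sigma^{-1} x without forming the inverse
    return -0.5 * (log_det + quad + n * np.log(2.0 * np.pi))

# Simulate data at assumed "true" parameters, then compare log-likelihoods.
rng = np.random.default_rng(0)
theta_true = (1.0, 10.0)                         # illustrative "true" values
n = 100
i = np.arange(n)
Sigma_true = theta_true[0]**2 * np.exp(-(i[:, None] - i[None, :])**2 / theta_true[1]**2)
x = np.linalg.cholesky(Sigma_true + 1e-5 * np.eye(n)) @ rng.standard_normal(n)

print(log_likelihood(theta_true, x))             # should be near the maximum
print(log_likelihood((2.0, 5.0), x))             # typically lower
```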