Cholesky decomposition
Symmetric positive definite matrices are a very special type of matrix that often arises in practice. From a computational point of view, this class of matrix is very attractive because it is possible to decompose a symmetric positive definite matrix very efficiently into the product of a single lower triangular matrix with its transpose, so that $A = LL^T$.
A matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if $\mathbf{x}^T A \mathbf{x} > 0$ for any nonzero $\mathbf{x} \in \mathbb{R}^n$. This statement by itself is not terribly intuitive, so let's also look at an example of a $2 \times 2$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}.$$
If $A$ is symmetric positive definite (SPD) then, taking $\mathbf{x} = (1, 0)^T$, $(0, 1)^T$, $(1, 1)^T$ and $(1, -1)^T$ in turn,

$$\begin{aligned}
a_{11} &> 0, \\
a_{22} &> 0, \\
a_{11} + 2 a_{12} + a_{22} &> 0, \\
a_{11} - 2 a_{12} + a_{22} &> 0.
\end{aligned}$$
The first two equations show that the diagonal entries of $A$ must be positive, and combining the last two equations implies $|a_{12}| < \frac{1}{2}(a_{11} + a_{22})$, that is, the matrix has much of its "mass" on the diagonal (note: this is not the same as the matrix being diagonally dominant, where $|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|$). These two observations for our $2 \times 2$ matrix also apply to a general SPD matrix. One of the very nice consequences of this "weighty" diagonal for SPD matrices is that it precludes the need for pivoting.
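As a quick numerical check of these observations (a minimal sketch using NumPy, with an arbitrarily chosen SPD matrix), we can confirm that the diagonal entries are positive, that the off-diagonal entry is bounded by the average of the diagonal entries, and that $\mathbf{x}^T A \mathbf{x} > 0$ for a few random nonzero vectors:

```python
import numpy as np

# an arbitrarily chosen symmetric positive definite 2x2 matrix
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# the diagonal entries are positive
print(np.diag(A))                                  # [4. 3.]

# the off-diagonal entry is small relative to the diagonal
print(abs(A[0, 1]) < 0.5 * (A[0, 0] + A[1, 1]))    # True

# x^T A x > 0 for a few random nonzero vectors
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(2)
    print(x @ A @ x > 0)                           # True each time
```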
It can be shown that if $A$ is an SPD matrix, then the $LDL^T$ decomposition exists and that $D$ has positive diagonal entries. Therefore, it is straightforward to see that $A = LDL^T = GG^T$, where $G = LD^{1/2}$ is lower triangular with positive diagonal entries. The decomposition $A = GG^T$ is known as the Cholesky decomposition and can be efficiently constructed in approximately $n^3/3$ flops. There are a number of algorithms to construct this decomposition, and both the Wikipedia entry and Chapter 4.2 of the Matrix Computations textbook by Golub and Van Loan give a number of different variants.
Note that an $LDL^T$ decomposition can also be used to calculate a Cholesky decomposition, and this could be a more efficient approach since (a) the SPD structure means that we can neglect pivoting in the $LDL^T$ decomposition, and (b) the $LDL^T$ decomposition does not require taking the square root of the diagonal elements.
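As an illustration of both routes (a minimal sketch; `numpy.linalg.cholesky` and `scipy.linalg.ldl` are the standard NumPy/SciPy factorisation routines, and the example matrix is arbitrary), we can compute the Cholesky factor directly and also recover one from an $LDL^T$ factorisation:

```python
import numpy as np
from scipy.linalg import ldl

# an arbitrarily chosen symmetric positive definite matrix
A = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])

# direct Cholesky factorisation: A = L @ L.T with L lower triangular
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))         # True

# LDL^T factorisation: A = lu @ d @ lu.T (no square roots required)
lu, d, perm = ldl(A, lower=True)
print(np.allclose(lu @ d @ lu.T, A))   # True

# for an SPD matrix d is diagonal and positive, so a Cholesky-like
# factor can be recovered as G = lu @ sqrt(d)
G = lu @ np.sqrt(d)
print(np.allclose(G @ G.T, A))         # True
```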
Other Reading
Golub, G. H. & Van Loan, C. F. Matrix Computations, 3rd Ed. (Johns Hopkins University Press, 1996). Chapter 4.2
Software
Problems
Sampling random fields
Imagine that we wanted to sample an array of values $x_i$, for $i = 1, \dots, n$, where each value is sampled from an independent normal distribution with standard deviation $\sigma$

$$x_i \sim \mathcal{N}(0, \sigma^2).$$

This could be achieved, for example, by sampling from a normal distribution with unit standard deviation, a function that typically exists in any computer language, then multiplying by $\sigma$

$$x_i = \sigma \eta_i,$$

where $\eta_i \sim \mathcal{N}(0, 1)$.
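For example, a minimal NumPy sketch of this scaling, with hypothetical values for $n$ and $\sigma$:

```python
import numpy as np

n = 100       # number of samples (hypothetical value)
sigma = 2.0   # standard deviation (hypothetical value)

eta = np.random.randn(n)   # independent samples from N(0, 1)
x = sigma * eta            # independent samples from N(0, sigma^2)
```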
Now imagine that instead of an independent normal distribution you wish to sample from a multivariate normal distribution with some covariance matrix $\Sigma$

$$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \Sigma).$$

We can achieve this in practice by using the Cholesky decomposition. A covariance matrix is a symmetric positive semidefinite matrix (i.e. $\mathbf{v}^T \Sigma \mathbf{v} \ge 0$ for any $\mathbf{v}$), and therefore can be decomposed into $\Sigma = LL^T$. We can then draw a sample from $\mathcal{N}(\mathbf{0}, \Sigma)$ by scaling an independently generated random vector $\boldsymbol{\eta}$ by $L$

$$\mathbf{x} = L \boldsymbol{\eta},$$

where each element of the vector $\boldsymbol{\eta}$ is $\eta_i \sim \mathcal{N}(0, 1)$.
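A minimal sketch of this sampling step, assuming a covariance matrix `Sigma` is already available as a NumPy array:

```python
import numpy as np

def sample_multivariate_normal(Sigma):
    """Draw one sample from N(0, Sigma) via the Cholesky decomposition."""
    L = np.linalg.cholesky(Sigma)           # Sigma = L @ L.T
    eta = np.random.randn(Sigma.shape[0])   # independent N(0, 1) samples
    return L @ eta                          # correlated sample x = L @ eta
```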
Write Python code to randomly sample an $n$-dimensional vector $\mathbf{x}$ from

1. an independent normal distribution with variance $\sigma^2$.

2. a multivariate normal distribution using a covariance matrix $\Sigma$ with entries

   $$\Sigma_{ij} = \sigma^2 \exp\left(-\frac{(i - j)^2}{2 \lambda^2}\right).$$

   Try different values for the magnitude $\sigma$ and lengthscale $\lambda$ parameters and observe their effect on the sampled $\mathbf{x}$. Hint: while this expression for $\Sigma$ is guaranteed to be positive definite for all values of $\sigma$ and $\lambda$, numerical round-off can mean that the Cholesky decomposition can fail. To guarantee a positive definite $\Sigma$, try adding a small amount (e.g. 1e-5) to the diagonal of $\Sigma$; this is equivalent to adding a very small amount of independent normal noise to $\mathbf{x}$ (see the sketch after this list).
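For question (2), a minimal sketch assuming the squared exponential covariance given above (the function name and parameter values are illustrative, not part of the problem statement):

```python
import numpy as np

def squared_exponential_covariance(n, sigma, lam, jitter=1e-5):
    """Construct Sigma_ij = sigma^2 exp(-(i - j)^2 / (2 lam^2)),
    with a small jitter added to the diagonal for numerical stability."""
    i = np.arange(n)
    Sigma = sigma**2 * np.exp(-(i[:, None] - i[None, :])**2 / (2 * lam**2))
    return Sigma + jitter * np.eye(n)

# hypothetical parameter values
n, sigma, lam = 200, 1.0, 10.0
Sigma = squared_exponential_covariance(n, sigma, lam)

# sample x ~ N(0, Sigma) via the Cholesky decomposition
L = np.linalg.cholesky(Sigma)
x = L @ np.random.randn(n)
```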
Likelihood
Now imagine that we have a vector of measurements $\mathbf{y}$, and we assume that a suitable model for these measurements is that they are generated from a zero-mean, multivariate normal distribution, i.e.

$$\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \Sigma).$$
We assume that the covariance matrix $\Sigma$ is of the following form (the same squared exponential form used above), with two parameters $\sigma$ and $\lambda$:

$$\Sigma_{ij} = \sigma^2 \exp\left(-\frac{(i - j)^2}{2 \lambda^2}\right).$$
We can write down the likelihood of the covariance parameters $\sigma$ and $\lambda$, given a dataset $\mathbf{y}$, by using the probability density function (PDF) for a zero-mean multivariate normal distribution

$$\mathcal{L}(\sigma, \lambda \mid \mathbf{y}) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \exp\left(-\frac{1}{2} \mathbf{y}^T \Sigma^{-1} \mathbf{y}\right).$$
Typically we work with the log of the likelihood for numerical reasons, which is

$$\log \mathcal{L}(\sigma, \lambda \mid \mathbf{y}) = -\frac{1}{2} \mathbf{y}^T \Sigma^{-1} \mathbf{y} - \frac{1}{2} \log \det \Sigma - \frac{n}{2} \log(2\pi).$$
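A minimal sketch of evaluating this log-likelihood via a Cholesky factorisation, using SciPy's `cho_factor`/`cho_solve` and the identity $\log \det \Sigma = 2 \sum_i \log L_{ii}$ (the function name here is illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(y, Sigma):
    """Log-likelihood of a zero-mean multivariate normal, evaluated via Cholesky."""
    n = len(y)
    c, low = cho_factor(Sigma, lower=True)        # Sigma = L @ L.T
    alpha = cho_solve((c, low), y)                # alpha = Sigma^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(c)))    # log det Sigma = 2 sum_i log L_ii
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)
```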
Generate a simulated dataset using your code for question (2) using "true" parameters $\sigma$ and $\lambda$. Then calculate the log-likelihood using the Cholesky decomposition to efficiently calculate the log determinant and the inverse of the covariance matrix. Vary $\sigma$ and $\lambda$ and satisfy yourself that the maximum of the likelihood occurs at your "true" parameters. In practice, when you don't know the true parameters, you could use an optimisation algorithm to automatically determine the most likely model parameters that give rise to your data.