This material has been adapted from material by Martin Robinson from the "Scientific Computing" module of the SABS R³ Center for Doctoral Training.

This course material was developed as part of UNIVERSE-HPC, which is funded through the SPF ExCALIBUR programme under grant number EP/W035731/1.

Creative Commons License

Derivative-free methods

The line search and trust region methods introduced in the previous lesson all required that the user be able to calculate the gradient of the function, $\nabla f$. However, in many cases the gradient is either not available or too error-prone to be of use. For example, the function $f$ might only be available as a compiled executable or as the result of a physical experiment. The model might be stochastic, the gradient evaluation might be noisy due to numerical inaccuracies, or the function might be of sufficient complexity that the gradient is either unknown or too expensive to compute.

Here we describe two of the most common approaches to derivative-free optimisation: using a finite difference approximation to the derivative, and the Nelder-Mead algorithm, which is a simplex search method. However, there are a large number of derivative-free methods, ranging from classical direct search methods such as pattern search, simplex search, and Rosenbrock's or Powell's methods, to emulator- or model-based methods that build up a model of the function $f$ and minimise that model using a gradient-based method; a powerful example of this class is Bayesian Optimisation. Many global optimisation algorithms are also derivative-free, including randomised algorithms such as Simulated Annealing, evolution-based strategies such as the popular Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and swarm algorithms inspired by bees and ants, such as Particle Swarm Optimisation.

Finite difference

The simplest way of converting a gradient-based optimisation algorithm into a derivative-free one is to approximate the gradient of the function using finite differences.

The Finite Difference (FD) method is based on taking a Taylor series expansion of $f(x+h)$ and $f(x-h)$ (among others) for a small parameter $h$ about $x$. Consider a smooth function $f(x)$; its Taylor expansion is

$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f''''(x) + \ldots$$

$$f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f''''(x) - \ldots$$

From this, we can compute three different schemes (approximations) to $f'(x)$:

Forward difference:

$$f'(x) = \frac{f(x+h)-f(x)}{h} + O(h)$$

Backward difference:

$$f'(x) = \frac{f(x)-f(x-h)}{h} + O(h)$$

Centered difference:

$$f'(x) = \frac{f(x+h)-f(x-h)}{2h} + O(h^2)$$
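As an illustration, the three schemes can be implemented in a few lines of NumPy and compared on a simple test function. This is a minimal sketch (not part of the original lesson); the test function $\sin x$, the evaluation point, and the step size are arbitrary choices:

```python
import numpy as np

def forward_diff(f, x, h=1e-6):
    """Forward difference approximation, O(h) error."""
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h=1e-6):
    """Backward difference approximation, O(h) error."""
    return (f(x) - f(x - h)) / h

def centered_diff(f, x, h=1e-6):
    """Centered difference approximation, O(h^2) error."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the schemes on f(x) = sin(x), whose exact derivative is cos(x)
x = 1.0
exact = np.cos(x)
for name, scheme in [("forward", forward_diff),
                     ("backward", backward_diff),
                     ("centered", centered_diff)]:
    approx = scheme(np.sin, x)
    print(f"{name:9s} error = {abs(approx - exact):.2e}")
```

The centered scheme typically gives several extra digits of accuracy for the same step size, reflecting its $O(h^2)$ error.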

Finite difference approximations are easily computed, but suffer from inaccuracies which can cause optimisation algorithms to fail or perform poorly. As well as the error in the FD approximation itself (e.g. $O(h^2)$ for the centered difference), the function evaluation itself might have some numerical or stochastic "noise". If this noise dominates over the (small) step size $h$, then it is entirely possible that the calculated steepest descent direction $-\nabla f(x)$ will not be a direction of descent for $f$.
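To illustrate this effect (a sketch with an artificial noise term, not from the original lesson), consider $f(x) = x^2$ perturbed by noise of amplitude $10^{-6}$. With a step size of $10^{-8}$, the noise contributes an error of order $10^{-6}/10^{-8} = 100$ to the forward difference estimate, swamping the true derivative of 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_f(x, noise=1e-6):
    """f(x) = x^2 plus artificial evaluation 'noise' of amplitude ~1e-6."""
    return x**2 + noise * rng.standard_normal()

x, h = 1.0, 1e-8                 # true derivative at x = 1 is 2
fd_grad = (noisy_f(x + h) - noisy_f(x)) / h
print(fd_grad)                   # dominated by noise/h ~ 1e2; even the sign is unreliable
```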

Software

It is very common for optimisation libraries to provide a finite difference approximation to the gradient (Jacobian) $\nabla f$ if it is not supplied, as is done for the gradient-based methods in scipy.optimize.
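For example, here is a minimal sketch (using scipy's built-in Rosenbrock helpers) showing that scipy.optimize.minimize falls back to a finite difference gradient when no jac argument is given, at the cost of extra function evaluations:

```python
from scipy.optimize import minimize, rosen, rosen_der

x0 = [-1.2, 1.0]

# No gradient supplied: scipy approximates it by finite differences
res_fd = minimize(rosen, x0, method="BFGS")

# Analytical gradient supplied via `jac`
res_exact = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print("function evaluations without jac:", res_fd.nfev)
print("function evaluations with jac:   ", res_exact.nfev)
```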

More dedicated libraries can give superior approximations to the gradient, such as the numdifftools package. This library provides higher-order FD approximations and Richardson extrapolation to evaluate the limit $h \rightarrow 0$, and can calculate Jacobians and Hessians of user-supplied functions.
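A brief sketch of how numdifftools might be used, assuming it is installed (see its documentation for the full set of options, such as step size and approximation order):

```python
import numdifftools as nd
from scipy.optimize import rosen

# Callables that evaluate the gradient and Hessian of the
# Rosenbrock function by finite differences
grad = nd.Gradient(rosen)
hess = nd.Hessian(rosen)

x = [1.0, 1.0]        # the global minimum of the Rosenbrock function
print(grad(x))        # should be close to [0, 0]
print(hess(x))        # the 2x2 Hessian at the minimum
```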

Problems

Comparing optimisation methods

Use the following methods from scipy.optimize to minimize the 2D Rosenbrock function:

  • Nelder-Mead Simplex

  • Conjugate Gradient

  • BFGS Quasi-Newton

  • Newton-CG

  • SHGO (Simplicial Homology Global Optimisation)

Either use $x_0 = (-1.2, 1)^T$ as the starting point, or experiment with your own.

In each case perform the optimisation with and without a user-supplied Jacobian and evaluate the effect on the number of evaluations of the function $f$ required to converge to the optimum. Optional: you can compute the derivatives by hand, or use automatic differentiation via the autograd or JAX packages.
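A possible starting point for this exercise is sketched below; method names follow the scipy.optimize.minimize documentation, and shgo is called separately because it takes bounds rather than a starting point:

```python
from scipy.optimize import minimize, shgo, rosen, rosen_der, rosen_hess

x0 = [-1.2, 1.0]

# Local methods, run with and without the analytical gradient
for method in ["Nelder-Mead", "CG", "BFGS", "Newton-CG"]:
    for jac in [None, rosen_der]:
        if method == "Newton-CG" and jac is None:
            continue   # Newton-CG requires a gradient
        if method == "Nelder-Mead" and jac is not None:
            continue   # Nelder-Mead does not use gradient information
        hess = rosen_hess if method == "Newton-CG" else None
        res = minimize(rosen, x0, method=method, jac=jac, hess=hess)
        label = "with jac" if jac is not None else "no jac  "
        print(f"{method:12s} {label}  x* = {res.x.round(4)}  nfev = {res.nfev}")

# The SHGO global optimiser searches a bounded box instead of starting from x0
res_global = shgo(rosen, [(-2.0, 2.0), (-2.0, 2.0)])
print(f"{'shgo':12s}           x* = {res_global.x.round(4)}  nfev = {res_global.nfev}")
```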