You can download the slides for the PM lecture here

Lotka-Volterra

In this practical, we are going to perform inference for the Lotka-Volterra model of population dynamics. The model describes the dynamics of population sizes of prey ( $u(t)$ ) and predators ( $v(t)$ ), where, here, $t$ denotes a given time in years. These equations are given by:

\begin{aligned} \frac{d u}{dt} &= \alpha u - \beta u v\\ \frac{d v}{dt} &= -\gamma v + \delta u v \end{aligned}

where at time $t=0$ , $u(0) = u_0$ and $v(0)=v_0$ . Here, $\alpha\geq 0$ yields the prey population growth rate in the absence of predators; $\beta \geq 0$ is the rate of decline in the prey as a result of predation; $\gamma \geq 0$ denotes the rate of decline of predators in absence of prey; $\delta > 0$ is the rate of predator population growth due to predation.

Preliminary question

Have you got PINTS installed on your machine? If not, clone the PINTS repo on: https://github.com/pints-team/pints and install it by executing:

pip install -e .[dev,docs]

on the terminal in the PINTS main directory. (If you want to avoid clashes with any local installed packages, you may want to use a virtual environment.)

Lotka-Volterra

Write a function which solves these two equations for any choice of $(\alpha, \beta, \gamma, \delta, u_0, v_0)$ . Plot the solutions for $\alpha=0.55, \beta=0.028, \gamma=0.84, \delta=0.026, u_0=33, v_0=6$ .

Code up the RHS of equations and use numerical solver.

import numpy as np

def lotka_volterra_rhs(y, t, alpha, beta, gamma, delta):
    u, v = y
    dydt = [alpha * u - beta * u * v, -gamma * v + delta * u * v]
    return dydt

from scipy.integrate import odeint
from plotnine import aes, geom_point, geom_line, geom_path, ggplot
import pandas as pd
import scipy

def lotka_volterra_solve(times, alpha, beta, gamma, delta, u_0, v_0):
    sol = odeint(lotka_volterra_rhs, [u_0, v_0], times, args=(alpha, beta, gamma, delta))
    df = pd.DataFrame({'t': times, 'u': sol[:, 0], 'v': sol[:, 1]})
    return df

times = np.linspace(0, 25, 1001)

df = lotka_volterra_solve(times, 0.55, 0.028, 0.84, 0.026, 33, 6)
df_long = pd.melt(df, id_vars='t')

(ggplot(df_long, aes(x='t', y='value', colour='variable')) + geom_line())

An alternative way to view this is as orbits.

(ggplot(df, aes(x='u', y='v')) + geom_path())

Different parameters

Run the model with $\alpha=0.79, \beta=0.04, \gamma=1.3, \delta=0.04, u_0=33, v_0=6$ . How do the dynamics compare with previous?

Observations

Suppose that the observations for each population are governed by the following:

\begin{aligned} \tilde u &= u \exp(\epsilon_u)\\ \tilde v &= v \exp(\epsilon_v) \end{aligned}

where $\epsilon_u\sim\mathcal{N}(0, \sigma_u)$ and $\epsilon_v\sim\mathcal{N}(0, \sigma_v)$ . Show that these are equivalent to:

\begin{aligned} \tilde u &\sim \text{log-normal}(\log u, \sigma_u)\\ \tilde v &\sim \text{log-normal}(\log v, \sigma_v)\\ \end{aligned}

Generate observations

Using these observation models, generate annual data for 25 years assuming $\sigma_u=\sigma_v=0.25$ and using the parameter sets from parts 1 and 2. Graph these observations.

Note, this may be a helpful resource: https://ben18785.shinyapps.io/distribution-zoo/

def observations(df, sigma_u, sigma_v):
    u_tilde = np.zeros(len(df))
    v_tilde = np.zeros(len(df))
    for i in range(len(u_tilde)):
        u_tilde[i] = scipy.stats.lognorm.rvs(scale=df['u'][i], s=sigma_u, size=1)
        v_tilde[i] = scipy.stats.lognorm.rvs(scale=df['v'][i], s=sigma_v, size=1)
    df['u_tilde'] = u_tilde
    df['v_tilde'] = v_tilde
    return df

times = np.arange(0, 25, 1)
df = lotka_volterra_solve(times, 0.55, 0.028, 0.84, 0.026, 33, 6)
df = observations(df, 0.25, 0.25)
df['simulation'] = '1'
df1 = lotka_volterra_solve(times, 0.9, 0.04, 1.3, 0.04, 33, 6)
df1 = observations(df1, 0.25, 0.25)
df1['simulation'] = '2'

df_both = pd.concat([df, df1])

(ggplot(df_both, aes(x='u_tilde', y='v_tilde', colour='simulation')) + geom_path() + geom_point())

Likelihood

The likelihood for the prey compartment is given by:

L_u = \prod_{t=1}^{T} \text{log-normal}(\tilde u(t)| \log u(t|\alpha, \beta, \gamma, \delta, u_0, v_0), \sigma_u)

with a similar expression for the predator compartment. Here $u(t|\alpha, \beta, \gamma, \delta, u_0, v_0)$ is the solution of the Lotka-Volterra eqns. Write a function which calculates the log-likelihood of a set of observations with given values of $(\alpha, \beta, \gamma, \delta, u_0, v_0, \sigma_u, \sigma_v)$ .

def lotka_volterra_u_loglikelihood(df_obs, alpha, beta, gamma, delta, u_0, v_0, sigma_u, sigma_v):
    df = lotka_volterra_solve(df_obs['t'], alpha, beta, gamma, delta, u_0, v_0)
    log_p = 0.0
    for i in range(len(df_obs['t'])):
        log_p += scipy.stats.lognorm.logpdf(df_obs['u_tilde'][i], scale=df['u'][i], s=sigma_u)
        log_p += scipy.stats.lognorm.logpdf(df_obs['v_tilde'][i], scale=df['v'][i], s=sigma_v)
    return log_p

Likelihood slice

Using the first of the series generated in question 4, plot a 1D slice through the log-likelihood surface for $\alpha$ whilst holding the other parameters $\beta=0.028, \gamma=0.84, \delta=0.026, u_0=33, v_0=6, \sigma_u=0.25, \sigma_v=0.25$ . In this slice, how close is $\alpha$ to its true value?

Intro to PINTS

PINTS is a Python package designed in the Department of Computer Science that provides access to all sorts of inference routines, which is especially good for ODEs. Following the approach given here: https://github.com/pints-team/pints/blob/master/examples/stats/custom-model.ipynb wrap a pints.LogPDF class around the log-likelihood function we just created to allow us to access these methods. We're going to hold a number of parameters constant to make inference a little more manageable for this practical, so set up your method with $(u_0=33, v_0=6, \sigma_u=0.25, \sigma_v=0.25)$ .

import pints

class LotkaVolterraLogPDF(pints.LogPDF):
    def __init__(self, df_obs, u0=33, v0=6, sigma_u=0.25, sigma_v=0.25):
        self._df_obs = df_obs
        self._u0 = u0
        self._v0 = v0
        self._sigma_u = sigma_u
        self._sigma_v = sigma_v

    def __call__(self, x):
        df_obs = self._df_obs
        u0 = self._u0
        v0 = self._v0
        sigma_u = self._sigma_u
        sigma_v = self._sigma_v
        return lotka_volterra_u_loglikelihood(df_obs, x[0], x[1], x[2], x[3],
                                              u0, v0, sigma_u, sigma_v)

    def n_parameters(self):
        return 4

Run MCMC

We now are going to use an inbuilt MCMC method in PINTS called HaarioBardenetACMC to generate posterior samples with 2000 iterations per chain and 200 initial phase iterations across 4 chains. To initialise each of the chains assume:

$(\alpha=0.6, \beta=0.02, \gamma=1.0, \delta=0.03)$

which can be done using:

nchains = 4
xs = [[0.6, 0.02, 1.0, 0.03]] * nchains

Note, initialising chains all at a single point is not good practive but allows us to get up and running faster.

To run the MCMC, we first instantiate the pints.LogPDF object you created in the previous question assuming observational data generated in question 4. Then we instantiate an MCMC Controller object using:

model = LotkaVolterraLogPDF(df)
mcmc = pints.MCMCController(model, nchains, xs, method=pints.HaarioBardenetACMC)

where model is an instantiated version of your model. Then we set the total number of iterations per chain and the initial phase iterations using:

mcmc.set_max_iterations(2000)
mcmc.set_initial_phase_iterations(200)

Finally, we execute:

chains = mcmc.run()

to run the MCMC.

Note that, at the moment, since we have not specified priors, PINTS implicitly assumes that the priors are uniform (and, in this case, improper).

MCMC summary

PINTS has an in-built MCMC summary object that can be called and printed using:

results = pints.MCMCSummary(chains=chains, time=mcmc.time())
print(results)

Have your MCMC chains yet converged? If Rhat > 1.1 (probably 1.01 for publications), for any of your parameters, in practice you would rerun for more iterations. Since we only have a little time today, we won't rerun things.

Plot trace

Using PINTS' plotting tools (see: https://github.com/pints-team/pints/blob/master/examples/sampling/adaptive-covariance-haario-bardenet.ipynb) plot pairwise samples from the parameters. Before we do that, let's discard the first half of the chains as these are likely before convergence occurred.

Parameter Correlations

Why is there positive correlation between the estimates of $\alpha$ and $\beta$ ?

Examining the equation:

$\frac{d u}{dt} = \alpha u - \beta u v$ ,

we see that if $\alpha\uparrow$ then $\frac{d u}{dt}\uparrow$ . So, if $\beta\uparrow$ too, then $\frac{d u}{dt}$ can remain approximately constant, and the dynamics of the system would remain unchanged. In other words, the same observed data sample can be obtained in more than one way.

Posterior predictive

Using a randomly drawn subset of 100 posterior draws, generate posterior predictive draws of the mean predator and prey numbers of time. Then plot these, overlaying the data. Why do the posterior predictive simulations not encompass all the variation seen in the data?

chains_long = np.vstack(chains[:, 1000:])
nsamples = 100
idx = np.random.randint(len(chains_long), size=nsamples)
chains_long = chains_long[idx, ]
times = np.linspace(0, 25, 1001)


for i in range(nsamples):
    temp = lotka_volterra_solve(times,
                                chains_long[i, 0], chains_long[i, 1],
                                chains_long[i, 2], chains_long[i, 3],
                                33, 6)
    temp['replicate'] = str(i)
    if i == 0:
        big_df = temp
    else:
        big_df = big_df.append(temp)

Combine datasets.

df['type'] = 'observations'
big_df['type'] = 'simulations'
big_df1 = pd.concat([df, big_df])

(ggplot(big_df1.query('type=="simulations"'), aes(x='t',y='u'))
 + geom_line(aes(group='replicate'), alpha=0.2)
 + geom_point(big_df1.query('type=="observations"'),
   aes(y='u_tilde'), colour='orange'))

Random initialisation

We are now going to initialise the chains randomly, which is better practice since it is less likely the chains will be biased to a particular area of parameter space. To do this, we are going to create a UniformLogPrior object in PINTS, then use the sample() method to generate independent draws for each parameter across each chain. To do this, we are going to use the following code:

log_prior = pints.UniformLogPrior([0, 0, 0, 0], [2, 0.1, 2.0, 0.2])

Using this log_prior object, sample initial positions for the chains and re-run the MCMC sampling as above. Afterwards, plot the path of the chains over time and compute convergence measures. Do the chains look as close to convergence as before?

log_prior = pints.UniformLogPrior([0, 0, 0, 0],
                                  [2, 0.1, 2.0, 0.2])
xs = log_prior.sample(nchains)
mcmc = pints.MCMCController(model, nchains, xs, method=pints.HaarioBardenetACMC)

# Add stopping criterion
mcmc.set_max_iterations(2000)

# Start adapting after 1000 iterations
mcmc.set_initial_phase_iterations(200)
chains = mcmc.run()

results = pints.MCMCSummary(chains=chains, time=mcmc.time())
print(results)

import pints.plot

pints.plot.trace(chains)
plt.show()

Applied Bayesian Inference - PM exercise

Lotka-Volterra

Preliminary question

Lotka-Volterra

Different parameters

Observations

Generate observations

Likelihood

Intro to PINTS

Run MCMC

MCMC summary

Plot trace

Parameter Correlations

Posterior predictive

Random initialisation