Inference overview

The goal is to perform parameter inference on the Hidden Markov Model using fluorescence observations, in order to obtain samples from the posterior probability P(θ | F1:n). The parameterisation of the model is
θ = [X0 , r , σ2], where X0 is the initial number of mtDNA molecules, r the amplification efficiency and σ2 the fluorescence noise variance. This project extends the work of Lalam, 2007 and uses a specialisation of the Metropolis Hastings algorithm, namely pseudo-marginal Metropolis Hastings (Wilkinson, 2011), coded in C.

Metropolis Hastings (MH) is a common method for sampling from the posterior when the posterior itself is difficult to compute, but a function g(θ) that is proportional to the posterior is computationally feasible. The function g(θ) that is used in this case is the product of the likelihood and the prior probabilities:

g(θ | F1:n ) = p(F1:n | θ) π(θ)

Many different versions of MH exist. We use pseudo-marginal Metropolis Hastings (PMMH) in order to account for the hidden states of the Markov Model (see below).

Three successive runs of different MH variants are performed:

  • Initial θMAP estimation: A PMMH run for a limited number of iterations (e.g. 10,000) in order to get a better estimate of θMAP.
  • Covariance estimation: An adaptive PMMH run for a limited number of iterations (discussed below) in order to get a better estimate for the proposal covariance.
  • Main run: A standard PMMH run using the initial θMAP estimate and the proposal covariance obtained in the previous runs.
The code for inference is written in C and available on GitHub.

General Metropolis-Hastings

A generic MH algorithm has the following steps (Wilkinson, 2011):

  1. Initialise with some parameter θ, and compute the log posterior log g(θ).
  2. Propose a new parameter value θ’ using a proposal distribution Q(θ’|θ). Our proposal distribution is a multivariate Gaussian centered at θ, N(θ, Σ).
  3. Compute the new log posterior log g(θ’). If log g(θ’) > log g(θ) accept the proposal θ’, otherwise accept with probability g(θ’)/g(θ).
  4. Return to step 2.

In the literature, log g(θ) is referred to as the log posterior, although g(θ) is in fact only proportional to the posterior P(θ | F1:n). The log posterior is the sum of the log priors for X0, r and σ2 (assumed independent) and the log likelihood.

MH flowchart
Figure 20. Flow chart for the general version of the Metropolis Hastings algorithm.

One of the key aspects of the algorithm is computing the log likelihood. In the particular case of a Hidden Markov Model, taking into account the hidden state variables (the number of molecules X1:n) makes the exact computation of the log likelihood infeasible, so a bootstrap particle filter approach is used instead (more details in the next section).

Pseudo-marginal Metropolis Hastings

The pseudo-marginal approach is used to perform inference on Hidden Markov Models, where the hidden states do not allow a straightforward or feasible computation of the log likelihood during a MH step. As we are not interested in the values of the hidden state variables X1:n, an unbiased estimate of the marginal likelihood P(F1:n | θ) is obtained using an approach called the bootstrap particle filter (Wilkinson, 2011).

MH is performed as normal, and the bootstrap particle filter is applied when computing the log posterior for a new parameter proposal θ. This requires the likelihood P(F1:n | θ), the probability of obtaining the observed fluorescence reads F1:n given a fixed parameter vector θ:

P(F1:n | θ) = ∑ X1:n P(F1:n | θ, X1:n ) P(X1:n | θ)

The summation (marginalisation) over all possible values of X1:n is not computationally feasible. Instead of summing over all possible amplification curves X1:n, we obtain an unbiased estimate using a limited number of simulations of the amplification curve, called a particle cloud. The method is called pseudo-marginal as it 'marginalises' over a subset of the possible values of X1:n. As the number of particles approaches infinity, the estimate converges to the true likelihood; in practice, on the order of 100 particles is used.
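Concretely, the estimate produced by the bootstrap particle filter is, in the standard formulation (e.g. Wilkinson, 2011), the product over time steps of the average unnormalised weights wtk computed at each step of the filter:

```latex
\hat{P}(F_{1:n}\mid\theta) \;=\; \prod_{t=1}^{n}\left(\frac{1}{M}\sum_{k=1}^{M} w_t^{k}\right)
```

In practice this product is accumulated on the log scale to avoid numerical underflow.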

The bootstrap particle filter algorithm was adapted from the work of Wilkinson, 2011, which describes the steps as follows:

  1. For t = 0, initialise the procedure with samples x0k = X0, where X0 is part of the proposal θ and k = 1, ..., M. Assign uniform normalised weights w'0k.
  2. At time t, the weighted sample (xtk, w'tk | k = 1, ..., M) represents draws from P(xt | F1:t).
  3. Use discrete sampling based on weights w'tk to resample with replacement M times and obtain x*tk.
  4. Propagate each particle forward x*tk according to the transition probability of the Markov process P(xt+1 | xt, θ).
    In our case, a particle xtk is propagated forward by sampling from a binomial distribution Δk ~ Binom(xtk, r) and setting xt+1k = xtk + Δk. The parameter r in the binomial distribution is the efficiency parameter from the current proposal θ = [X0 , r , σ2].
  5. For each particle, compute a weight wt+1k = P(Ft+1 | xt+1k), according to the observation probability of the current state in the Hidden Markov Model. Each weight is normalised to get w't+1k = wt+1k / ∑iwt+1i.
    In our case, the observation probability P(Ft+1 | xt+1k) is the Gaussian noise in the observation of fluorescence Ft+1, when the number of molecules xt+1k is constant:
    P(Ft+1 | xt+1k) = 1/√(2πσ2) · exp( −(Ft+1 − α xt+1k)2 / (2σ2) )
    where σ2 is the variance parameter from the current proposal θ = [ X0 , r , σ2 ], and α is the fluorescence coefficient assumed known and fixed throughout inference.
    The new weighted particles represent samples from P(xt+1 | F1:t+1).
  6. Return to step 2 and repeat for the next time step t+1.

PMMH diagram
Figure 21. Diagram showing the bootstrap particle filter approach for computing the log-posterior during a MH step. At each step t, the particles xtk are resampled according to the weights wtk. Then, each particle is propagated forward according to the transition probability of the HMM (binomial branching process). Finally, new weights are computed according to the observation probability of the HMM (Gaussian observation noise).

Behaviour for large initial copy numbers

It was mentioned in the section on estimating the fluorescence coefficient α that for large initial copy numbers, the amplification process behaves deterministically and the inference method does not work as expected. For this reason, sigmoid curve fitting was used to obtain α.

The PMMH algorithm cannot be used when the process behaves deterministically because the particle weights computed during the bootstrap particle filter differ by many orders of magnitude. The weights are computed from the probability density function of the Gaussian noise, and at each step of the bootstrap particle filter the maximum weight is significantly larger than the weights of all other particles. The discrete sampling used to resample the particles at each step will then almost always choose that particle. This phenomenon causes only one particle (that of maximal weight) to be resampled, so the bootstrap particle filter behaves as if it used a single particle (in other words, as simple Metropolis Hastings running a single qPCR simulation when computing the log posterior of a proposal). This behaviour causes the chain to mix very slowly, making inference unusable.

Adaptive Metropolis Hastings

A Metropolis Hastings algorithm uses a covariance matrix Σ in the proposal distribution N(θ, Σ) to propose new samples θ' from a current sample θ. In order to obtain a better estimate of the covariance matrix Σ used in PMMH, adaptive MH is run first for a number of iterations. The adaptive algorithm iteratively updates the proposal covariance Σ, using the empirical covariance of the samples accepted up to that point. For brevity, it is not discussed in further detail here; a full description is given by Haario et al., 2001.

The final covariance matrix computed by Adaptive Metropolis Hastings is used in the pseudo-marginal Metropolis Hastings algorithm described in the previous section.

Prior parameter distributions

Prior parameter distributions encapsulate the knowledge we have about the parameters prior to the inference. The shape of priors has a significant impact on the inference. The choice of priors was based on the work of Lalam, 2007, which compared the effect of different priors on the shape of posterior distributions.

The following priors were used in inference:

  • X0 — discrete uniform prior within range (1, 100)
  • r — Beta(0.5, 0.5) prior, which is a Jeffreys prior for a binomial distribution Binom(N, r) with unknown probability r
  • σ2 — a prior proportional to 1/σ2, which is the Jeffreys prior for the variance of Gaussian noise N(0, σ2)
The priors for r and σ2 are common choices for zero-knowledge priors for probabilities and noise, respectively. A useful future improvement of the project would be incorporating experimental knowledge in new priors and analysing their effect on posterior distributions.

Figure 22. The priors for the model parameters are zero-knowledge priors. The inferred posterior distributions of parameters can be narrowed (giving more confident estimates) by incorporating experimental knowledge (such as the bounds for efficiency r) to restrict the shape of priors.