Summary

Quantifying single-cell mtDNA levels is important both for understanding mechanisms that depend on cellular mtDNA copy number and for developing more accurate tissue-level models. Existing methods, such as the standard curve method, fail to account for stochastic effects occurring at small molecule numbers and do not provide error bars for the copy number estimates. This project produced an algorithm, written in C, that performs Bayesian inference on single-cell qPCR data. The inference algorithm provides both an estimate of the initial copy number X0 and an error bar for this estimate. This error bar is essential for experimental interpretation, as it captures our degree of belief in the copy number estimate.

The project combines existing research from different sources and applies the resulting algorithm to both real single-cell data and in-silico simulations:

1. It uses the work of Lalam, 2007 to model the qPCR amplification as a Hidden Markov Model based on a binomial branching process.

2. It adopts an inference approach proposed by Wilkinson, 2011 for latent processes (such as the HMM), namely pseudo-marginal Metropolis-Hastings. Lalam does not discuss the particularities of inference on an HMM.

3. It addresses the trade-off between copy number X0 and efficiency r, which leaves the copy number estimate underdetermined. The solution uses parameter pooling to share the efficiency r and the noise σ² across the inference on three experiments. Neither the problem nor the approach has been discussed in the existing literature on qPCR quantification.

4. It examines methods for computing the fluorescence coefficient α. This is not discussed by Lalam or in other work on Bayesian inference, where α is simply assumed to be a fixed, known constant. Both Bayesian inference and curve fitting were considered, taking into account the level of stochasticity in the standard data. Non-linear curve fitting, performed with a Python package, was used to fit a sigmoid curve to the fluorescence data.

5. It prepares the observation data by extracting the exponential phase from the amplification curve using model selection with AIC (implemented in Python). This step is essential: otherwise, the model's assumption of constant efficiency is violated.
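
As a rough illustration of the model in item 1, the sketch below simulates a binomial branching process in which each molecule is duplicated with probability r in every cycle, and fluorescence is read out as α times the copy number plus Gaussian noise with standard deviation σ. The additive Gaussian observation model, the function names and the parameters are assumptions made for this example; they are not taken from the project's C implementation.

```python
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials.

    Simple and exact, but O(n), so only suitable for small copy numbers.
    """
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_qpcr(x0, r, alpha, sigma, n_cycles, seed=0):
    """Simulate hidden copy numbers and noisy fluorescence readings.

    Hidden state:  X[n+1] = X[n] + Binomial(X[n], r)
    Observation:   F[n]   = alpha * X[n] + Normal(0, sigma)   (assumed form)
    """
    rng = random.Random(seed)
    copies = [x0]
    for _ in range(n_cycles):
        x = copies[-1]
        # Each existing molecule is duplicated with probability r.
        copies.append(x + binomial(x, r, rng))
    # One fluorescence reading per cycle, including cycle 0.
    fluorescence = [alpha * x + rng.gauss(0.0, sigma) for x in copies]
    return copies, fluorescence
```

Calling `simulate_qpcr(x0=50, r=0.9, alpha=0.01, sigma=0.05, n_cycles=8)` with different seeds gives a feel for how much trajectories starting from the same X0 can diverge at low copy numbers, which is the stochastic effect the standard curve method ignores.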

Results showed promising parameter posteriors, with low variance and means close to the estimates from the standard curve method, despite the chains not being fully mixed (only 100,000 iterations were performed). Because further work is needed to improve mixing, the posteriors shown in the results are merely indicative.

Future work

The main areas for further work are highlighted below:

1. Incorporating experimental knowledge in the prior parameter distributions.

2. Modelling the error in pipetting during the protocol step of halving cell contents and placing them into two wells when measuring heteroplasmy. This can be done by replacing the initial copy number X0 state in the Hidden Markov Model with two states, X0 and X0half, linked by a transition Binom(X0, 0.5) to capture the error in the pipetting procedure. X0half will be an additional hidden state of the model.

3. Investigating mixing and obtaining a fully mixed chain, as macroscopic fluctuations are visible in the trace plots of parameters X0 and r in the current results.

4. Investigating whether the assumption of constant efficiency made for parameter pooling is valid.

5. Considering a hierarchical model for the fluorescence coefficient α, using the distribution obtained from sigmoid curve fitting, in order to take into account its variance. Currently, the mean estimate of α is used as a fixed constant in inference.
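
The pipetting step proposed in item 2 can be sketched as follows. This is a minimal illustration, assuming that each molecule independently ends up in the first well with probability 0.5 and that the second well receives the remainder; the function names are hypothetical.

```python
import math
import random


def split_cell(x0, rng):
    """Split x0 molecules between two wells.

    The first well receives X0half ~ Binom(x0, 0.5); the second well is
    assumed to receive the remaining x0 - X0half molecules.
    """
    x0_half = sum(1 for _ in range(x0) if rng.random() < 0.5)
    return x0_half, x0 - x0_half


def relative_split_sd(x0):
    """Standard deviation of X0half divided by its mean x0 / 2.

    Var[Binom(x0, 0.5)] = x0 / 4, so the ratio simplifies to 1 / sqrt(x0).
    """
    return math.sqrt(x0 * 0.25) / (x0 / 2)
```

Under this assumption the relative spread of the split scales as 1/sqrt(X0), so the pipetting error matters most for cells with low copy numbers.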