Absolute quantification of mtDNA

Quantitative Polymerase Chain Reaction (qPCR)

For the purposes of this project, the specific details of qPCR can largely be abstracted away. However, basic concepts and terms such as fluorescence, primers, efficiency and amplification phases are needed to understand the mathematical model, and are explained below.

Basics of qPCR

Quantitative polymerase chain reaction (qPCR, Heid et al., 1996) is a technique used to amplify copies of a specific DNA segment while measuring the amount of amplified product after each cycle. It is useful for measuring the initial copy number of a molecule when this number is too small to be determined experimentally without amplification.

The molecules undergo a series of thermal cycles (Fig. 6). During every cycle, the following steps occur:

Heating separates the double-stranded DNA into two single strands.
Primers designed to target a specific DNA fragment bind to it. Fluorescent reporter probes also bind to the same fragment, further along the strand than the primer.
Polymerisation starts from the primer and proceeds along the DNA segment to "reconstruct" the complementary strand.
As polymerisation proceeds, the polymerase reaches the reporter probe and degrades it, releasing a fluorescent reporter; this in turn increases the measured fluorescence.

Figure 6. The qPCR experiment amplifies a set of DNA fragments using primers designed to identify and bind to these fragments. The molecules are amplified over a set of thermal cycles and a fluorescence intensity is reported after each cycle.

A primer is a short segment of single-stranded complementary DNA designed to bind to a specific DNA sequence. Different primers are used to amplify mutant mtDNA molecules and any mtDNA molecule. Like a primer, a probe is a short segment of single-stranded DNA designed to bind to a specific DNA sequence. When the polymerase reaches the probe during polymerisation, the probe is degraded and releases a reporter that increases fluorescence.

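The efficiency of a qPCR experiment determines how many molecules are amplified during each cycle: at efficiency 1 every molecule is copied and the number of molecules doubles, while at efficiency 0 nothing is amplified. Efficiency depends on both the primers and the DNA template, and is an important aspect when interpreting qPCR amplification data, as we discuss below.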
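
To make the notion of efficiency concrete, the sketch below assumes a branching-process view of a single cycle (an assumption for illustration, not necessarily the exact model used later): each molecule is copied independently with probability r, so the number of new copies is binomially distributed and the expected number of molecules grows by a factor of (1 + r). The function name `amplify_cycle` is hypothetical.

```python
import numpy as np

def amplify_cycle(x, r, rng):
    """Simulate one qPCR cycle under an assumed branching-process model.

    Each of the x molecules is copied independently with probability r
    (the efficiency), so the number of new copies is Binomial(x, r).
    r = 1 doubles the count exactly; r = 0 leaves it unchanged.
    """
    return x + rng.binomial(x, r)

rng = np.random.default_rng(0)
x = 100
for cycle in range(1, 6):
    x = amplify_cycle(x, r=0.9, rng=rng)
    print(f"cycle {cycle}: {x} molecules")
# The expected count after 5 cycles is 100 * 1.9**5 (about 2476);
# individual runs scatter around this value.
```

The binomial draw is what makes amplification random: two runs with the same starting count and efficiency generally end with different counts.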

The amplification curve

Fluorescence intensities recorded after each qPCR cycle form a sigmoidal amplification curve. The inference method uses a mathematical model with a set of parameters to represent the amplification curve. It infers the parameters that best fit the model to the amplification data.

At the start of the experiment, the molecules amplify at peak efficiency (close to doubling at each cycle), showing an exponential increase in fluorescence (exponential phase). Then, the reaction starts to slow down and the efficiency decreases (linear phase). Finally, the reaction stops and no molecules are amplified, causing the fluorescence intensity to be close to constant between consecutive cycles (plateau phase). Therefore, the amplification curve can be divided into three phases: exponential, linear and plateau (Fig. 7).

Figure 7. The amplification curve consists of three phases: exponential (peak, constant efficiency), linear (decreasing efficiency) and plateau (efficiency 0, reaction stops). Only information from the exponential phase, where efficiency is constant, is used for the inference.
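
The three phases can be made visible by computing the apparent per-cycle efficiency, F_{i+1}/F_i - 1, along a curve. The sketch below uses a noise-free logistic curve as a stand-in for a real amplification curve (an illustrative assumption; real reads include background fluorescence and noise, which make early ratios unreliable) and labels each step with a simple tolerance rule. The functions `logistic_curve`, `per_cycle_efficiency` and `label_phases`, and the tolerance value, are hypothetical and not part of the project's method.

```python
import numpy as np

def logistic_curve(cycles, f_max, midpoint, scale):
    """Hypothetical noise-free sigmoidal amplification curve (illustration only)."""
    return f_max / (1.0 + np.exp(-(cycles - midpoint) / scale))

def per_cycle_efficiency(f):
    """Apparent efficiency between consecutive cycles: F[i+1] / F[i] - 1."""
    f = np.asarray(f, dtype=float)
    return f[1:] / f[:-1] - 1.0

def label_phases(eff, tol=0.05):
    """Label each cycle-to-cycle step as exponential, linear or plateau.

    'exponential': efficiency within tol of its maximum (roughly constant);
    'plateau': efficiency within tol of zero; 'linear': everything in between.
    """
    peak = eff.max()
    return np.where(eff >= peak - tol, "exponential",
                    np.where(eff <= tol, "plateau", "linear"))

cycles = np.arange(40)
# scale = 1 / ln(1.9) makes the early growth factor 1.9, i.e. efficiency 0.9
f = logistic_curve(cycles, f_max=1.0, midpoint=20.0, scale=1.0 / np.log(1.9))
eff = per_cycle_efficiency(f)
labels = label_phases(eff)
for phase in ("exponential", "linear", "plateau"):
    print(phase, int(np.sum(labels == phase)), "steps")
```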

Experimental protocol

The experimental protocol for measuring heteroplasmy in a single cell consists of quantifying the mutant and total mtDNA load separately using qPCR. The qPCR amplification curves of the two experiments are analysed separately and used to infer the initial quantities. Heteroplasmy is the ratio between mutant and total mtDNA quantities.

A qPCR experiment can only quantify one type of molecule, as it uses a primer to identify a specific sequence of the target molecule. The two molecule types of interest are mutant mtDNA and mtDNA in general (total mtDNA), the latter including both mutant and wildtype molecules.

The experimental protocol aims to divide the contents of a single cell equally into two separate samples, each containing half of the mutant load (m/2) and half of the total load (N/2). One sample is used to infer the initial amount of mutant mtDNA (m), while the other is used to infer the initial amount of total mtDNA (N). This assumes that the initial solution is well mixed, so that after the split both samples contain half of the mutant and half of the total number of mtDNA molecules.
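
Even a perfectly well-mixed split is random at the level of individual molecules. A minimal sketch, assuming each molecule independently ends up in either sample with probability 1/2 (one way to formalise "well mixed") and assuming qPCR quantification itself is exact, shows how splitting alone scatters the heteroplasmy estimate. The cell composition (m = 300, N = 1000) and the function names are hypothetical.

```python
import numpy as np

def split_cell(m, n_total, rng):
    """Split a well-mixed cell into samples A and B (binomial sampling assumption).

    Each molecule independently goes to sample A with probability 1/2.
    Sample A is assayed for mutant mtDNA and sample B for total mtDNA,
    so this returns (mutant molecules in A, total molecules in B).
    """
    wildtype = n_total - m
    mutant_a = rng.binomial(m, 0.5)
    wildtype_a = rng.binomial(wildtype, 0.5)
    total_b = n_total - mutant_a - wildtype_a
    return mutant_a, total_b

def heteroplasmy_estimate(mutant_a, total_b):
    """Both samples hold roughly half the cell, so the factors of 1/2 cancel."""
    return mutant_a / total_b

rng = np.random.default_rng(1)
m, n_total = 300, 1000  # hypothetical cell with true heteroplasmy 0.3
estimates = [heteroplasmy_estimate(*split_cell(m, n_total, rng)) for _ in range(10_000)]
print(f"mean {np.mean(estimates):.3f}, sd {np.std(estimates):.3f}")
```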

For each of the two samples, a qPCR primer is used to identify and amplify either a DNA segment specific to mutant mtDNA or a segment present in any mtDNA molecule.

A qPCR machine holds a plate with 384 wells, allowing a maximum of 384 samples to be analysed in parallel. When the contents of a single cell are split into two halves, the resulting samples are placed in two wells of the same qPCR plate. The fluorescence reads from these two wells form two amplification curves. We use these data to estimate the initial numbers of molecules, m/2 and N/2, and to model the measurement error in these estimates (Fig. 8).

Figure 8. Heteroplasmy is quantified by mixing the cell contents, splitting them into two halves and placing each half in a separate qPCR well. In one well, qPCR amplifies a sequence specific to the mutation; in the other, it amplifies a sequence present in any mtDNA molecule.

The standard curve method

The standard curve method is currently the default method for absolute quantification of target analytes from qPCR data. It is easy to use, but it oversimplifies the qPCR process, does not account for stochasticity and provides no error model for the estimate. We first discuss the standard curve method before motivating the use of Bayesian inference instead.

The standard curve method is a technique for building a calibration curve from samples with known properties. In the case of qPCR, the standard curve method is called comparative C_T (Schmittgen and Livak, 2008). It uses a set of samples (standards) for which the initial number of molecules X₀ is known. For each standard, the (fractional) cycle C_T at which the fluorescence F_i first exceeds a fixed threshold T is recorded. A calibration curve is then constructed from the pairs (X₀^j, C_T^j), one for each standard j.
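
One way to obtain a fractional C_T is to interpolate linearly between the last cycle below the threshold and the first cycle above it. The sketch below assumes cycles are numbered from 1 and interpolates on raw fluorescence; instrument software may use other rules (for example interpolating on a log scale), and the function name `threshold_cycle` is hypothetical.

```python
import numpy as np

def threshold_cycle(fluorescence, threshold):
    """Fractional cycle C_T at which fluorescence first exceeds `threshold`.

    fluorescence[k] is the read after cycle k + 1 (cycles numbered from 1).
    Linear interpolation between the last read below the threshold and the
    first read above it gives a fractional cycle. Returns None if the
    threshold is never exceeded.
    """
    f = np.asarray(fluorescence, dtype=float)
    above = np.nonzero(f > threshold)[0]
    if above.size == 0:
        return None
    k = above[0]
    if k == 0:
        return 1.0  # already above threshold after the first cycle
    f_lo, f_hi = f[k - 1], f[k]  # reads after cycles k and k + 1
    return k + (threshold - f_lo) / (f_hi - f_lo)

print(threshold_cycle([1.0, 2.0, 4.0, 8.0, 16.0], threshold=6.0))  # 3.5
```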

Figure 9 shows an example calibration curve constructed from real experimental data. The standards form a dilution series in which the initial copy number X₀ is progressively diluted by factors of 10, from 1,000,000 to 100. Plotting C_T against the logarithm of the copy number gives a straight line, under the assumption that efficiency is constant.
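
The linearity follows from the exponential phase. Assuming constant efficiency r, the expected number of molecules after i cycles is X₀(1 + r)^i. If, as an additional assumption for this derivation, fluorescence is proportional to the number of molecules with an unknown constant α, the threshold cycle satisfies:

```latex
\alpha X_0 (1 + r)^{C_T} = T
\quad\Longrightarrow\quad
C_T = \frac{\log_{10}(T/\alpha)}{\log_{10}(1 + r)} - \frac{1}{\log_{10}(1 + r)}\,\log_{10} X_0
```

So C_T is a linear function of log₁₀ X₀ with slope -1/log₁₀(1 + r). For r = 1 (perfect doubling) the slope is about -3.32 cycles per tenfold dilution.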

The calibration curve is used to estimate the unknown initial number of molecules X₀ for new samples by using the observed C_T value (Fig. 9).

Figure 9. The C_T calibration curve built from standards with known initial copy number is used to estimate the initial copy numbers of new samples from their C_T value, the fractional cycle at which fluorescence exceeds a threshold T.
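
A minimal sketch of building and using the calibration line, assuming an ordinary least-squares fit of C_T against log₁₀ X₀ and the relation 1 + r = 10^(-1/slope) implied by X_i = X₀(1 + r)^i. The standards below are synthetic, noise-free values generated from an assumed efficiency of 0.95 and an assumed intercept, not the experimental data of Fig. 9; the function names are hypothetical.

```python
import numpy as np

def fit_standard_curve(x0_standards, ct_standards):
    """Least-squares fit of C_T = intercept + slope * log10(X0)."""
    slope, intercept = np.polyfit(np.log10(x0_standards), ct_standards, 1)
    return slope, intercept

def efficiency_from_slope(slope):
    """Efficiency implied by the slope, using (1 + r) = 10**(-1 / slope)."""
    return 10.0 ** (-1.0 / slope) - 1.0

def estimate_x0(ct, slope, intercept):
    """Invert the calibration line to estimate X0 from an observed C_T."""
    return 10.0 ** ((ct - intercept) / slope)

# Hypothetical noise-free standards: 10-fold dilutions from 1e6 down to 1e2,
# generated with an assumed efficiency r = 0.95 (not real experimental data).
x0_std = np.array([1e6, 1e5, 1e4, 1e3, 1e2])
r_true, intercept_true = 0.95, 40.0
ct_std = intercept_true - np.log10(x0_std) / np.log10(1 + r_true)

slope, intercept = fit_standard_curve(x0_std, ct_std)
print(efficiency_from_slope(slope))         # close to the assumed 0.95
print(estimate_x0(30.0, slope, intercept))  # X0 for a new sample with C_T = 30
```

Note that the efficiency recovered here belongs to the standards; applying the line to a new sample silently assumes that sample amplifies with the same efficiency.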

Disadvantages

One of the main disadvantages of the standard curve method is that it assumes the amplification efficiency is the same for the standards used to construct the curve and for the analysed samples.

Another disadvantage is that the standard curve method does not model qPCR amplification as a stochastic process. For small numbers of initial molecules (X₀ ~ 100) and efficiency below 1, which is the case for single-cell mtDNA, the amplification process is stochastic, and this is not captured by the standard curve method (Fig. 10). Because the process is stochastic, samples with the same X₀ may yield different C_T values in different runs. An accurate estimate of the copy number X₀ therefore requires both a stochastic model and more information than a single point on the amplification curve.

Figure 10. For small initial copy numbers, qPCR amplification is a stochastic process. The high variability in the number of amplified molecules at each cycle is demonstrated by simulating 1,000 experiments in silico with X₀ = 50 and efficiency r = 0.75, and showing the 5th to 95th percentile range at each cycle.
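
A sketch of this kind of simulation, assuming a binomial branching model for each cycle (each molecule copied independently with probability r). It also computes, for each simulated run, the fractional cycle at which a hypothetical threshold of 10⁵ molecules is crossed (interpolating in log count), illustrating that identical X₀ can give different C_T values. The threshold, number of cycles and the function name `simulate_runs` are illustrative assumptions; this is not the code used to produce Fig. 10.

```python
import numpy as np

def simulate_runs(x0, r, n_cycles, n_runs, rng):
    """Simulate independent amplifications under a binomial branching model.

    Returns an (n_runs, n_cycles + 1) array of molecule counts; column 0
    holds the initial count x0 and column i the count after cycle i.
    """
    counts = np.empty((n_runs, n_cycles + 1), dtype=np.int64)
    counts[:, 0] = x0
    for i in range(n_cycles):
        # every molecule is copied independently with probability r
        counts[:, i + 1] = counts[:, i] + rng.binomial(counts[:, i], r)
    return counts

rng = np.random.default_rng(42)
counts = simulate_runs(x0=50, r=0.75, n_cycles=20, n_runs=1000, rng=rng)

# Per-cycle 5th and 95th percentiles across the 1,000 simulated runs.
lo, hi = np.percentile(counts, [5, 95], axis=0)

# Fractional cycle at which each run first reaches a hypothetical threshold,
# interpolating linearly in log(count) between the two bracketing cycles.
threshold = 1e5
idx = np.argmax(counts >= threshold, axis=1)  # first cycle at/above threshold
rows = np.arange(counts.shape[0])
log_prev = np.log(counts[rows, idx - 1])
log_next = np.log(counts[rows, idx])
ct = (idx - 1) + (np.log(threshold) - log_prev) / (log_next - log_prev)
print(f"C_T across runs: mean {ct.mean():.2f}, sd {ct.std():.3f}")
```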

Single-experiment inference

The standard curve method discussed above uses a single point on the amplification curve to estimate the initial target quantity X₀. It does not provide an error model and fails to account for the inherent stochasticity of the qPCR process. This project extends the work of Lalam, 2007 and uses Bayesian inference on the data from the entire amplification curve to determine X₀. Most importantly, it provides error bars that capture our belief in the copy number estimate, which is essential for interpreting experimental results.

Single-experiment inference uses as data the fluorescence intensity reads from the amplification curve of a sample with unknown initial copy number X₀. By looking at the entire amplification curve rather than just the C_T value, it can infer the efficiency of a sample without assuming it is the same as that of a standard.

The model and inference method

A stochastic model is used to capture the amplification process. This mathematical model is a Hidden Markov Model parameterised by the efficiency r, the initial number of molecules X₀ and the noise variance in fluorescence reads σ². More details and a motivation for the choice of model can be found in the Model section.
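
For orientation, one plausible form of such a model, assumed here for illustration (the project's exact formulation is given in the Model section), treats the molecule counts X_i as hidden states that evolve by binomial branching, and the fluorescence reads F_i, taken after each cycle i = 1, 2, ..., as noisy observations of them in molecule-equivalent units:

```latex
\begin{aligned}
X_{i+1} &= X_i + B_i, & B_i \mid X_i &\sim \mathrm{Binomial}(X_i,\, r), \\
F_i &= X_i + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}(0,\, \sigma^2).
\end{aligned}
```

Under this assumption the three parameters (r, X₀, σ²) fully specify the model, and any scaling between fluorescence and molecule count is absorbed into the units.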

The parameters of the model are inferred using the pseudo-marginal Metropolis-Hastings algorithm. The algorithm uses observation data (fluorescence intensity reads) to sample from the joint posterior distribution of the model parameters. Most importantly, one of the parameters of the model is the initial copy number X₀, the quantity of interest. Knowing the shape of the posterior distribution for X₀ gives both an estimate of the initial copy number and an error bar on this estimate (Fig. 11). More details about posterior distributions and inference in general can be found in the Inference section.
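
A compact sketch of pseudo-marginal Metropolis-Hastings for a model of this kind. It assumes hidden molecule counts that evolve by binomial branching with efficiency r, and fluorescence reads equal to the count plus Gaussian noise with standard deviation σ (assumptions, not necessarily the project's exact model). A bootstrap particle filter gives an unbiased estimate of the likelihood, which replaces the exact likelihood in the Metropolis-Hastings acceptance ratio; the estimate for the current state is stored and reused rather than recomputed, which is what makes the sampler pseudo-marginal. For brevity σ is held fixed (the project also infers σ²), priors are uniform (r in (0, 1], X₀ in {1, ..., 1000}), step sizes and particle numbers are arbitrary, and all function names are hypothetical.

```python
import numpy as np

def log_likelihood_estimate(obs, r, x0, sigma, n_particles, rng):
    """Bootstrap particle filter estimate of log p(obs | r, x0, sigma).

    obs[i] is the fluorescence read after cycle i + 1; X0 itself is not observed.
    """
    x = np.full(n_particles, x0, dtype=np.int64)
    loglik = 0.0
    for f in obs:
        x = x + rng.binomial(x, r)  # propagate every particle by one cycle
        logw = -0.5 * ((f - x) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())  # log of the mean unnormalised weight
        # multinomial resampling in proportion to the weights
        x = x[rng.choice(n_particles, size=n_particles, p=w / w.sum())]
    return loglik

def pmmh(obs, sigma, n_iter, rng, n_particles=200, r_step=0.005, x0_step=5.0, x0_max=1000):
    """Pseudo-marginal MH over (r, X0) with uniform priors and symmetric random walks."""
    r, x0 = 0.5, 10  # arbitrary starting point
    loglik = log_likelihood_estimate(obs, r, x0, sigma, n_particles, rng)
    samples = np.empty((n_iter, 2))
    accepted = 0
    for t in range(n_iter):
        r_new = r + r_step * rng.standard_normal()
        x0_new = x0 + int(np.rint(x0_step * rng.standard_normal()))
        # proposals outside the prior support have zero prior mass: reject them
        if 0.0 < r_new <= 1.0 and 1 <= x0_new <= x0_max:
            loglik_new = log_likelihood_estimate(obs, r_new, x0_new, sigma, n_particles, rng)
            # uniform priors and symmetric proposals cancel in the MH ratio;
            # the stored estimate `loglik` for the current state is reused
            if np.log(rng.uniform()) < loglik_new - loglik:
                r, x0, loglik = r_new, x0_new, loglik_new
                accepted += 1
        samples[t] = (r, x0)
    return samples, accepted / n_iter

# Synthetic data from the same assumed model (X0 = 50, r = 0.8, sigma = 20).
rng = np.random.default_rng(7)
x, obs = 50, []
for _ in range(12):
    x = x + rng.binomial(x, 0.8)
    obs.append(x + 20.0 * rng.standard_normal())
obs = np.array(obs)

samples, acc_rate = pmmh(obs, sigma=20.0, n_iter=1000, rng=rng)
kept = samples[200:]  # discard burn-in
print(f"acceptance {acc_rate:.2f}, mean r {kept[:, 0].mean():.3f}, mean X0 {kept[:, 1].mean():.1f}")
```

In practice the number of particles and the proposal step sizes strongly affect how well such a chain mixes, and would need tuning for real amplification data.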

Figure 11. The inference algorithm produces samples from the posterior distribution of parameters of the model given the fluorescence data. Most importantly, it produces samples of the initial copy number X₀. These samples put an error bar on the copy number estimate and capture our belief in experimental results.
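
Turning posterior samples into a point estimate and an error bar can be as simple as taking the median and an equal-tailed credible interval of the X₀ samples after discarding burn-in. The choice of the median, a 95% interval and the burn-in length are illustrative assumptions, and the name `summarise_posterior` is hypothetical; the example uses stand-in random numbers, not real inference output.

```python
import numpy as np

def summarise_posterior(x0_samples, burn_in=0, level=0.95):
    """Median and equal-tailed credible interval of posterior X0 samples."""
    kept = np.asarray(x0_samples, dtype=float)[burn_in:]
    tail = 100.0 * (1.0 - level) / 2.0
    lower, median, upper = np.percentile(kept, [tail, 50.0, 100.0 - tail])
    return median, (lower, upper)

# Stand-in samples for illustration only (not real inference output).
fake_samples = np.random.default_rng(3).normal(loc=48.0, scale=6.0, size=5000)
median, (lower, upper) = summarise_posterior(fake_samples, burn_in=500)
print(f"X0 ~ {median:.1f} (95% interval {lower:.1f} to {upper:.1f})")
```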