Chunk #26 — Experimental design — Setting tag family size by PCR amplification

Source
Detecting ultralow-frequency mutations by Duplex Sequencing.

Text

genome target increases. The following formula can be used to estimate the number of reads to devote to a sample:

N = 40DG/R

where N is the number of paired-end reads devoted to a sample (i.e., lane fraction), D is the desired average depth of coverage, G is the genome or genome target size in base pairs, and R is the postanalysis read length in bases (76 in our analysis). The value of 40 is an empirically determined pseudo-constant that approximately corresponds to the average number of unprocessed read-pairs needed to form a single DCS. This value roughly corresponds to the product of the optimal peak family size (i.e., six) and the number of SSCSs typically needed to form a single DCS (we typically obtain SSCS:DCS ratios between four and ten, with an average of around six), which gives about 36. However, because not all raw reads are of high quality, several extra reads are typically present that fail to form a DCS, and this tends to raise the number of raw reads needed to form a single DCS to ~40. Notably, this formula provides only a rough estimate of the depth that will be obtained, and it holds only when the peak family size and SSCS:DCS ratio are optimal.
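
As a rough illustration, the read-budget estimate can be sketched in Python, reading the formula as N = 40 × D × G / R (raw read-pairs per DCS, times the total bases of DCS coverage wanted, divided by the bases each DCS read contributes). The function name, the parameter names, and the example values (1,000× depth over a 10 kb target) are hypothetical and chosen only for illustration; the defaults of 76 and 40 are the values quoted in the text.

```python
import math


def estimate_read_pairs(depth, target_size_bp, read_length=76, raw_pairs_per_dcs=40):
    """Estimate paired-end reads to devote to a sample: N = 40 * D * G / R.

    depth             -- D, desired average DCS depth of coverage
    target_size_bp    -- G, genome or genome target size in base pairs
    read_length       -- R, postanalysis read length in bases (76 in the text)
    raw_pairs_per_dcs -- empirical pseudo-constant (~40 in the text), roughly
                         peak family size (6) x SSCS:DCS ratio (6) plus
                         low-quality reads that never form a DCS
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if depth < 0 or target_size_bp < 0:
        raise ValueError("depth and target_size_bp must be non-negative")
    # Round up: a fractional read-pair cannot be sequenced.
    return math.ceil(raw_pairs_per_dcs * depth * target_size_bp / read_length)


# Hypothetical example: 1,000x DCS depth over a 10 kb target.
print(estimate_read_pairs(depth=1000, target_size_bp=10_000))  # 5263158
```

Because the constant of 40 assumes an optimal peak family size and SSCS:DCS ratio, the result is best treated as a planning estimate rather than a guarantee of the depth obtained.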