
Chunk #23 — Experimental design — Setting tag family size by PCR amplification

Source: Detecting ultralow-frequency mutations by Duplex Sequencing.
Embedded: yes

Text

our preferred metric to refer to tag family sizes generated under a given set of conditions. This distribution arises during PCR amplification and results from differences in amplification efficiency among the DNA molecules present in the library. By plotting the proportion of reads belonging to tag families of the same size (different tags can have families containing the same number of reads) as a function of tag family size (the ‘PE_reads.tagstats’ file generated in Step 60 of the PROCEDURE section provides these data), we typically observe a solitary peak at a tag family size of one, which is probably the result of sequencing errors in the tag region, and a broader distribution centered at the peak family size (see Fig. 4a as an example). We formally define peak family size as the tag family size >1 containing the highest proportion of reads. On the basis of the analysis of samples with different values for the peak family size, we have determined that a peak family size of six maximizes the efficiency of DS: that is, it requires the smallest number of raw reads to produce a single DCS (Fig.
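The definition of peak family size given above can be illustrated with a minimal Python sketch. It does not parse the ‘PE_reads.tagstats’ file, whose column layout is not described in this text. Instead it assumes a hypothetical input: a list with one entry per tag, giving the number of reads in that tag's family. The function names and the toy data are illustrative only and are not part of the published pipeline.

```python
from collections import Counter


def read_proportions_by_family_size(family_sizes):
    """Map each tag family size to the proportion of all reads that
    belong to families of that size.

    family_sizes: one integer per tag (hypothetical input format),
    giving the number of reads sharing that tag.
    """
    families_per_size = Counter(family_sizes)  # size -> number of families
    total_reads = sum(size * n for size, n in families_per_size.items())
    return {
        size: size * n / total_reads
        for size, n in sorted(families_per_size.items())
    }


def peak_family_size(proportions):
    """Return the family size > 1 that contains the highest proportion
    of reads. Size 1 is excluded, following the definition in the text.
    On ties, the smallest tied size is returned."""
    candidates = {s: p for s, p in proportions.items() if s > 1}
    if not candidates:
        raise ValueError("no tag families larger than one read")
    return max(candidates, key=candidates.get)


# Toy data (made up for illustration): many singleton families plus a
# broader group of families centered near six reads.
toy_family_sizes = [1] * 60 + [4] * 3 + [5] * 5 + [6] * 8 + [7] * 4 + [8] * 2

proportions = read_proportions_by_family_size(toy_family_sizes)
peak = peak_family_size(proportions)
print(peak)
```

In this toy data set, families of size one hold the largest share of reads overall, yet the reported peak family size is six because the definition only considers sizes greater than one.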