Chunk #14 — INTRODUCTION — Overview of the procedure

Source: Detecting ultralow-frequency mutations by Duplex Sequencing.
Embedded: yes

Text

Next, the two related SSCS reads corresponding to the two initial DNA strands are grouped together and compared by a script called ‘DuplexMaker.py’. The 24-nt tag associated with each sequence read consists of two 12-nt sequences, and the tags corresponding to a pair of SSCS reads can be matched because they are transposed relative to one another. Specifically, if the two 12-nt sub-tag sequences are designated α and β, then a sequence with the tag αβ in read 1 is compared with the sequence having the tag βα in read 2. The paired-strand SSCS reads are then compared position by position, and only matching bases are kept to produce a DCS. Non-matching bases are considered undefined and are replaced by ‘N’. Reads containing a high proportion of Ns (>30%) are filtered out during this step (Supplementary Fig. 4). The processed DCS data are then re-aligned to the reference genome.
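The pairing and comparison logic described above can be illustrated with a minimal Python sketch. It is not the code of ‘DuplexMaker.py’: the function names (partner_tag, make_dcs, build_duplexes), the use of dictionaries keyed by 24-nt tag, and the decision to skip SSCS pairs of unequal length are all assumptions made for illustration. Only the 12-nt sub-tag length, the αβ/βα transposition, the replacement of mismatches by ‘N’ and the >30% N filter come from the text.

```python
from typing import Dict, Optional

TAG_HALF = 12           # each sub-tag (alpha, beta) is 12 nt, as stated in the text
MAX_N_FRACTION = 0.30   # reads with more than 30% Ns are filtered out


def partner_tag(tag: str) -> str:
    """Transpose the two 12-nt halves of a tag: alpha+beta -> beta+alpha."""
    return tag[TAG_HALF:] + tag[:TAG_HALF]


def make_dcs(sscs_a: str, sscs_b: str) -> Optional[str]:
    """Compare two SSCS reads position by position.

    Matching bases are kept, mismatches become 'N', and the result is
    discarded (None) if more than MAX_N_FRACTION of its bases are 'N'.
    """
    if not sscs_a or len(sscs_a) != len(sscs_b):
        return None  # assumption: only non-empty, equal-length reads are compared
    consensus = "".join(a if a == b else "N" for a, b in zip(sscs_a, sscs_b))
    if consensus.count("N") / len(consensus) > MAX_N_FRACTION:
        return None
    return consensus


def build_duplexes(read1_sscs: Dict[str, str],
                   read2_sscs: Dict[str, str]) -> Dict[str, str]:
    """Pair each read 1 SSCS (tag alpha+beta) with the read 2 SSCS whose tag
    is beta+alpha, and return the surviving DCS keyed by the read 1 tag."""
    duplexes: Dict[str, str] = {}
    for tag, seq in read1_sscs.items():
        mate = read2_sscs.get(partner_tag(tag))
        if mate is None:
            continue  # no SSCS from the complementary strand was found
        dcs = make_dcs(seq, mate)
        if dcs is not None:
            duplexes[tag] = dcs
    return duplexes


if __name__ == "__main__":
    alpha, beta = "A" * 12, "C" * 12
    read1 = {alpha + beta: "ACGTACGTAC"}
    read2 = {beta + alpha: "ACGTACGTAT"}
    print(build_duplexes(read1, read2))
```

In this toy input the two SSCS reads differ only at the last of ten positions, so the resulting DCS carries a single ‘N’ (10% of bases) and passes the filter. A pair that disagreed at more than 30% of positions would be dropped.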