Chunk #4 — 1 INTRODUCTION — 1.2 NGS data preprocessing for SNV detection

Source: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.
Embedded: yes

Text

sequence is shown in blue. The first step transforms the aligned reads into allelic counts. This method assumes that the reads are correctly aligned and that the nucleotide base calls are correct. Nucleotides that match the reference are shown in black, whereas nucleotides that do not match the reference are shown in bold red. The figure illustrates how aligned data can be ‘collapsed’ into allelic counts. At each position i in the data, we count the number of reads a_i that match the reference genome and the number of reads b_i that do not. Reads carrying a rare third allele are assumed to be errors. The total number of reads overlapping each position (called the depth) is given by N_i = a_i + b_i. Given {a_i, b_i, N_i} for all i ∈ {1, 2, …, T}, where T is the total number of positions in the genome, the task is to infer which positions exhibit an SNV.
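
The collapsing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the SNVMix implementation: the function name `allelic_counts`, the representation of a read as a `(start, bases)` pair with a 0-based start and no gaps, and the toy reference and reads are all assumptions made for the example. Following the definition in the text, any base that differs from the reference, including a rare third allele, is counted in b_i.

```python
from typing import List, Tuple


def allelic_counts(
    reference: str, reads: List[Tuple[int, str]]
) -> List[Tuple[int, int, int]]:
    """Collapse aligned reads into (a_i, b_i, N_i) for each reference position.

    Assumptions (hypothetical, for illustration only):
    - each read is (start, bases), with a 0-based start and no indels;
    - alignments and base calls are taken to be correct, as in the text.
    """
    T = len(reference)
    a = [0] * T  # reads matching the reference at position i
    b = [0] * T  # reads not matching the reference at position i
    for start, bases in reads:
        for offset, base in enumerate(bases):
            i = start + offset
            if i < 0 or i >= T:
                continue  # ignore bases that fall outside the reference
            if base == reference[i]:
                a[i] += 1
            else:
                b[i] += 1
    # Depth is N_i = a_i + b_i
    return [(a[i], b[i], a[i] + b[i]) for i in range(T)]


# Toy data: two reads carry a non-reference T at position 2.
reference = "ACGTAC"
reads = [(0, "ACGT"), (1, "CTTA"), (2, "GTAC"), (2, "TTA")]
counts = allelic_counts(reference, reads)
for i, (a_i, b_i, n_i) in enumerate(counts):
    print(i, a_i, b_i, n_i)
```

In this toy input, position 2 ends up with a_i = 2, b_i = 2 and N_i = 4, while every other position has b_i = 0. Deciding whether such a position is an SNV or a sequencing error is the inference task that the counts feed into.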