Chunk #18 — Introduction — Detection of copy number variants

Source: Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches.

An additional category of CNV detection algorithms is designed to detect sporadic CNVs from population exome sequencing data. These algorithms normalize the read-count data using the principal components of the samples-by-exons matrix of read counts (45,46). In the absence of recurrent CNVs, the top principal components, which explain the bulk of the exome-wide variance in depth of coverage (DOC), should represent experimental noise such as batch effects and GC bias. Removing the top principal components (i.e., projecting the data onto the space spanned by the remaining components) should therefore eliminate these biases. It should be emphasized, however, that such methods are intended only for detecting sporadic CNVs. A recurrent CNV is shared by many samples and produces a strong, correlated signal across the cohort. That signal tends to be captured by the top principal components, so it is lost during normalization.
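To illustrate the normalization step, the following minimal pure-Python sketch performs three operations. It centers each exon's read depth across samples, estimates the top-k principal components of the samples-by-exons matrix by power iteration, and then subtracts each sample's projection onto those components. This is a sketch under stated assumptions, not the procedure of refs. 45 or 46. The function names (`center_columns`, `top_components`, `remove_top_components`), the per-exon mean-centering, the choice of k, and the toy cohorts are all hypothetical. Practical pipelines may add further steps, such as coverage normalization before PCA and z-scoring of the residuals before CNV calling.

```python
import random
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = exons


def center_columns(depth: Matrix) -> Matrix:
    """Subtract each exon's mean depth across samples (assumed per-exon centering)."""
    n = len(depth)
    means = [sum(row[j] for row in depth) / n for j in range(len(depth[0]))]
    return [[x - mu for x, mu in zip(row, means)] for row in depth]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_components(x: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Top-k unit eigenvectors of X^T X (exon-space principal axes), found by
    power iteration with Gram-Schmidt deflation against earlier components."""
    m = len(x[0])
    rng = random.Random(seed)
    comps: Matrix = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            # w = X^T (X v), computed without forming the m x m matrix X^T X
            xv = [dot(row, v) for row in x]
            w = [sum(xv[i] * x[i][j] for i in range(len(x))) for j in range(m)]
            for c in comps:  # keep w orthogonal to components already found
                d = dot(w, c)
                w = [a - d * b for a, b in zip(w, c)]
            norm = dot(w, w) ** 0.5
            if norm < 1e-12:  # no variance left outside earlier components
                return comps
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def remove_top_components(depth: Matrix, k: int) -> Matrix:
    """Center per exon, then project every sample's profile onto the
    orthogonal complement of the top-k principal components."""
    x = center_columns(depth)
    comps = top_components(x, k)
    resid = []
    for row in x:
        r = list(row)
        for c in comps:  # components are orthonormal, so subtract one at a time
            d = dot(r, c)
            r = [a - d * b for a, b in zip(r, c)]
        resid.append(r)
    return resid


if __name__ == "__main__":
    # Hypothetical toy cohort: 4 samples x 6 exons, baseline depth 100.
    a = [0.0, 1.0, -1.0, 0.0]              # assumed per-sample batch strength
    b = [3.0, -1.0, 2.0, -2.0, 1.0, -1.0]  # assumed per-exon batch profile
    depth = [[100.0 + 10.0 * ai * bj for bj in b] for ai in a]
    depth[0][4] -= 3.0  # sporadic deletion: sample 0, exons 4 and 5
    depth[0][5] -= 3.0
    resid = remove_top_components(depth, k=1)
    print("sporadic, sample 0:", [round(v, 2) for v in resid[0]])

    # Hypothetical recurrent deletion: samples 0 and 1 share a drop at exons 2 and 3.
    rec = [[100.0] * 6 for _ in range(4)]
    for i in (0, 1):
        for j in (2, 3):
            rec[i][j] = 50.0
    resid_rec = remove_top_components(rec, k=1)
    print("recurrent, max |residual|:", max(abs(v) for r in resid_rec for v in r))
```

In the first toy cohort, removing one component eliminates the rank-one batch effect. Sample 0 keeps a dip of -2.25 at exons 4 and 5, which is the deletion after per-exon centering. The toy numbers were chosen so that the batch and CNV signals are exactly orthogonal, which makes the separation exact; with real data it is only approximate. In the second cohort, the shared deletion is itself the top component, so removing it leaves essentially no signal. This mirrors the loss of recurrent CNVs described above.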