Methods: Simulating discovery of novel variants

Source: Exome sequencing and the genetic basis of complex traits.

To calculate the discovery rate of novel variants as the number of samples increases, all exome samples are first arranged in a random order. Samples are then analyzed sequentially, starting with the first, and the cumulative set of identified variant sites is updated after each sample. For every subsequent sample, a variant site is considered novel if it has not been identified as variant in any of the preceding samples. Figure 1 plots the fold-increase over baseline, where the baseline for each variant class is the number of variants discovered in the first sample. To avoid bias from any single sample ordering, this procedure is repeated over multiple random orderings and the mean across repetitions is reported. The Nonsense, Missense, and Synonymous classes are based on RefSeq annotations. The Missense class is further divided into “Probably damaging”, “Possibly damaging”, and “Benign” subclasses according to PolyPhen-2 predictions (ref. 46). The “Theoretical” line plots the expected number of segregating sites under a neutral model of evolution in a population of constant size (ref. 41).
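The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. Each exome is represented as a set of variant-site identifiers that has already been filtered to one annotation class, so the function is run once per class. The input format, the function names, `n_perm=100` and the seed are all hypothetical, because the text does not say how many random orderings were used. The sketch averages the per-ordering fold-increase (a mean of ratios). Averaging raw counts before dividing is another possible reading of the text. For the "Theoretical" line, the sketch assumes the standard infinite-sites neutral result for a constant-size population: E[S_m] = θ · Σ_{i=1}^{m-1} 1/i segregating sites for m sampled chromosomes (Watterson's estimator), with m = 2k for k diploid exomes. Dividing by the one-sample value cancels θ. Whether reference 41 uses exactly this form is not stated here.

```python
import random
from typing import List, Sequence, Set


def discovery_curve(samples: Sequence[Set[str]], n_perm: int = 100,
                    seed: int = 0) -> List[float]:
    """Mean fold-increase of cumulative variant sites over the first sample.

    samples: one set of variant-site IDs per exome, restricted to a single
    annotation class (hypothetical input format).
    """
    if not samples or any(len(s) == 0 for s in samples):
        raise ValueError("every sample needs at least one variant site")
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    for _ in range(n_perm):                 # repeat over random orderings
        order = list(samples)
        rng.shuffle(order)                  # random sample ordering
        seen = set(order[0])                # sites found in the first sample
        baseline = len(seen)                # baseline for this ordering
        cumulative = baseline
        totals[0] += 1.0                    # first sample is 1x baseline
        for k in range(1, n):
            novel = order[k] - seen         # sites absent from all preceding samples
            seen |= novel
            cumulative += len(novel)
            totals[k] += cumulative / baseline
    return [t / n_perm for t in totals]     # mean across orderings


def neutral_fold_increase(n_samples: int, ploidy: int = 2) -> List[float]:
    """Expected segregating sites for k samples relative to one sample.

    Assumes E[S_m] = theta * a_m with a_m = sum_{i=1}^{m-1} 1/i for m
    sampled chromosomes (neutral, constant size); theta cancels.
    """
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")

    def a(m: int) -> float:
        return sum(1.0 / i for i in range(1, m))

    base = a(ploidy)
    return [a(ploidy * k) / base for k in range(1, n_samples + 1)]


if __name__ == "__main__":
    # Toy data: every sample has 2 sites and the union has 4, so the last
    # value of the empirical curve is 2.0 for any ordering.
    toy = [{"chr1:100", "chr1:200"}, {"chr1:200", "chr2:50"},
           {"chr2:50", "chr3:10"}]
    print(discovery_curve(toy))
    print(neutral_fold_increase(len(toy)))
```

With diploid samples, a(2) = 1, so the theoretical curve is simply the harmonic sum a(2k). It grows roughly logarithmically in the number of samples, and the empirical class-specific curves can be compared against it.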