
Chunk #10 — Variant calling and Quality Control filtering of exome sequencing datasets

Source
Exome sequencing and the genetic basis of complex traits.

Text

Summary statistics such as the transition/transversion ratio (Ti/Tv) and the number of novel variants are useful as gross guides to the quality of a dataset, and they allow two sets of calls from the same dataset to be compared. However, the precise expected values of these statistics are unknown because they depend on many factors, including uneven coverage, variability in DNA quality, and other sources of technical bias such as machine error. Interpreting small deviations from expectation in these statistics is therefore nontrivial. Genotyping validation provides an additional measure of callset quality that is independent of the population-genetics statistics. Comparing genotyping data with sequencing data measures callset quality directly through two quantities: the non-reference sensitivity (NRS), the rate at which non-reference sites in the genotyping data are recovered in the sequencing data, and the non-reference discrepancy rate (NRD), the rate at which genotypes from sequencing and genotyping data differ. A genotyping assay should include sites across a range of allele frequencies, especially low frequencies (∼1%). When available, family data, particularly trios, can also be used to assess callset quality.
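
The statistics named above can be illustrated with a minimal Python sketch. It is not taken from the source: the function names `ti_tv_ratio` and `nrs_nrd`, the site identifiers, and the encoding of genotypes as alternate-allele counts (0, 1, 2) are all hypothetical choices made for illustration. The text defines NRD only as the rate at which genotypes differ; the sketch additionally assumes the common convention of excluding sites where both datasets agree on homozygous reference, so that the large number of matching reference calls does not dilute the rate.

```python
from typing import Dict, Iterable, Tuple

# A transition swaps bases within one chemical class (A<->G or C<->T);
# a transversion swaps a purine for a pyrimidine or the reverse.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv over (ref, alt) single-nucleotide substitutions."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a substitution, ignore
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ti/Tv is undefined")
    return ti / tv


def nrs_nrd(array_gt: Dict[str, int], seq_gt: Dict[str, int]) -> Tuple[float, float]:
    """Return (NRS, NRD) for genotypes given as alt-allele counts per site.

    array_gt: genotyping-assay calls, treated as the reference set.
    seq_gt:   sequencing calls; a site missing here counts as not called.
    """
    # NRS: fraction of assay non-reference sites that sequencing also
    # calls as non-reference.
    nonref_sites = [site for site, gt in array_gt.items() if gt > 0]
    if not nonref_sites:
        raise ValueError("no non-reference sites in the genotyping data")
    recovered = sum(1 for site in nonref_sites if seq_gt.get(site, 0) > 0)
    nrs = recovered / len(nonref_sites)

    # NRD (assumed convention): among sites called in both datasets,
    # skip hom-ref/hom-ref agreements, then count differing genotypes.
    compared = mismatched = 0
    for site, gt_array in array_gt.items():
        if site not in seq_gt:
            continue
        gt_seq = seq_gt[site]
        if gt_array == 0 and gt_seq == 0:
            continue
        compared += 1
        if gt_array != gt_seq:
            mismatched += 1
    if compared == 0:
        raise ValueError("no comparable non-reference sites")
    return nrs, mismatched / compared


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
    array = {"s1": 1, "s2": 2, "s3": 0, "s4": 1, "s5": 0}
    seq = {"s1": 1, "s2": 1, "s3": 0, "s5": 1}
    print("Ti/Tv:", ti_tv_ratio(calls))
    print("NRS, NRD:", nrs_nrd(array, seq))
```

In the toy data, site `s4` is non-reference on the assay but absent from sequencing, so it lowers NRS without entering NRD, while `s2` (heterozygous versus homozygous alternate) and `s5` (called variant only in sequencing) both count as discrepancies. Keeping the two rates separate in this way distinguishes missed sites from miscalled genotypes.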