
Chunk #5 — Variant calling and Quality Control filtering of exome sequencing datasets

Source
Exome sequencing and the genetic basis of complex traits.
Embedded
yes

Text

Exome sequencing coverage varies tremendously from region to region [8]. Some regions may be over-covered, either because of true structural variation (e.g., segmental duplications for which only one copy of the region exists in the reference genome) or because of technical artifacts (e.g., a greater abundance of capture probes, or overlapping probe definitions that result in “double-capturing”). Similarly, some regions may be under-covered for biological reasons (e.g., segmental duplications where more than one copy exists in the reference sequence, preventing the aligner from placing the read uniquely) or for technical reasons (e.g., high GC content or a high density of variation, either of which impairs probe hybridization). Furthermore, some “near-target” regions within 50 bp of a target boundary can have sufficient coverage to warrant inclusion in variant calling. Critically, whichever capture technology is used, either all samples should be processed with the same technology or the variability between technologies should be accounted for, e.g., by stratifying the study by technology (see section on population stratification).
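
As a rough illustration of two points above, the following Python sketch pads capture targets by 50 bp to take in "near-target" sequence and labels regions whose depth is far from the typical value. It is a minimal sketch under stated assumptions, not a method from the source: the function names, the BED-style 0-based half-open coordinates, and the `low`/`high` ratio thresholds are all hypothetical choices made for illustration.

```python
from statistics import median


def pad_and_merge(targets, pad=50):
    """Extend each (chrom, start, end) target by `pad` bp per side, then merge overlaps.

    Coordinates are assumed to be BED-style (0-based, half-open).
    """
    padded = sorted((c, max(0, s - pad), e + pad) for c, s, e in targets)
    merged = []
    for chrom, start, end in padded:
        # Merge with the previous interval if it is on the same chromosome
        # and the two intervals overlap or touch.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def flag_coverage_outliers(mean_depth, low=0.25, high=4.0):
    """Label each region by its mean depth relative to the median over all regions.

    `low` and `high` are illustrative cutoffs, not recommended values.
    """
    med = median(mean_depth.values())
    labels = {}
    for region, depth in mean_depth.items():
        ratio = depth / med if med > 0 else 0.0
        if ratio < low:
            labels[region] = "under"
        elif ratio > high:
            labels[region] = "over"
        else:
            labels[region] = "ok"
    return labels


targets = [("chr1", 100, 200), ("chr1", 260, 300), ("chr2", 10, 40)]
padded_targets = pad_and_merge(targets)

depths = {"a": 40, "b": 50, "c": 5, "d": 400, "e": 60}
labels = flag_coverage_outliers(depths)
```

Flagged regions are only candidates for review: as the text notes, unusual depth can reflect either real structural variation or a capture artifact, and depth alone does not distinguish the two. In a study that mixes capture technologies, such a check would presumably be run separately per technology.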