Chunk #8 — Variant calling and Quality Control filtering of exome sequencing datasets

Source
Exome sequencing and the genetic basis of complex traits.

Text

We compared statistics computed in the 438-sample dataset with those computed in 37 whole-genome datasets released by Complete Genomics Inc. (CGI; see URLs), restricting the comparison to the same genomic regions as the exome data. The CGI whole-genome data serve as a good comparison because whole-genome sequencing does not depend on exome-capture technology. We further stratified these per-sample statistics into classes (functional class and CpG status) that are biologically interesting but may also exhibit different rates of technical artifacts. Table 1 shows that filtering is critical for achieving high-quality calls. Before filtering, the metrics deviate significantly from their expected values, which may indicate a high false-positive rate. After filtering, the statistics converge to those in the CGI dataset. The effectiveness of the filters is also evident from the comparison with human-chimpanzee divergence (ref. 55).
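
The stratified before/after comparison described above can be illustrated with a minimal sketch. This passage does not name the metrics used, so the transition/transversion (Ti/Tv) ratio here is an assumption chosen only as a familiar per-sample quality statistic. The record fields (`sample`, `functional_class`, `is_cpg`, `pass_filter`) and the toy calls are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_by_stratum(calls, only_pass=False):
    """Return {(sample, functional_class, is_cpg): Ti/Tv ratio}.

    `calls` is an iterable of dicts with hypothetical keys; the ratio is
    None for a stratum that contains no transversions.
    """
    counts = defaultdict(lambda: [0, 0])  # [transitions, transversions]
    for call in calls:
        if only_pass and not call["pass_filter"]:
            continue  # drop calls that failed the (assumed) QC filter
        key = (call["sample"], call["functional_class"], call["is_cpg"])
        is_transition = (call["ref"], call["alt"]) in TRANSITIONS
        counts[key][0 if is_transition else 1] += 1
    return {
        key: (ti / tv if tv else None)
        for key, (ti, tv) in counts.items()
    }


# Toy, made-up calls for one sample and one stratum.
base = {"sample": "S1", "functional_class": "synonymous", "is_cpg": False}
calls = [
    {**base, "ref": "A", "alt": "G", "pass_filter": True},
    {**base, "ref": "C", "alt": "T", "pass_filter": True},
    {**base, "ref": "A", "alt": "C", "pass_filter": True},
    {**base, "ref": "G", "alt": "T", "pass_filter": False},
]

unfiltered = titv_by_stratum(calls)
filtered = titv_by_stratum(calls, only_pass=True)
stratum = ("S1", "synonymous", False)
print(stratum, unfiltered[stratum], filtered[stratum])
```

In a real comparison, the same function would be applied to the reference whole-genome calls restricted to the exome regions, and the filtered values in each stratum would be checked against that reference.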