Data generation and variant discovery

Source: Evolution and functional impact of rare coding variation from deep sequencing of human exomes.

A total of 63.4 terabases of DNA sequence was generated at two centers, using three complementary definitions of the exome target and two different capture technologies (18). We sequenced samples from 15 cohorts in the NHLBI Exome Sequencing Project (ESP) to an average median depth of 111× (range, 23× to 474×). Analysis of the filtered single-nucleotide variant (SNV) data showed no evidence of cohort- or phenotype-specific effects or of other systematic biases (figs. S1 to S7). Exomes from related individuals were excluded from further analysis (fig. S8), leaving a data set of 2440 exomes. We inferred genetic ancestry with a clustering approach (18) and, unless otherwise noted, restricted the remaining analyses to the 1351 individuals inferred to be European-American (EA) and the 1088 inferred to be African-American (AA). We applied standard quality-control filters (18) to the 563,698 variants within the intersection of all three capture targets, yielding a final data set of 503,481 SNVs in 15,585 genes across 22.38 Mb of targeted sequence per individual. We assessed data quality and error rates with several orthogonal methods (18). About 98% (941/961) of all