Chunk #3 — The ExAC Data set

Source: Analysis of protein-coding genetic variation in 60,706 humans.
Embedded: yes

Text

Sequencing data processing, variant calling, quality control and filtering were performed on over 91,000 exomes (see Online Methods), and sample filtering was then applied to produce a final data set spanning 60,706 individuals (Figure 1a). To identify the ancestry of each ExAC individual, we performed principal component analysis (PCA) to distinguish the major axes of geographic ancestry and to identify population clusters corresponding to individuals of European, African, South Asian, East Asian, and admixed American (hereafter Latino) ancestry (Figure 1b; Supplementary Information Table 3); we note that the apparent separation between East Asian and other samples reflects a deficiency of Middle Eastern and Central Asian samples in the data set. We further separated Europeans into individuals of Finnish and non-Finnish ancestry, given the enrichment of the bottlenecked Finnish population; the term “European” hereafter refers to non-Finnish European individuals.
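As a rough illustration of how PCA can separate samples by ancestry, the sketch below runs PCA on a small simulated genotype matrix. It is not the ExAC pipeline (the actual procedure is described in the Online Methods); the function names `genotype_pca` and `simulate_two_populations`, the two-population simulation, and all parameter values are assumptions made for this example only.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components of a genotype matrix.

    genotypes: (n_samples, n_variants) array of alternate-allele counts (0, 1, 2).
    Returns an (n_samples, n_components) array of principal component scores.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop variants with no variation; they carry no ancestry signal
    # and would cause a division by zero when scaling.
    g = g[:, g.std(axis=0) > 0]
    # Center and scale each variant so that no single site dominates.
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    # The SVD of the standardized matrix gives the principal axes;
    # U * S are the sample coordinates along those axes.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_two_populations(n_per_pop=50, n_variants=500, seed=0):
    """Simulate genotypes for two hypothetical populations with drifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_variants)
    # Population B's frequencies are a noisy copy of population A's.
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_variants), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_variants))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_variants))
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return np.vstack([pop_a, pop_b]), labels


genotypes, labels = simulate_two_populations()
pcs = genotype_pca(genotypes, n_components=2)
```

In this toy setting the first component (`pcs[:, 0]`) should place the two simulated populations at opposite ends of the axis. Real data sets involve many more samples, variant pruning, and a separate clustering step to assign individuals to populations, none of which is shown here.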