
The ExAC Data Set

Source
Analysis of protein-coding genetic variation in 60,706 humans.

Text

The density of variation in ExAC is not uniform across the genome: whether a variant is observed depends on factors such as mutational properties and selective pressures. Across the ~45M well-covered positions in ExAC (positions where at least 80% of individuals have a minimum of 10X coverage), there are ~18M possible synonymous variants, of which we observe 1.4M (7.5%). This proportion varies sharply by mutational class, however: we observe 63.1% of possible CpG transitions (C to T variants where the following base is G), but only 3% of possible transversions and 9.2% of other possible transitions (Supplementary Information Table 9). Missense and nonsense variants show a similar pattern, with lower observed proportions reflecting stronger selective pressure against them (Figure 1D).

Of 123,629 high-quality (HQ) insertions/deletions (indels) called in coding exons, 117,242 (95%) are shorter than 6 bases, with shorter deletions being the most common (Figure 1E). Compared with in-frame indels, frameshifts are found in smaller numbers and are more likely to be singletons (Figure 1F), reflecting the influence of purifying selection.
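The mutational classes and indel categories described above can be made concrete with a small sketch. It assumes forward-strand bases with both flanking bases known, and VCF-style indels in which REF and ALT share a leading anchor base. The function names (`snv_class`, `indel_length`, `indel_class`, `summarize_indels`) are hypothetical illustrations, not part of the ExAC pipeline, and the toy indels are invented, not ExAC records. The last line simply recomputes the share of short indels from the counts quoted in the text.

```python
from collections import Counter

# Transitions swap purine<->purine or pyrimidine<->pyrimidine; all other SNVs are transversions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def snv_class(prev_base: str, ref: str, alt: str, next_base: str) -> str:
    """Assign a forward-strand SNV to one of three mutational classes (simplified)."""
    if (ref, alt) not in TRANSITIONS:
        return "transversion"
    # A CpG transition is C>T where the following base is G. The same event on the
    # opposite strand appears on the forward strand as G>A preceded by a C.
    if (ref, alt, next_base) == ("C", "T", "G") or (ref, alt, prev_base) == ("G", "A", "C"):
        return "CpG transition"
    return "other transition"


def indel_length(ref: str, alt: str) -> int:
    """Net number of inserted or deleted bases for an anchor-padded indel."""
    return abs(len(alt) - len(ref))


def indel_class(ref: str, alt: str) -> str:
    """In-frame if the length change is a multiple of 3, otherwise frameshift."""
    return "in-frame" if indel_length(ref, alt) % 3 == 0 else "frameshift"


def summarize_indels(indels: list[tuple[str, str]], max_len: int = 6) -> dict:
    """Count indels by frame effect and how many are shorter than max_len bases."""
    counts = Counter(indel_class(ref, alt) for ref, alt in indels)
    short = sum(1 for ref, alt in indels if indel_length(ref, alt) < max_len)
    return {
        "frameshift": counts["frameshift"],
        "in-frame": counts["in-frame"],
        "shorter_than_max": short,
        "total": len(indels),
    }


# Hypothetical toy indels as (REF, ALT) pairs; not ExAC data.
toy = [("A", "AT"), ("ACTG", "A"), ("AG", "A"), ("A", "ACCCCCC")]

print(snv_class("A", "C", "T", "G"))  # CpG transition
print(snv_class("A", "C", "T", "A"))  # other transition
print(snv_class("A", "C", "A", "G"))  # transversion
print(summarize_indels(toy))  # {'frameshift': 2, 'in-frame': 2, 'shorter_than_max': 3, 'total': 4}

# Percentage of HQ coding indels shorter than 6 bases, using the counts quoted in the text.
print(round(100 * 117_242 / 123_629, 1))  # 94.8
```

Classifying indels by length modulo 3 is the usual frame test; this sketch ignores complications such as indels that span exon boundaries, multi-allelic sites, or strand-aware annotation.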