Chunk #7 — 410 million genetic variants in 53,831 samples

Source
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Text

Sequence analysis identified 410,323,831 genetic variants (381,343,078 SNVs and 28,980,753 indels), corresponding to an average of one variant per 7 bp (Extended Data Table 4). Overall, 78.7% of these variants had not been described in dbSNP build 149; TOPMed variants now account for the majority of variants in dbSNP. Among all variant alleles, 46.0% were singletons, observed once across all 53,831 participants. Among 40,722 unrelated participants (see Methods), the proportion of singleton variants was higher at 53.1% (Table 1). Downsampling analyses show that the proportion of singletons increases until around 15,000 unrelated individuals are sequenced and then decreases very gradually (Supplementary Fig. 11). The fraction of singletons in each region or class of sites closely tracks functional constraints. For example, among all 4,651,453 protein-coding variants in unrelated individuals, the proportion of singletons was the highest for the 104,704 frameshift variants (68.4%), high among the 97,217 putative splice and truncation variants (62.1%), intermediate among the 2,965,093 nonsynonymous variants (55.6%) and lowest among the 1,435,058 synonymous variants (49.8%). Beyond protein-coding sequences, we found increased proportions of singletons in promoters (55.0%), 5′ untranslated
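
The headline figures quoted in this chunk can be checked with a few lines of arithmetic. The sketch below is only an illustrative consistency check, not part of the paper's analysis. It confirms that the SNV and indel counts add up to the reported total, estimates the base pairs per variant, and ranks the four coding classes by singleton proportion. The genome length of 3,000,000,000 bp is a round-number assumption, since the passage does not give the denominator behind "one variant per 7 bp". The function and constant names are hypothetical.

```python
# Counts quoted in the text above.
SNVS = 381_343_078
INDELS = 28_980_753
TOTAL_REPORTED = 410_323_831

# ASSUMPTION: round-number genome length in bp. The passage does not state
# the denominator used for "one variant per 7 bp".
ASSUMED_GENOME_BP = 3_000_000_000

# (variant count, singleton proportion) per coding class among unrelated
# participants, as quoted in the text above.
CODING_CLASSES = {
    "frameshift": (104_704, 0.684),
    "splice/truncation": (97_217, 0.621),
    "nonsynonymous": (2_965_093, 0.556),
    "synonymous": (1_435_058, 0.498),
}


def bp_per_variant(genome_bp: int, n_variants: int) -> float:
    """Average spacing between variants, in base pairs."""
    return genome_bp / n_variants


def rank_by_singleton_fraction(classes: dict) -> list:
    """Class names ordered from highest to lowest singleton proportion."""
    return sorted(classes, key=lambda name: classes[name][1], reverse=True)


if __name__ == "__main__":
    print("SNVs + indels match reported total:", SNVS + INDELS == TOTAL_REPORTED)
    print("bp per variant (assumed genome length):",
          round(bp_per_variant(ASSUMED_GENOME_BP, TOTAL_REPORTED), 2))
    print("Classes by singleton proportion:",
          rank_by_singleton_fraction(CODING_CLASSES))
```

Under the assumed genome length the spacing comes out between 7 and 8 bp, in line with the rounded figure in the text. The ranking matches the order in which the passage lists the classes: the more disruptive the class, the higher its singleton proportion.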