(68.4%), high among the 97,217 putative splice and truncation variants (62.1%), intermediate among the 2,965,093 nonsynonymous variants (55.6%) and lowest among the 1,435,058 synonymous variants (49.8%). Beyond protein-coding sequences, we found increased proportions of singletons in promoters (55.0%), 5′ untranslated regions (54.7%), regions of open chromatin (53.4%) and 3′ untranslated regions (53.3%); we found lower proportions of singletons in intergenic regions (53.0%) (Supplementary Table 5). Although putative transcription factor binding sites initially appeared to show fewer singletons (52.7%) than the remainder of the genome (53.1%), this pattern did not hold when we analysed highly mutable CpG sites separately. In fact, transcription factor binding sites were enriched for singletons in both CpG sites and non-CpG sites, an example of Simpson’s paradox (ref. 16).

Table 1 Number of variants in 40,722 unrelated individuals in TOPMed

The Total and Singletons (%) columns refer to all unrelated individuals (n = 40,722); the remaining columns give per-individual counts.

| | Total | Singletons (%) | Average | 5th percentile | Median | 95th percentile |
|---|---|---|---|---|---|---|
| Total variants | 384,127,954 | 203,994,740 (53) | 3,748,599 | 3,516,166 | 3,563,978 | 4,359,661 |
| SNVs | 357,043,141 | 189,429,596 (53) | 3,553,423 | 3,335,442 | 3,380,462 | 4,125,740 |
| Indels | 27,084,813 | 14,565,144 (54) | 195,176 | 180,616 | 183,503 | 233,928 |
| Novel variants | 298,373,330 | 191,557,469 (64) | 29,202 | 20,312 | 24,106 | 44,336 |
| SNVs | 275,141,134 | 177,410,620 (64) | 25,027 | 17,520 | 20,975 | 36,861 |
| Indels | 23,232,196 | 14,146,849 (61) | 4,175 | 2,747 | 3,145 | 7,359 |
| Coding variation | 4,651,453 | 2,523,257 (54) | 23,909 | 22,158 | 22,557 | 27,716 |
| Synonymous | 1,435,058 | 715,254 (50) | 11,651 | 10,841 | 11,056 | 13,678 |
| Nonsynonymous | 2,965,093 | 1,648,672 (56) | 11,384 | 10,632 | 10,856 | 13,221 |
| Stop/essential splice | 97,217 | 60,347 (62) | 474 | 425 | 454 | 566 |
| Frameshift | 104,704 | 71,577 (68) | 132 | 112 | 127 | 165 |
| In-frame | 51,997 | 29,110 (56) | 102 | 85 | 99 | 128 |

Novel variants are taken as variants that were not present in dbSNP build 149, the most recent dbSNP version without TOPMed submissions.
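
The reversal described above for transcription factor binding sites (TFBS), which are lower in singletons overall yet higher within both CpG and non-CpG sites, can be sketched numerically. The counts below are hypothetical, chosen only to reproduce the direction of the effect; they are not data from this study, and the names `counts` and `pooled_fraction` are illustrative. The sketch assumes that CpG sites carry a lower singleton fraction and that binding sites contain proportionally more CpG variants than the rest of the genome.

```python
# Hypothetical (singletons, total variants) counts per region and site class.
# These numbers are invented for illustration and are NOT from the study.
counts = {
    "TFBS": {"CpG": (450, 1000), "non-CpG": (600, 1000)},
    "rest": {"CpG": (80, 200), "non-CpG": (990, 1800)},
}


def stratum_fraction(region, site_class):
    """Singleton fraction within one region and one site class."""
    singletons, total = counts[region][site_class]
    return singletons / total


def pooled_fraction(region):
    """Singleton fraction for a region after pooling CpG and non-CpG sites."""
    strata = counts[region].values()
    singletons = sum(s for s, _ in strata)
    total = sum(t for _, t in strata)
    return singletons / total


for region in counts:
    per_class = ", ".join(
        f"{site_class}: {stratum_fraction(region, site_class):.1%}"
        for site_class in counts[region]
    )
    print(f"{region}: pooled {pooled_fraction(region):.1%} ({per_class})")
```

With these illustrative counts, the TFBS singleton fraction exceeds that of the rest of the genome within each site class, yet the pooled TFBS fraction is lower, because a larger share of TFBS variants falls in the low-singleton CpG class.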