Chunk #57 — Methods — Singleton clustering analysis — Data

Source: Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.


From the TOPMed freeze 5 dataset, we selected three subsamples of unrelated individuals: 1,000 of African ancestry, 1,000 of East Asian ancestry and 1,000 of European ancestry, with each individual's ancestry inferred with RFMix (ref. 93) using seven global reference populations. Within each subsample, we recalculated the allele count of every SNV and extracted the SNVs that were singletons within that subsample. For each such singleton, we then calculated the distance to the nearest other singleton carried by the same individual, whether upstream or downstream of the focal singleton. Note that a singleton as defined here is not necessarily a singleton in the full TOPMed freeze 5 dataset. We limited each population subsample to n = 1,000 for three reasons: first, to ensure that the subsamples carried roughly similar numbers of singletons; second, to ensure homogeneous ancestry within each subsample, so that the observed singleton clustering patterns were not an artefact of admixed haplotypes; and third, to limit the incidence of recurrent mutations at hypermutable sites, which can alter the underlying mutational spectrum of singletons.
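The two computational steps above (recounting alleles within a subsample to find subsample singletons, then measuring each singleton's distance to the nearest other singleton in the same individual) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: genotypes are assumed to be stored as per-individual alternate-allele dosages (0/1/2), a subsample singleton is assumed to mean an alternate allele count of exactly 1 within the subsample, and distances are assumed to be measured only between singletons on the same chromosome. The function names and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: (chromosome, position, alt-allele dosage per individual).
Variant = Tuple[str, int, Sequence[int]]
Singleton = Tuple[int, str, int]  # (carrier index, chromosome, position)


def find_subsample_singletons(variants: List[Variant],
                              subsample: List[int]) -> List[Singleton]:
    """Recompute the alt allele count using only the subsample and keep SNVs
    whose count is exactly 1 (assumed definition of a subsample singleton)."""
    singletons = []
    for chrom, pos, dosages in variants:
        sub = [dosages[i] for i in subsample]
        if sum(sub) == 1:
            # Exactly one heterozygous carrier exists when the count is 1.
            carrier = subsample[sub.index(1)]
            singletons.append((carrier, chrom, pos))
    return singletons


def nearest_singleton_distances(
        singletons: List[Singleton]) -> Dict[Singleton, Optional[int]]:
    """For each singleton, return the distance (bp) to the nearest other
    singleton in the same individual on the same chromosome, looking both
    upstream and downstream; None if the singleton has no such neighbour."""
    by_ind_chrom: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for ind, chrom, pos in singletons:
        by_ind_chrom[(ind, chrom)].append(pos)

    result: Dict[Singleton, Optional[int]] = {}
    for (ind, chrom), positions in by_ind_chrom.items():
        positions.sort()
        for k, pos in enumerate(positions):
            gaps = []
            if k > 0:                      # upstream neighbour
                gaps.append(pos - positions[k - 1])
            if k + 1 < len(positions):     # downstream neighbour
                gaps.append(positions[k + 1] - pos)
            result[(ind, chrom, pos)] = min(gaps) if gaps else None
    return result


if __name__ == "__main__":
    # Toy data: four individuals, of whom only 0, 1 and 2 form the subsample.
    variants = [
        ("chr1", 100, [0, 1, 0, 1]),  # allele count 2 overall, 1 in the subsample
        ("chr1", 150, [0, 1, 0, 0]),
        ("chr1", 200, [1, 1, 0, 0]),  # allele count 2 in the subsample: not a singleton
        ("chr1", 300, [0, 0, 1, 0]),
        ("chr1", 400, [0, 1, 0, 0]),
        ("chr2", 120, [0, 1, 0, 0]),
    ]
    singletons = find_subsample_singletons(variants, subsample=[0, 1, 2])
    distances = nearest_singleton_distances(singletons)
    print(distances[(1, "chr1", 150)])  # 50 (nearest is chr1:100, upstream)
```

In the toy call, chr1:100 counts as a singleton in the three-person subsample even though individual 3 also carries it, mirroring the point that a subsample singleton need not be a singleton in the full dataset. Sorting positions per individual and chromosome means only the immediate upstream and downstream neighbours need to be checked.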