Chunk #75 — Methods — Evolutionary genetics of individuals with diverse ancestry — Selection

Source: Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.
Embedded: yes

Text

We started with 39,649 unrelated individuals from TOPMed data freeze 5 for whom we had consent for population genetic analyses (Extended Data Table 3). Because the singleton density score (SDS) requires thousands of samples and a baseline demographic history, we subset the data by population group and limited the analysis to groups with well-studied demographic histories: broadly European, broadly African and broadly East Asian. To avoid potential problems introduced by admixture, we required each sample to have more than 90% European, African or East Asian ancestry, as inferred by a seven-way ancestry inference pipeline (Supplementary Information 1.11). This left n = 21,196 European samples, n = 2,117 African samples and n = 1,355 East Asian samples. We specifically excluded Amish samples from the European group because they are a unique founder population. We analysed each population separately. We used only bi-allelic sites with an unambiguous ancestral state, inferred using the WGSA pipeline (ref. 108). We excluded sites near chromosome boundaries, near centromeres and in regions with poor accessibility. We used the previously published
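
The sample-selection rule described in this section (more than 90% ancestry from a single target group, with Amish samples excluded from the European group) can be illustrated with a minimal Python sketch. This is not the TOPMed pipeline: the `Sample` record, its `ancestry` mapping and the `is_amish` flag are hypothetical names introduced only for illustration, and the ancestry fractions are assumed to have already been produced by an upstream inference step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Threshold taken from the text: "more than 90%" ancestry from one group.
ANCESTRY_THRESHOLD = 0.90
# The three groups the text says were analysed.
TARGET_GROUPS = ("European", "African", "East Asian")


@dataclass
class Sample:
    """Hypothetical per-sample record (field names are assumptions)."""
    sample_id: str
    ancestry: Dict[str, float]  # ancestry component -> inferred fraction
    is_amish: bool = False      # assumed flag for the founder population


def assign_group(sample: Sample) -> Optional[str]:
    """Return the target group a sample belongs to, or None if excluded."""
    for group in TARGET_GROUPS:
        # Strict inequality: the fraction must exceed the threshold.
        if sample.ancestry.get(group, 0.0) > ANCESTRY_THRESHOLD:
            # Amish samples are excluded from the European group.
            if group == "European" and sample.is_amish:
                return None
            return group
    return None


def split_by_group(samples: List[Sample]) -> Dict[str, List[str]]:
    """Partition sample IDs by group so each group can be analysed separately."""
    groups: Dict[str, List[str]] = {g: [] for g in TARGET_GROUPS}
    for sample in samples:
        group = assign_group(sample)
        if group is not None:
            groups[group].append(sample.sample_id)
    return groups
```

Because the threshold is above 50%, a sample can pass for at most one group, so the order in which groups are checked does not affect the result. A sample at exactly 90% is dropped under this sketch's strict reading of "more than 90%".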