Chunk #43 — Methods — Sample sets

Source: Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Text

Several sample sets derived from three different WGS data freezes were used in the analyses presented here: freeze 3 (GRCh37 alignment, around 18,000 samples jointly called in 2016), freeze 5 (GRCh38 alignment, approximately 65,000 samples jointly called in 2017), and freeze 8 (GRCh38 alignment, about 140,000 samples jointly called in 2019). Extended Data Table 3 indicates which TOPMed study-consent groups were used in each of several different types of analyses described in this paper. Most analyses were performed on a set of 53,831 samples derived from freeze 5 (‘General variant analyses’ in Extended Data Table 3) or on a subset thereof approved for population genetic studies (‘Population genetics’ in Extended Data Table 3). The set of 53,831 was selected from freeze 5 using samples eligible for dbGaP sharing at the time of analysis, excluding (1) duplicate samples from the same participant; (2) one member of each monozygotic twin pair; (3) samples with questionable identity or low read depth (<98% of variant sites at depth ≥ 10×); and (4) samples with consent types inconsistent with analyses presented here. The ‘unrelated’ sample