
Chunk #37 — Methods — Imputation and post-imputation quality filtering

Source
Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations.

Text

We first phased individuals from each cohort separately using Eagle [57] with default settings. We then performed haplotype-based imputation with Minimac4 [58], using phased haplotypes from TOPMed freeze 5b as the reference. The reference panel comprised 100,506 TOPMed freeze 5b whole genome sequences for all cohorts except JHS, for which we used the 94,342 TOPMed freeze 5b non-JHS sequences. We additionally imputed HCHS/SOL and JHS using the 1000 Genomes Phase 3 [9] and HRC [8] reference panels. Post-imputation quality filtering used an R2 threshold specific to each MAF category, chosen so that the average R2 of variants passing the threshold was at least 0.8, following our previous work [4, 59]. Restricting to variants that passed post-imputation quality control in at least two cohorts resulted in 34.4–35.8 million variants assessed in the AA cohorts and 26.7–27.2 million in the HA cohorts, depending on the exact sample size for the tested trait. Imputation and association analysis included autosomal variants only. We assessed imputation quality (comparing true and estimated average R2) in three selected 3 Mb regions: the 16–19 Mb region (relative to the start of each chromosome) on chromosomes 3, 12, and 20. Example scripts for imputation quality control are available at https://yunliweb.its.unc.edu/topmed5bimputation/index.php.
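
The filtering rule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (their example scripts are at the URL above). It assumes that the threshold for each MAF category is the lowest R2 cutoff at which the mean R2 of the retained variants is still at least 0.8. The MAF bin edges, function names, and input layout are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical MAF category edges; the text does not list the actual categories.
MAF_BINS = [(0.0, 0.0005), (0.0005, 0.002), (0.002, 0.005),
            (0.005, 0.01), (0.01, 0.05), (0.05, 0.5)]


def maf_bin(maf, bins=MAF_BINS):
    """Return the index of the MAF bin containing maf (last bin includes its upper edge)."""
    for i, (lo, hi) in enumerate(bins):
        if lo <= maf < hi or (i == len(bins) - 1 and maf == hi):
            return i
    raise ValueError(f"MAF {maf} is outside the configured bins")


def min_r2_threshold(r2_values, target=0.8):
    """Lowest cutoff t such that mean(R2 of variants with R2 >= t) >= target.

    Returns None when even the best-imputed variants cannot reach the target.
    """
    ordered = sorted(r2_values, reverse=True)
    best = None
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Only evaluate a cutoff once all variants tied at this R2 are included.
        group_end = k == len(ordered) or ordered[k] != value
        if group_end and running / k >= target:
            best = value
    return best


def filter_variants(variants, target=0.8):
    """variants: iterable of (variant_id, maf, estimated_r2) for one cohort.

    Returns (kept_ids, thresholds), where thresholds maps bin index -> cutoff.
    """
    by_bin = defaultdict(list)
    for vid, maf, r2 in variants:
        by_bin[maf_bin(maf)].append((vid, r2))
    kept, thresholds = [], {}
    for b, items in by_bin.items():
        t = min_r2_threshold([r2 for _, r2 in items], target)
        thresholds[b] = t
        if t is not None:
            kept.extend(vid for vid, r2 in items if r2 >= t)
    return kept, thresholds


def variants_in_min_cohorts(passing_by_cohort, min_cohorts=2):
    """Keep variant IDs that passed filtering in at least min_cohorts cohorts."""
    counts = Counter()
    for ids in passing_by_cohort.values():
        counts.update(set(ids))
    return {vid for vid, n in counts.items() if n >= min_cohorts}
```

In use, `filter_variants` would be run once per cohort on that cohort's imputation output, and the per-cohort lists of passing variants would then be passed to `variants_in_min_cohorts` to apply the "at least two cohorts" restriction. Because the mean R2 of the retained set can only fall as lower-quality variants are added, a single pass over the sorted values is enough to find the cutoff.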