Chunk #33 — Methods — GAWMerge development — Imputation strategy

Source: GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing.
Embedded: yes

Text

The merged array and WGS data were first imputed with Minimac4 (ref. 19), using the 1000 Genomes Phase 3 version 5 EUR and AFR super populations as reference panels for EA and AA samples, respectively. The reference panel includes 503 EUR and 661 AFR samples, with data on the GRCh37 genome build. TOPMed WGS data were converted from GRCh38 to GRCh37 to match the reference and array-genotyped data. Besides applying the standard imputation quality metric R2, we also observed poorly imputed variants flagged by a low Empirical R2 (ER2). ER2 was defined only for genotyped variants, as the squared correlation between leave-one-out imputed dosages and the true, observed genotypes. In our first test of type-I error control (Fig. 2b), which used array data from COPDGene EA1 (N = 3251) and WGS data from ECLIPSE EA (N = 1461), we expected no genome-wide significant associations, since all individuals were smokers and no disease was being tested between the datasets. Without the ER2 filter, we found many false positives (Supplementary Fig. 1a) clustered around a variant on chromosome 10 (chr10:32370743, ER2 = 0.391, MAF = 0.068). We recommend removing such