Chunk #14 — Materials and methods — Dataset merging

Source: A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts.
Embedded: yes

Text

To identify SNPs that created spurious associations, we ran a series of logistic regression analyses restricted to subjects who had been selected as controls in the initial GWAS (i.e., excluding all case subjects). In each regression, the cohort-specific controls from one original GWAS were treated as “cases” and the remaining controls in that dataset as “controls”. For example, in the OmniExpress dataset, we considered NHS controls from the gout GWAS as “cases” while treating the controls from the gout (HPFS), endometrial cancer (NHS), colon cancer (NHS, HPFS, and PHS), and mammographic density (NHS) GWAS as “controls”. We repeated this procedure so that each cohort-specific control set was treated in turn as “cases” and all other controls as “controls”. For each of these QC regressions, we extracted SNPs reaching genome-wide significance (p < 10⁻⁸) and examined QQ plots. In the Affymetrix dataset, 100 SNPs were flagged and removed. In the HumanHap dataset, 8 SNPs had p < 10⁻⁸ in at least one of the QC regressions and were removed. No SNP in the OmniExpress dataset had p < 10⁻⁸; hence, no SNP was removed.
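
The leave-one-control-set-out check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the data layout (a dictionary of per-SNP allele counts and a list of control-set labels), and the toy data are all hypothetical. To stay dependency-free, it uses the 1-df score test for the genotype term of a logistic regression with an intercept only, rather than a fully fitted model. The source does not say which test statistic or covariates were used, so that choice is an assumption.

```python
import math

GENOME_WIDE_P = 1e-8  # significance threshold quoted in the text


def score_test_p(y, g):
    """P-value of the 1-df score test for the genotype term in a logistic
    regression of y (0/1 status) on g (allele counts) plus an intercept.
    Assumption: no covariates; stands in for a fitted logistic regression."""
    n = len(y)
    y_bar = sum(y) / n
    g_bar = sum(g) / n
    u = sum((yi - y_bar) * gi for yi, gi in zip(y, g))
    v = y_bar * (1 - y_bar) * sum((gi - g_bar) ** 2 for gi in g)
    if v == 0:  # monomorphic SNP, or everyone in one group: test undefined
        return 1.0
    chi2 = u * u / v
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def flag_spurious_snps(genotypes, control_set, threshold=GENOME_WIDE_P):
    """Treat each control set in turn as "cases" and all other controls as
    "controls"; return SNP IDs with p < threshold in at least one run."""
    flagged = set()
    for label in sorted(set(control_set)):
        y = [1 if c == label else 0 for c in control_set]
        for snp_id, g in genotypes.items():
            if score_test_p(y, g) < threshold:
                flagged.add(snp_id)
    return flagged


# Toy data (hypothetical): two control sets of 100 subjects each.
control_set = ["gout_NHS"] * 100 + ["gout_HPFS"] * 100
genotypes = {
    # Same genotype distribution in both sets: no spurious signal.
    "rs_ok": [0, 1, 2, 1] * 25 + [0, 1, 2, 1] * 25,
    # Genotypes differ completely between sets, mimicking a batch artifact.
    "rs_bad": [2] * 100 + [0] * 100,
}
flagged = flag_spurious_snps(genotypes, control_set)
print(sorted(flagged))
```

In this toy example only the SNP whose genotype distribution differs between the two control sets is flagged. A real analysis would add the covariates of the original GWAS and inspect QQ plots alongside the threshold filter.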