Artifact due to differential error when cases and controls are imputed from different platforms.
- Authors: Sinnott, Jennifer A; Kraft, Peter
- Year: 2012
- Journal: Human Genetics
- PMID: 21735171
- DOI: 10.1007/s00439-011-1054-1
- PMCID: PMC3217156
Including previously genotyped controls in a genome-wide association study can provide cost savings, but can also create design biases. When cases and controls are genotyped on different platforms, the imputation needed to provide genome-wide coverage will introduce differential measurement error and may lead to false positives. We compared genotype frequencies of two healthy control groups from the Nurses' Health Study genotyped on different platforms [Affymetrix 6.0 (n = 1,672) and Illumina HumanHap550 (n = 1,038)]. Using standard imputation quality filters, we observed 9,841 single-nucleotide polymorphisms (SNPs) out of 2,347,809 (0.4%) significant at the 5 × 10⁻⁸ level. We explored three methods for controlling this Type I error inflation. One method was to remove platform effects using principal components; another was to restrict to SNPs of highest quality imputation; and a third was to genotype some controls alongside cases to exclude SNPs that are statistical artifacts. The first method could not reduce the Type I error rate; the other two could dramatically reduce the error rate, although both required that a portion of SNPs be excluded from analysis. Ideally, the biases we describe would be eliminated at the design stage, by genotyping sufficient numbers of cases and controls on each platform. Researchers using imputation to combine samples genotyped on different platforms with severely unbalanced case-control ratios should be aware of the potential for inflated Type I error rates and apply appropriate quality filters. Every SNP found with genome-wide significance should be validated on another platform to verify that its significance is not an artifact of study design.
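The core comparison in the abstract, testing whether a SNP's allele frequency differs between two control groups that should be exchangeable, can be illustrated with a minimal sketch. It assumes hard-call allele counts and a plain 1-df Pearson chi-square test on a 2×2 allele table; the abstract does not state which test statistic the authors used, and the function name `allelic_chi2` and the example counts are hypothetical.

```python
import math


def allelic_chi2(minor_a, total_a, minor_b, total_b):
    """Pearson chi-square (1 df) comparing minor-allele counts in two groups.

    minor_x: number of minor alleles observed in group x
    total_x: total number of alleles in group x (2 * number of subjects)
    Returns (statistic, p_value). Illustrative only, not the authors' method.
    """
    table = [
        [minor_a, total_a - minor_a],
        [minor_b, total_b - minor_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [minor_a + minor_b, n - minor_a - minor_b]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                # Monomorphic SNP: no evidence of a frequency difference.
                return 0.0, 1.0
            stat += (table[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Group sizes from the abstract: 1,672 and 1,038 subjects (two alleles each).
# The minor-allele counts below are made up for illustration.
GENOME_WIDE_ALPHA = 5e-8
stat, p = allelic_chi2(836, 2 * 1672, 900, 2 * 1038)
flagged = p < GENOME_WIDE_ALPHA
```

Because both groups are healthy controls, any SNP flagged at the genome-wide threshold in such a comparison is a candidate artifact rather than a true association.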
Black and gray lines represent λ values and the percentages of p-values less than 5 × 10⁻⁸, respectively, for SNPs grouped by minor allele frequency (MAF) in four settings: (a) SNPs genotyped on both Affy and Illumina platforms; (b) SNPs genotyped on Illumina platform and imputed for the Affy controls; (c) SNPs genotyped on Affy platform and imputed for the Illumina controls; and (d) SNPs imputed for both groups. Solid lines are from soft call analysis and dashed lines are from hard call analysis. Note that in some places (particularly in panel a) the solid and dashed lines are indistinguishable because the results from the soft call and hard call analyses were very similar.
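The two quantities plotted in this figure can be computed per MAF bin along the following lines. This is a sketch that assumes λ is the usual genomic inflation factor (median observed 1-df chi-square statistic divided by the median of the null distribution, roughly 0.4549); the caption does not spell out the definition, and `inflation_summary` is a hypothetical helper name.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_summary(chi2_stats, alpha=5e-8):
    """Return (lambda, fraction of SNPs with p < alpha) for 1-df statistics.

    Assumes the conventional genomic-control definition of lambda.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    n_hits = sum(
        1 for s in chi2_stats
        # 1-df chi-square survival function: P(X > s) = erfc(sqrt(s / 2)).
        if math.erfc(math.sqrt(s / 2.0)) < alpha
    )
    return lam, n_hits / len(chi2_stats)


# Toy statistics for one MAF bin (made up for illustration).
lam, frac_significant = inflation_summary([0.1, 0.4549364, 40.0])
```

Under no platform effect, λ should sit near 1 and the fraction of genome-wide significant SNPs near zero; departures in the imputed panels are what the figure is meant to expose.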
Top 3 principal components (PCs), among SNPs genotyped in the Illumina controls and imputed using hard calls in the Affy controls, plotted against one another. Affy samples are plotted in black; Illumina samples are plotted in gray.
Among SNPs genotyped in the Illumina controls and imputed using soft calls among the Affy controls, discrimination of the R² criterion described in Method 2, as the R² threshold varies. The y-axis is the sensitivity, the proportion of highly significant SNPs which are excluded; the x-axis is 1 − specificity, the proportion of non-significant SNPs which are excluded. R² threshold choices between 0.3 and 0.99 are marked along the curve.
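One point on such a curve can be computed as below. The sketch assumes a SNP is excluded when its imputation R² falls below the threshold and that "highly significant" means p < 5 × 10⁻⁸; both are readings of the caption rather than details confirmed by it, and `exclusion_rates` and the toy values are hypothetical.

```python
def exclusion_rates(r2_values, p_values, r2_threshold, alpha=5e-8):
    """Return (sensitivity, one_minus_specificity) for an R^2 filter.

    sensitivity: share of highly significant SNPs (p < alpha) that are excluded
    one_minus_specificity: share of non-significant SNPs that are excluded
    A SNP is treated as excluded when its R^2 is below r2_threshold (assumption).
    """
    sig_total = sig_excluded = 0
    nonsig_total = nonsig_excluded = 0
    for r2, p in zip(r2_values, p_values):
        excluded = r2 < r2_threshold
        if p < alpha:
            sig_total += 1
            sig_excluded += int(excluded)
        else:
            nonsig_total += 1
            nonsig_excluded += int(excluded)
    if sig_total == 0 or nonsig_total == 0:
        raise ValueError("need at least one significant and one non-significant SNP")
    return sig_excluded / sig_total, nonsig_excluded / nonsig_total


# Toy data: imputation R^2 and platform-comparison p-values (made up).
r2 = [0.35, 0.60, 0.95, 0.99, 0.50, 0.97]
pv = [1e-10, 1e-9, 1e-12, 0.3, 0.6, 0.04]

# Sweep thresholds to trace the curve, one (x, y) point per threshold.
curve = [(t,) + exclusion_rates(r2, pv, t) for t in (0.3, 0.5, 0.9, 0.99)]
```

A stricter threshold moves up the curve: it removes more artifactual hits but also discards more SNPs that showed no platform difference, which is the trade-off the abstract notes for this method.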
Among SNPs genotyped in the Illumina controls and imputed using soft calls among the Affy controls, discrimination of the preliminary screening criterion described in Method 3, as the α threshold varies. The y-axis is the sensitivity, the proportion of highly significant SNPs which are excluded; the x-axis is 1 − specificity, the proportion of non-significant SNPs which are excluded. Plots shown are for (a) n = 100, (b) n = 300, and (c) n = 500 additional controls. α threshold choices between 0.001 and 0.2 are marked along the curves.