Artifact due to differential error when cases and controls are imputed from different platforms.


Authors: Sinnott, Jennifer A; Kraft, Peter
Year: 2012
Journal: Human genetics
PMID: 21735171
DOI: 10.1007/s00439-011-1054-1
PMCID: PMC3217156

Fig. 1

Black and gray lines represent λ values and the percentages of p-values less than 5 × 10⁻⁸, respectively, for SNPs grouped by minor allele frequency (MAF) in four settings: (a) SNPs genotyped on both Affy and Illumina platforms; (b) SNPs genotyped on Illumina platform and imputed for the Affy controls; (c) SNPs genotyped on Affy platform and imputed for the Illumina controls; and (d) SNPs imputed for both groups. Solid lines are from soft call analysis and dashed lines are from hard call analysis. Note that in some places (particularly in panel a) the solid and dashed lines are indistinguishable because the results from the soft call and hard call analyses were very similar.

Fig. 2

Top 3 principal components (PCs), among SNPs genotyped in the Illumina controls and imputed using hard calls in the Affy controls, plotted against one another. Affy samples are plotted in black; Illumina samples are plotted in gray.

Fig. 3

Among SNPs genotyped in the Illumina controls and imputed using soft calls among the Affy controls, discrimination of the R² criterion described in Method 2, as the R² threshold varies. The y-axis is the sensitivity, the proportion of highly significant SNPs which are excluded; the x-axis is 1 − specificity, the proportion of non-significant SNPs which are excluded. R² threshold choices between 0.3 and 0.99 are marked along the curve.
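The two axes described in this caption can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the caption: a SNP is treated as excluded when its R² falls below the threshold, "highly significant" is taken to mean p < 5 × 10⁻⁸, and the function name `roc_points` and the toy values are hypothetical.

```python
def roc_points(r2_values, pvalues, thresholds, sig_level=5e-8):
    """Return (threshold, 1 - specificity, sensitivity) for each R^2 threshold.

    Assumptions (not taken from the paper): a SNP is excluded when its
    R^2 is below the threshold, and a SNP counts as highly significant
    when its p-value is below sig_level.
    """
    significant = [r2 for r2, p in zip(r2_values, pvalues) if p < sig_level]
    non_significant = [r2 for r2, p in zip(r2_values, pvalues) if p >= sig_level]
    if not significant or not non_significant:
        raise ValueError("need at least one SNP in each class")

    points = []
    for t in thresholds:
        # Sensitivity: share of highly significant SNPs that are excluded.
        sensitivity = sum(r2 < t for r2 in significant) / len(significant)
        # 1 - specificity: share of non-significant SNPs that are excluded.
        one_minus_spec = sum(r2 < t for r2 in non_significant) / len(non_significant)
        points.append((t, one_minus_spec, sensitivity))
    return points


# Toy data for illustration only; these are not values from the study.
toy_r2 = [0.2, 0.5, 0.95, 0.4, 0.9, 0.99]
toy_p = [1e-10, 1e-9, 1e-12, 0.3, 0.6, 0.8]
for threshold, x, y in roc_points(toy_r2, toy_p, [0.3, 0.8, 0.99]):
    print(f"R2 threshold {threshold}: 1-specificity={x:.2f}, sensitivity={y:.2f}")
```

Plotting the second element against the third for a fine grid of thresholds would trace a curve of the kind shown in the figure.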

Fig. 4

Among SNPs genotyped in the Illumina controls and imputed using soft calls among the Affy controls, discrimination of the preliminary screening criterion described in Method 3, as the α threshold varies. The y-axis is the sensitivity, the proportion of highly significant SNPs which are excluded; the x-axis is 1 − specificity, the proportion of non-significant SNPs which are excluded. Plots shown are for (a) n = 100, (b) n = 300, and (c) n = 500 additional controls. α threshold choices between 0.001 and 0.2 are marked along the curves.

#	Section	Preview
0	Introduction	A population-based genome-wide association (GWA) study requires thousands of cases and controls in…
1	Introduction	A complication in the reuse of control groups or the inclusion of external controls arises when…
2	Introduction	provides a large but less determinate collection of SNPs designed to give good coverage of the…
3	Introduction	When pooling genotype data from different platforms, investigators could impute the SNPs missing on…
4	Introduction	After imputation, investigators run association tests as usual, producing p-values for each SNP and…
5	Introduction	Differential error induced by imputation may yield SNPs that appear to differ substantially between…
6	Introduction	In this paper, we are concerned with problems occurring one step further down the pipeline. Under…
7	Introduction	When we did in fact observe inflated Type I error after applying standard imputation quality…
8	Methods	The BrCa and T2D studies have been described elsewhere (Hunter et al. 2007; Qi et al. 2010). Both…
9	Methods	available for that individual and takes values on a continuum between 0 and 2, and a hard call…
10	Methods	We ran a logistic regression for each of m SNPs, modeling the log-odds of being a “case” (Y = 1)…
11	Methods	$\log\left\{\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right\} = \beta_0 + \beta_1 A_i$ where $\beta_1$ is the effect of SNP i and $\beta_0$ is an intercept term. We…
12	Methods	We grouped the SNPs into four categories: SNPs genotyped on both chips; SNPs genotyped on Affy and…
13	Methods	$\lambda = \operatorname{median}_{i=1,\ldots,m}\{X_i\} / 0.455$ where 0.455 is approximately the theoretical median of a $\chi^2_1$…
14	Methods	When λ > 1 and the percentage of SNPs significant at the 5 × 10⁻⁸ level was more than expected…
15	Methods — Method 1	We investigated whether we could capture the platform effect using PCs. To do this, we used…
16	Methods — Method 2	When missing genotypes are imputed by MaCH, each SNP has an R² value associated with it that…
17	Methods — Method 2	Focusing on SNPs measured on one chip and imputed in the other, we considered removing SNPs with…
18	Methods — Method 2	We also constructed an ROC curve to assess the discriminatory ability of this method. We labeled…
19	Methods — Method 3	The genotype distributions for some SNPs may differ markedly across platforms due to genotyping…
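
The inflation factor λ defined in the Methods previews above (the median of the per-SNP test statistics divided by 0.455) and the percentage of p-values below 5 × 10⁻⁸ can be computed as in the following minimal Python sketch. The function names are hypothetical, and the conversion from p-values to test statistics assumes two-sided 1-df tests; the paper's own software may differ.

```python
from statistics import NormalDist, median

# Approximate theoretical median of a chi-square variable with 1 df.
CHI2_1DF_MEDIAN = 0.455


def chi2_from_pvalue(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square statistic."""
    # Using the lower tail (p / 2) keeps the conversion stable for small p.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(chi2_stats):
    """lambda = median of the test statistics X_1..X_m divided by 0.455."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def fraction_genome_wide_significant(pvalues, alpha=5e-8):
    """Share of SNPs with p-value below the genome-wide threshold."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


# Evenly spaced p-values mimic a null distribution; lambda should be near 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(genomic_inflation([chi2_from_pvalue(p) for p in null_p]))
print(fraction_genome_wide_significant(null_p))
```

A λ well above 1 together with an excess of p-values below the threshold is the pattern the previews describe as evidence of inflation.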

Citation	PMID	DOI	Status
Alberts, B, Science, 2010, Editorial expression of concern	21071647	10.1126/science.330.6006.912-b	Cited
Beecham, GW et al., Ann Hum Genet, 2010, APOE is not associated with Alzheimer disease: a cautionary tale of genotype imputation	20529013	10.1111/j.1469-1809.2010.00573.x	Cited
Carmichael, M, The little flaw in the longevity-gene study that could be a big problem, 2010	—	—	—
Devlin, B et al., Biometrics, 1999, Genomic control for association studies	11315092	10.1111/j.0006-341x.1999.00997.x	Cited
Fallin, MD et al., Neurogenetics, 2010, Fine mapping of the chromosome 10q11-q21 linkage region in Alzheimer’s disease cases and controls	20182759	10.1007/s10048-010-0234-9	Cited
Ho, LA et al., Hum Genet, 2010, Using public control genotype data to increase power and decrease cost of case-control genetic association studies	20821337	10.1007/s00439-010-0880-x	Cited
Hom, G et al., N Engl J Med, 2008, Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX	18204098	10.1056/NEJMoa0707865	Cited
Howie, BN et al., PLoS Genet, 2009, A flexible and accurate genotype imputation method for the next generation of genome-wide association studies	19543373	10.1371/journal.pgen.1000529	Cited
Hunter, DJ et al., Nat Genet, 2007, A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer	17529973	10.1038/ng2075	Cited
Li, Y et al., Annu Rev Genomics Hum Genet, 2009, Genotype imputation	19715440	10.1146/annurev.genom.9.081307.164242	Cited
Li, Y et al., Genet Epidemiol, 2010, MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes	21058334	10.1002/gepi.20533	Cited
Luca, D et al., Am J Hum Genet, 2008, On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants	18252225	10.1016/j.ajhg.2007.11.003	Cited
Marchini, J et al., Nat Rev Genet, 2010, Genotype imputation for genome-wide association studies	20517342	10.1038/nrg2796	Cited
McCarthy, MI et al., Nat Rev Genet, 2008, Genome-wide association studies for complex traits: consensus, uncertainty and challenges	18398418	10.1038/nrg2344	Cited
Moskvina, V et al., Hum Hered, 2006, Effects of differential genotyping error rate on the type I error probability of case-control studies	16612103	10.1159/000092553	Cited
Nature, 2007, Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls	17554300	10.1038/nature05911	Cited
Patterson, N et al., PLoS Genet, 2006, Population structure and eigenanalysis	17194218	10.1371/journal.pgen.0020190	Cited
Price, AL et al., Nat Genet, 2006, Principal components analysis corrects for stratification in genome-wide association studies	16862161	10.1038/ng1847	Cited
Purcell, S et al., Am J Hum Genet, 2007, PLINK: a tool set for whole-genome association and population-based linkage analyses	17701901	10.1086/519795	Cited
Qi, L et al., Hum Mol Genet, 2010, Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes	20418489	10.1093/hmg/ddq156	Cited
R: A Language and Environment for Statistical Computing, 2009	—	—	—
Scott, LJ et al., Science, 2007, A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants	17463248	10.1126/science.1142382	Cited
Sebastiani, P et al., Science, 2010, Genetic signatures of exceptional longevity in humans	20595579	10.1126/science.1190532	Cited
Wrensch, M et al., Nat Genet, 2009, Variants in the CDKN2B and RTEL1 regions are associated with high-grade glioma susceptibility	19578366	10.1038/ng.408	Cited
Zhuang, JJ et al., Genet Epidemiol, 2010, Optimizing the power of genome-wide association studies by using publicly available reference samples to expand the control group	20088020	10.1002/gepi.20482	Cited

In this knowledge base

Title	Year	PMID
Gene-environment interactions in cancer epidemiology: a National Cancer Institute Think Tank report.	2013	24123198
Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.	2013	23334152

External

Title	Authors	Journal	Year	Link
Accurate cross-platform GWAS analysis via two-stage imputation	Greenberg A et al.	—	2024	—
Accuracy of haplotype estimation and whole genome imputation affects complex trait analyses in complex biobanks.	Appadurai V et al.	—	2023	→
Interaction Testing and Polygenic Risk Scoring to Estimate the Association of Common Genetic Variants With Treatment Resistance in Schizophrenia.	Pardiñas AF et al.	—	2022	→
Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure.	Shah S et al.	—	2020	→
Genetic associations with childhood brain growth, defined in two longitudinal cohorts.	Szekely E et al.	—	2018	→
Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population.	Ye S et al.	—	2018	→
Polygenic risk score of shorter telomere length and risk of depression and anxiety in women.	Chang SC et al.	—	2018	→
A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts.	Lindström S et al.	—	2017	→
Age at natural menopause genetic risk score in relation to age at natural menopause and primary open-angle glaucoma in a US-based sample.	Pasquale LR et al.	—	2017	→
Bayesian and frequentist analysis of an Austrian genome-wide association study of colorectal cancer and advanced adenomas.	Hofer P et al.	—	2017	→
Failure to replicate thrombomodulin genetic variant predictors of venous thromboembolism in African Americans.	Folsom AR et al.	—	2017	→
Multi-SNP Haplotype Analysis Methods for Association Analysis.	Stram DO	—	2017	→
A genome-wide association study identifies variants in KCNIP4 associated with ACE inhibitor-induced cough.	Mosley JD et al.	—	2016	→
A genome-wide association study confirms PNPLA3 and identifies TM6SF2 and MBOAT7 as risk loci for alcohol-related cirrhosis.	Buch S et al.	—	2015	→
Genome-wide association study of intracranial aneurysm identifies a new association on chromosome 7.	Foroud T et al.	—	2014	→
Comparison of the performance of two commercial genome-wide association study genotyping platforms in Han Chinese samples.	Jiang L et al.	—	2013	→
Gene-environment interactions in cancer epidemiology: a National Cancer Institute Think Tank report.	Hutter CM et al.	—	2013	→
Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer.	Jia WH et al.	—	2013	→
Genome-wide association study identifies genetic risk underlying primary rhegmatogenous retinal detachment.	Kirin M et al.	—	2013	→
Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.	Johnson EO et al.	—	2013	→
Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification.	Faye LL et al.	—	2013	→
Comprehensive search for Alzheimer disease susceptibility loci in the APOE region.	Jun G et al.	—	2012	→
Genetic epidemiology with a capital E: where will we be in another 10 years?	Thomas DC	—	2012	→
Genome-wide association study of intracranial aneurysms confirms role of Anril and SOX17 in disease risk.	Foroud T et al.	—	2012	→