How to deal with the early GWAS data when imputing and combining different arrays is necessary.
- Authors: Uh, Hae-Won; Deelen, Joris; Beekman, Marian; Helmer, Quinta; Rivadeneira, Fernando; Hottenga, Jouke-Jan; Boomsma, Dorret I; Hofman, Albert; Uitterlinden, André G; Slagboom, P E; Böhringer, Stefan; Houwing-Duistermaat, Jeanine J
- Year: 2012
- Journal: European journal of human genetics : EJHG
- PMID: 22189269
- DOI: 10.1038/ejhg.2011.231
- PMCID: PMC3330212
Genotype imputation has become an essential tool in the analysis of genome-wide association scans. This technique allows investigators to test association at ungenotyped genetic markers, and to combine results across studies that rely on different genotyping platforms. In addition, imputation is used within long-running studies to reuse genotypes produced across generations of platforms. Typically, genotypes of controls are reused and cases are genotyped on newer platforms, yielding a case-control study that is not matched for genotyping platform. In this study, we scrutinize such a situation and validate GWAS results by retyping top-ranking SNPs with the Sequenom MassArray platform. We discuss the quality controls (QCs) that are needed. In doing so, we report a considerable discrepancy between the results from imputed and retyped data when applying recommended QCs from the literature. These discrepancies appear to be caused by the imputation process extrapolating differences between arrays. To avoid false positive results, we recommend that more stringent QCs be applied. We also advocate reporting the imputation quality measure (R_T^2) used for the post-imputation QCs in publications.
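The effect of a more stringent post-imputation QC can be pictured with a minimal sketch. It assumes each imputed SNP carries a quality measure between 0 and 1 (such as R_T^2) and an association P-value. The `ImputedSnp` type, the SNP identifiers, all numeric values, and the lenient cutoff of 0.30 are hypothetical and chosen only for illustration; the strict cutoff of 0.98 is the one quoted in the figure captions.

```python
from dataclasses import dataclass


@dataclass
class ImputedSnp:
    snp_id: str      # hypothetical identifier
    info: float      # post-imputation quality measure, e.g. R_T^2 (0..1)
    p_value: float   # association P-value from the GWAS


def filter_by_info(snps, threshold):
    """Keep only SNPs whose imputation quality strictly exceeds the threshold."""
    return [s for s in snps if s.info > threshold]


# Made-up values for illustration only.
snps = [
    ImputedSnp("snp_a", 0.995, 3e-9),
    ImputedSnp("snp_b", 0.910, 1e-8),
    ImputedSnp("snp_c", 0.985, 0.42),
]

lenient = filter_by_info(snps, 0.30)  # hypothetical lenient cutoff
strict = filter_by_info(snps, 0.98)   # cutoff quoted in the figure captions

print("lenient:", [s.snp_id for s in lenient])
print("strict: ", [s.snp_id for s in strict])
```

In this toy data, a strongly associated SNP with mediocre imputation quality survives the lenient filter but is dropped by the strict one, which is the kind of result the retyping step is meant to check.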
Study samples and arrays used. Affy500 stands for the first-generation Affymetrix Gene Chip Human Mapping 500K Array, Illumina660 for Illumina Infinium HD Human660W-Quad BeadChips, and Illumina550 for Illumina Infinium II HumanHap 550K and HumanHap550-Duo BeadChips. Sib 2 and controls were all genotyped; for Sib 1, in addition to the 60K overlapping genotyped SNPs, the remaining 457K SNPs were imputed. After post-imputation QC, 451K SNPs were analyzed using the ASP–control design.
Comparison of the pre- and post-analysis imputation information measures. The x axis shows the pre-analysis information measure (r^2), and the y axis the post-analysis information measure (R_T^2). The blue points indicate SNPs with no association (P-value >0.05); for these there is little effect of case–control status, and the two measures agree. The red points are SNPs that show strong association (P-value <0.001), and the green points are intermediate cases.
Quantile–quantile plots obtained from LLS GWAS analyses. The triangles indicate the SNPs at which the test statistic exceeds 30 (corresponding to a P-value <5 × 10^-8). The 95% concentration bands (shaded gray) are included. (a) ASP–control design: combined data of imputed Affy500 (Sib 1), typed Illumina660 (Sib 2), and typed Illumina550 (control). Deviation from the dashed line indicates inflation of test statistics. (b) Case–control design: genotyped with Illumina660 (Sib 2) and Illumina550 (control). (c) ASP–control design: 60K overlap using combined typed data of Affy500 (Sib 1), Illumina660 (Sib 2), and Illumina550 (control). (d) ASP–control design: as in (a), but only SNPs with R_T^2 >0.98. Details are provided in Table 1.
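The correspondence between a test statistic of 30 and the genome-wide significance level can be checked with a short sketch. It assumes the test statistic follows a chi-square distribution with 1 degree of freedom under the null, which the caption does not state explicitly. The genomic inflation factor shown alongside is a commonly used single-number summary of the deviation seen in a quantile–quantile plot; it is included as an illustration and is not taken from the caption. Both function names are hypothetical.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549364


def chi2_1df_p_value(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_inflation(stats):
    """Ratio of the observed median statistic to its expected null median."""
    return statistics.median(stats) / MEDIAN_CHI2_1DF


p = chi2_1df_p_value(30.0)
print(f"P-value for a statistic of 30: {p:.2e}")
print("Below 5e-8:", p < 5e-8)
```

An inflation factor close to 1 indicates that the bulk of the test statistics follows the null distribution; values well above 1 correspond to the upward deviation from the dashed line described for panel (a).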
Comparison of the MAF between GWAS and replication data. Top: the x axis shows the MAF of imputed Sib 1 data using Affy500, and the y axis the MAF of the same SNPs retyped with Sequenom. The SNPs shown in green did not pass the threshold R_T^2 >0.98. Bottom: the x axis shows the MAF of (genotyped) Sib 2 data using Illumina660, and the y axis the MAF of the same SNPs retyped with Sequenom. The red-filled circle in both panels indicates the same SNP.
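A comparison of this kind can be sketched as follows, assuming hard genotype counts are available for each SNP in both data sets. This is a simplification, since imputed data are often stored as dosages or genotype probabilities. The function names, the genotype counts, and the tolerance of 0.05 are hypothetical and are not taken from the study.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes AA, AB, BB."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_b = (n_ab + 2 * n_bb) / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq_b, 1.0 - freq_b)


def is_discordant(maf_gwas, maf_replication, tolerance):
    """Flag a SNP whose MAF differs between data sets by more than the tolerance."""
    return abs(maf_gwas - maf_replication) > tolerance


# Made-up genotype counts for one SNP in the GWAS data and in the retyped data.
maf_gwas = minor_allele_frequency(360, 210, 30)
maf_retyped = minor_allele_frequency(300, 250, 50)

print(f"GWAS MAF:    {maf_gwas:.3f}")
print(f"Retyped MAF: {maf_retyped:.3f}")
print("Discordant:", is_discordant(maf_gwas, maf_retyped, tolerance=0.05))
```

SNPs flagged in this way would fall away from the diagonal in the panels described above.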
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | 2013 | 23334152 |
External
| Title | Authors | Journal | Year | Link |
|---|---|---|---|---|
| Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning. | Wang J et al. | — | 2025 | → |
| Accuracy of haplotype estimation and whole genome imputation affects complex trait analyses in complex biobanks. | Appadurai V et al. | — | 2023 | → |
| Accuracy of haplotype estimation and whole genome imputation affects complex trait analyses in complex biobanks | Appadurai V et al. | — | 2022 | — |
| Best practices for analyzing imputed genotypes from low-pass sequencing in dogs. | Buckley RM et al. | — | 2022 | → |
| Impact of pre- and post-variant filtration strategies on imputation. | Charon C et al. | — | 2021 | → |
| Unravelling the complex genetics of common kidney diseases: from variants to mechanisms. | Sullivan KM et al. | — | 2020 | → |
| Genome-wide association meta-analysis of cocaine dependence: Shared genetics with comorbid conditions. | Cabana-Domínguez J et al. | — | 2019 | → |
| Integrative network analysis highlights biological processes underlying GLP-1 stimulated insulin secretion: A DIRECT study. | Gudmundsdottir V et al. | — | 2018 | → |
| A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts. | Lindström S et al. | — | 2017 | → |
| Extending the use of GWAS data by combining data from different genetic platforms. | van Iperen EP et al. | — | 2017 | → |
| Failure to replicate thrombomodulin genetic variant predictors of venous thromboembolism in African Americans. | Folsom AR et al. | — | 2017 | → |
| Securing the use of existing sample collections for future human genetic research. | Kanoungi G et al. | — | 2017 | → |
| Testing Departure from Hardy-Weinberg Proportions. | Wang J et al. | — | 2017 | → |
| HLA-C*01 is a Risk Factor for Crohn's Disease. | Jung ES et al. | — | 2016 | → |
| A Genome-Wide Association Study Identifies the Skin Color Genes IRF4, MC1R, ASIP, and BNC2 Influencing Facial Pigmented Spots. | Jacobs LC et al. | — | 2015 | → |
| Genotype-Based Score Test for Association Testing in Families. | Uh HW et al. | — | 2015 | → |
| Molgenis-impute: imputation pipeline in a box. | Kanterakis A et al. | — | 2015 | → |
| Genome-wide association meta-analysis of human longevity identifies a novel locus conferring survival beyond 90 years of age. | Deelen J et al. | — | 2014 | → |
| Impact of pre-imputation SNP-filtering on genotype imputation results. | Roshyara NR et al. | — | 2014 | → |
| Meta-analysis identifies loci affecting levels of the potential osteoarthritis biomarkers sCOMP and uCTX-II with genome wide significance. | Ramos YF et al. | — | 2014 | → |
| Genome-wide linkage analysis for human longevity: Genetics of Healthy Aging Study. | Beekman M et al. | — | 2013 | → |
| Genome-wide linkage scan in affected sibling pairs identifies novel susceptibility region for venous thromboembolism: Genetics In Familial Thrombosis study. | de Visser MC et al. | — | 2013 | → |
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | Johnson EO et al. | — | 2013 | → |
| Loci associated with N-glycosylation of human immunoglobulin G show pleiotropy with autoimmune diseases and haematological cancers. | Lauc G et al. | — | 2013 | → |