The effect of genome-wide association scan quality control on imputation outcome for common variants.
- Authors: Southam, Lorraine; Panoutsopoulou, Kalliope; Rayner, N William; Chapman, Kay; Durrant, Caroline; Ferreira, Teresa; Arden, Nigel; Carr, Andrew; Deloukas, Panos; Doherty, Michael; Loughlin, John; McCaskie, Andrew; Ollier, William E R; Ralston, Stuart; Spector, Timothy D; Valdes, Ana M; Wallis, Gillian A; Wilkinson, J Mark; arcOGEN consortium; Marchini, Jonathan; Zeggini, Eleftheria
- Year: 2011
- Journal: European journal of human genetics : EJHG
- PMID: 21267008
- DOI: 10.1038/ejhg.2010.242
- PMCID: PMC3083623
Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Quality control (QC) of directly typed SNPs is thought to affect imputation quality, so it is common practice to use quality-controlled (QCed) data as the input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset of 3177 samples with Illumina 610k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps that balance the elimination of poorly imputed SNPs against information loss. We found that the difference in imputation outcome between the unQCed and the fully QCed data was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low-frequency and rare variants.
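The abstract does not state exactly how imputed and directly typed genotypes were compared. One common measure is genotype concordance between best-guess imputed calls and the typed calls. The Python sketch below assumes IMPUTE-style per-sample probability triples (P(AA), P(AB), P(BB)) and a hypothetical calling threshold of 0.9; neither the threshold nor the function names come from the study.

```python
from typing import Optional, Sequence, Tuple

# Per-sample genotype probabilities (P(AA), P(AB), P(BB)), IMPUTE-style.
Probs = Tuple[float, float, float]


def best_guess(probs: Probs, threshold: float = 0.9) -> Optional[int]:
    """Most likely genotype as a count of allele B (0, 1 or 2), or None
    when no genotype reaches the (hypothetical) calling threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def concordance(imputed: Sequence[Probs],
                typed: Sequence[Optional[int]],
                threshold: float = 0.9) -> Tuple[float, int]:
    """Fraction of samples whose best-guess imputed genotype matches the
    directly typed genotype, plus the number of samples compared.
    Samples with no typed call, or no confident imputed call, are skipped."""
    matches = compared = 0
    for probs, truth in zip(imputed, typed):
        call = best_guess(probs, threshold)
        if call is None or truth is None:
            continue
        compared += 1
        matches += int(call == truth)
    rate = matches / compared if compared else float("nan")
    return rate, compared
```

Skipping uncertain calls keeps the concordance estimate honest but shrinks the number of samples compared, so reporting both values together is a deliberate choice in this sketch.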
(a) Imputation results for the QCed data, showing the total number of SNPs filtered at different QC thresholds of the IMPUTE-info and freq-add-proper-info scores. The SNPs remaining after the filter (red bar) are subdivided into significant (green bar) and nonsignificant (yellow bar) SNPs. (b) The same data expressed as the percentage of significant and nonsignificant SNPs removed at each threshold. The two filtering methods appear equivalent, but freq-add-proper-info is shifted to the right for the same numerical threshold; we chose IMPUTE-info <0.8 for further analysis (comparable to freq-add-proper-info <0.9).
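The IMPUTE-info score used for filtering can be computed from each SNP's genotype probabilities. The following is a minimal sketch of the info measure as we understand its published definition, not IMPUTE's own code; the helper names and SNP identifiers are hypothetical, and the SNPTEST freq-add-proper-info score is not reproduced. It also counts how many SNPs a filter of the form "info < t" would remove at each threshold, in the spirit of panel (a).

```python
from typing import Dict, Sequence, Tuple

# Per-sample genotype probabilities (P(AA), P(AB), P(BB)), IMPUTE-style.
Probs = Tuple[float, float, float]


def impute_info(samples: Sequence[Probs]) -> float:
    """Sketch of the IMPUTE info measure for one SNP (1 = no uncertainty)."""
    n = len(samples)
    # e_j: expected dosage of allele B; f_j: expected squared dosage.
    e = [p[1] + 2 * p[2] for p in samples]
    f = [p[1] + 4 * p[2] for p in samples]
    theta = sum(e) / (2 * n)  # estimated frequency of allele B
    if theta <= 0.0 or theta >= 1.0:
        return 1.0  # monomorphic estimate: info is taken as 1
    # Summed per-sample dosage variance, relative to the binomial variance.
    residual = sum(fj - ej ** 2 for ej, fj in zip(e, f))
    return 1.0 - residual / (2 * n * theta * (1 - theta))


def count_removed(snps: Dict[str, Sequence[Probs]],
                  thresholds: Sequence[float]) -> Dict[float, int]:
    """Number of SNPs removed by a filter 'info < t', for each threshold t."""
    scores = [impute_info(s) for s in snps.values()]
    return {t: sum(1 for v in scores if v < t) for t in thresholds}


if __name__ == "__main__":
    snps = {
        "snp_certain": [(1, 0, 0), (0, 1, 0), (0, 0, 1)],     # hard calls
        "snp_vague": [(0.25, 0.5, 0.25), (0.25, 0.5, 0.25)],  # uninformative
    }
    print({name: impute_info(s) for name, s in snps.items()})
    print(count_removed(snps, [0.5, 0.8]))
```

Hard genotype calls give an info of 1, while samples that spread probability evenly across genotypes give 0, which is why a cut such as info <0.8 removes the least certain imputations first.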
Correlation plots and the associated R² for (a) the unQCed and the QCed data with and without post-imputation QC filtering (IMPUTE-info <0.8 and MAF <5%), and (b) the imputed-only markers in the unQCed and fully QCed data (QCed data with all poorly clustered markers removed), without post-imputation QC filtering.
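The post-imputation filter and the R² comparison in this figure can be sketched as follows. The caption does not say which per-SNP quantity is correlated, so this sketch takes generic paired values (for example, association statistics from two differently QCed runs) keyed by hypothetical SNP names, and assumes per-SNP info and MAF have already been computed.

```python
from typing import Dict, List, Sequence, Tuple


def post_imputation_filter(stats: Dict[str, Tuple[float, float]],
                           min_info: float = 0.8,
                           min_maf: float = 0.05) -> List[str]:
    """Keep SNPs whose (info, MAF) pass both thresholds, i.e. drop
    SNPs with info < min_info or MAF < min_maf."""
    return sorted(name for name, (info, maf) in stats.items()
                  if info >= min_info and maf >= min_maf)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of paired values (needs non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def compare_runs(run_a: Dict[str, float], run_b: Dict[str, float],
                 keep: Sequence[str]) -> float:
    """R² between two runs over the kept SNPs present in both runs."""
    shared = [s for s in keep if s in run_a and s in run_b]
    return r_squared([run_a[s] for s in shared], [run_b[s] for s in shared])
```

Restricting the comparison to SNPs present in both runs matters here, because differently QCed inputs can leave different SNP sets after imputation.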
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | 2013 | 23334152 |
External
| Title | Authors | Year |
|---|---|---|
| Antecedent Flu-Like Illness and Onset of Idiopathic Dilated Cardiomyopathy: The DCM Precision Medicine Study. | Ni H et al. | 2025 |
| Cell-free DNA as a potential alternative to genomic DNA in genetic studies. | Zeng J et al. | 2025 |
| Establishing Best Practices for Clinical GWAS: Tackling Imputation and Data Quality Challenges. | Casaburi G et al. | 2025 |
| Genetic variants linked to statin-associated Type 2 diabetes mellitus: Findings from the UK Biobank and the All of Us Research Program. | Park YA et al. | 2025 |
| Shared genes and pathways in dementia: Insights from genome-wide association studies. | Loi KJ et al. | 2025 |
| Impact of pre- and post-variant filtration strategies on imputation. | Charon C et al. | 2021 |
| Genotype Imputation in Genome-Wide Association Studies. | Naj AC | 2019 |
| From genome-wide associations to candidate causal variants by statistical fine-mapping. | Schaid DJ et al. | 2018 |
| Pathway-Wide Genetic Risks in Chlamydial Infections Overlap between Tissue Tropisms: A Genome-Wide Association Scan. | Roberts CH et al. | 2018 |
| Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome. | Lert-Itthiporn W et al. | 2018 |
| Extending the use of GWAS data by combining data from different genetic platforms. | van Iperen EP et al. | 2017 |
| Pathways-wide genetic risks in Chlamydial infections overlap between tissue tropisms: A genome-wide association scan. | Roberts Ch et al. | 2017 |
| PROX1 gene CC genotype as a major determinant of early onset of type 2 diabetes in slavic study participants from Action in Diabetes and Vascular Disease: Preterax and Diamicron MR Controlled Evaluation study. | Hamet P et al. | 2017 |
| Testing Departure from Hardy-Weinberg Proportions. | Wang J et al. | 2017 |
| Conjunctival fibrosis and the innate barriers to Chlamydia trachomatis intracellular infection: a genome wide association study. | Roberts Ch et al. | 2015 |
| Genome-wide association study identifies a new susceptibility locus for cleft lip with or without a cleft palate. | Sun Y et al. | 2015 |
| Common genetic variants do not associate with CAD in familial hypercholesterolemia. | van Iperen EP et al. | 2014 |
| Impact of pre-imputation SNP-filtering on genotype imputation results. | Roshyara NR et al. | 2014 |
| Imputation and quality control steps for combining multiple genome-wide datasets. | Verma SS et al. | 2014 |
| mtDNA haplogroups and osteoarthritis in different geographic populations. | Soto-Hermida A et al. | 2014 |
| No association between CTNNBL1 and episodic memory performance. | Liu T et al. | 2014 |
| Predicting HLA genotypes using unphased and flanking single-nucleotide polymorphisms in Han Chinese population. | Hsieh AR et al. | 2014 |
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | Johnson EO et al. | 2013 |
| Assessment of genotype imputation performance using 1000 Genomes in African American studies. | Hancock DB et al. | 2012 |
| Association of FTO gene variants with body composition in UK twins. | Livshits G et al. | 2012 |
| Methods for meta-analyses of genome-wide association studies: critical assessment of empirical evidence. | Gögele M et al. | 2012 |