An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations.
- Authors
- Almeida, Marcio A A; Oliveira, Paulo S L; Pereira, Tiago V; Krieger, José E; Pereira, Alexandre C
- Year
- 2011
- Journal
- BMC Genetics
- PMID
- 21251252
- DOI
- 10.1186/1471-2156-12-10
- PMCID
- PMC3224203
BACKGROUND: Genome-wide association studies (GWAS) are becoming the approach of choice for identifying genetic determinants of complex phenotypes and common diseases. The sheer volume of generated data and the use of distinct genotyping platforms with variable genomic coverage remain analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer a poorly genotyped or missing marker, and are considered a near-zero-cost way to compare and combine data generated in different studies. Several reports have stated that imputed markers have acceptable overall accuracy, but no published report has performed a pairwise comparison of imputed and empiric association statistics for a complete set of GWAS markers.
RESULTS: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10⁻⁵ for type 2 diabetes mellitus and compared them with results based on empirical allelic frequencies. Despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers.
CONCLUSIONS: Our results suggest that association statistics from imputed markers that fall in a specific minor allele frequency (MAF) range, lie in weak linkage disequilibrium blocks, or deviate strongly from local patterns of association are prone to inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
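The pairwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a 1-df allelic chi-square test on a 2x2 table of allele counts, defines the "bias" as the difference in minus-log10 p-values, and calls a marker discordant when the imputed test passes the 10⁻⁵ threshold but the empiric test does not. The function names and the exact definitions are hypothetical and chosen only for illustration.

```python
import math


def allelic_chi2_p(case_minor, case_total, ctrl_minor, ctrl_total):
    """P-value of a 1-df allelic chi-square test on a 2x2 allele-count table.

    Counts may be fractional, e.g. expected allele counts from imputation.
    """
    a, b = case_minor, case_total - case_minor
    c, d = ctrl_minor, ctrl_total - ctrl_minor
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def minus_log_bias(p_empiric, p_imputed):
    """Imputed minus empiric -log10(p); positive means the imputed test is stronger."""
    return -math.log10(p_imputed) + math.log10(p_empiric)


def is_discordant(p_empiric, p_imputed, threshold=1e-5):
    """Assumed definition: significant when imputed, not significant when genotyped."""
    return p_imputed < threshold <= p_empiric
```

Under these assumptions, a marker with an imputed p-value of 1e-6 and an empiric p-value of 1e-3 would be flagged as discordant, with a bias of about 3 on the minus-log10 scale.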
Efficiency of filtering criteria. Scatterplot comparing the minus-log corrected empiric and imputed p-values of the markers present in the complete dataset (A) and in the filtered dataset (B).
Summary plots. Panel A: graphical representation of the distribution of empiric association statistics throughout the human genome. Panel B: same as A, using the association statistics derived from imputed frequencies. Panel C: distribution of the observed bias between association statistics from empiric and imputed frequencies.
Comparison of the predictive value of commonly used quality criteria for the observed bias between empiric and imputed allelic frequencies. The minus-log bias is plotted on the y-axis and the tested variables on the x-axis.
Comparison of different r2 summary statistics of the complete set of haplotypic blocks and their use as predictive variables for the observed bias between empiric and imputed frequencies.
Boxplots comparing summary statistics of association values in sliding windows centered on concordant and discordant imputed markers.
Local patterns of association as a predictor of accurate imputation. The lower graphic highlights markers that could be considered associated with the phenotype under study at a significance threshold of 10⁻⁵. The upper left and right panels show highlighted regions with discordant (left) and concordant (right) associations.
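The idea of checking an imputed marker against its local pattern of association can be sketched as follows. This is an illustrative assumption, not the statistic used in the study: it compares a marker's minus-log10 p-value with the median of its flanking markers inside a sliding window. The function name `local_deviation` and the `half_window` parameter are hypothetical.

```python
import math
from statistics import median


def local_deviation(p_values, index, half_window=5):
    """-log10(p) of one marker minus the median -log10(p) of its flanking markers.

    p_values must be ordered by genomic position. A large positive value means
    the marker stands out from its neighbours, i.e. an isolated signal.
    """
    scores = [-math.log10(p) for p in p_values]
    lo = max(0, index - half_window)
    hi = min(len(scores), index + half_window + 1)
    flank = scores[lo:index] + scores[index + 1:hi]
    if not flank:
        return 0.0  # no neighbours available to compare against
    return scores[index] - median(flank)
```

With this sketch, an isolated hit such as `[0.1, 0.1, 1e-6, 0.1, 0.1]` yields a deviation of about 5 at the central marker, whereas a hit supported by its neighbours, such as `[1e-4, 1e-5, 1e-6, 1e-5, 1e-4]`, yields about 1.5. Any cut-off on this value would have to be chosen empirically.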
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | 2013 | 23334152 |
External
| Title | Authors | Journal | Year |
|---|---|---|---|
| Comparing Methods to Select Candidates for Re-Genotyping to Impute Higher-Density Genotype Data in a Japanese Black Cattle Population: A Case Study. | Ogawa S et al. | — | 2023 |
| False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy. | Zhang Z et al. | — | 2021 |
| Ascertainment bias from imputation methods evaluation in wheat. | Brandariz SP et al. | — | 2016 |
| Fine mapping of a quantitative trait locus for bovine milk fat composition on Bos taurus autosome 19. | Bouwman AC et al. | — | 2014 |
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | Johnson EO et al. | — | 2013 |
| Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs. | Krithika S et al. | — | 2012 |