An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations.


Authors: Almeida, Marcio A A; Oliveira, Paulo S L; Pereira, Tiago V; Krieger, José E; Pereira, Alexandre C
Year: 2011
Journal: BMC genetics
PMID: 21251252
DOI: 10.1186/1471-2156-12-10
PMCID: PMC3224203

BACKGROUND: Genome-wide association studies (GWAS) are becoming the approach of choice for identifying genetic determinants of complex phenotypes and common diseases. The sheer volume of data generated and the use of distinct genotyping platforms with variable genomic coverage remain analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer poorly genotyped or missing markers, and are considered a near-zero-cost way to compare and combine data generated in different studies. Several reports have stated that imputed markers have acceptable overall accuracy, but no published report has performed a pairwise comparison of imputed and empiric association statistics for a complete set of GWAS markers.

RESULTS: We identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10^-5 for type 2 diabetes mellitus and compared them with results based on empirical allelic frequencies. Despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers.

CONCLUSIONS: Our results suggest that association statistics from imputed markers that fall within a specific minor allele frequency (MAF) range, are located in weak linkage disequilibrium blocks, or strongly deviate from local patterns of association are prone to inflated false-positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
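The pairwise comparison described in the abstract can be illustrated with a minimal sketch. It assumes a standard 1-df allelic chi-square test on a 2x2 table of allele counts and treats a marker as discordant when only the imputed statistic passes the 10^-5 threshold; the function names, the test choice, and this definition of discordance are illustrative assumptions, not the authors' exact procedure.

```python
import math


def allelic_chi2_p(case_minor, case_total, ctrl_minor, ctrl_total):
    """P-value of a 1-df chi-square test on a 2x2 table of allele counts.

    Counts may be fractional, e.g. expected allele counts from imputation.
    """
    a, b = case_minor, case_total - case_minor
    c, d = ctrl_minor, ctrl_total - ctrl_minor
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table (e.g. monomorphic marker): no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def classify(p_empiric, p_imputed, alpha=1e-5):
    """Label a marker by whether its imputed signal is confirmed empirically.

    Assumption: 'discordant' means significant with imputed frequencies only.
    """
    if p_imputed < alpha and p_empiric < alpha:
        return "concordant"
    if p_imputed < alpha:
        return "discordant"
    return "not_associated"
```

Applied to every marker that was both imputed and directly genotyped, the share of "discordant" labels among the imputed-significant markers gives the kind of proportion the abstract reports.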

Figure 1

Efficiency of filtering criteria. Scatterplot comparing the minus-log corrected empiric and imputed p-values of the markers present in the complete dataset (A) and in the filtered dataset (B).

Figure 2

Summary plots. Panel A: graphical representation of the distribution of empiric association statistics throughout the human genome. Panel B: same as A, using the association statistics derived from imputed frequencies. Panel C: distribution of the observed bias between association statistics from empiric and imputed frequencies.

Figure 3

Comparison of the predictive value of commonly used quality criteria for the observed bias between empiric and imputed allelic frequencies. The minus-log bias is plotted on the y-axis and the tested variables on the x-axis.

Figure 4

Comparison of different r^2 summary statistics of the complete set of haplotypic blocks and their use as predictive variables for the observed bias between empiric and imputed frequencies.
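As a rough illustration of an r^2 summary statistic for a haplotypic block, the sketch below computes the standard pairwise r^2 between biallelic markers from phased 0/1 haplotypes and averages it over all marker pairs in a block. The function names and the choice of the mean as the summary are assumptions made for illustration; the caption does not state which summaries the authors compared.

```python
from itertools import combinations
from statistics import mean


def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic markers coded 0/1 on the same phased haplotypes."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("markers must cover the same non-empty set of haplotypes")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic marker: r^2 is undefined, reported as 0 here
    return d * d / denom


def block_mean_r2(block):
    """Mean pairwise r^2 over all marker pairs in a block (list of 0/1 lists)."""
    pairs = [ld_r2(a, b) for a, b in combinations(block, 2)]
    return mean(pairs) if pairs else 0.0
```

A low block-level value would correspond to what the abstract calls a weak linkage disequilibrium block.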

Figure 5

Boxplots comparing summary statistics of association values in sliding windows centered on concordant and discordant imputed markers.

Figure 6

Local patterns of association as predictors of accurate imputation. The lower panel highlights markers that could be considered associated with the phenotype under study at a significance threshold of 10^-5. The upper left and right panels highlight regions with discordant (left) and concordant (right) associations.
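One way to picture "deviation from local patterns of association" is to compare a marker's signal with that of its flanking markers. The sketch below is a hypothetical illustration, not the authors' procedure: the window size, the use of the median, and the function names are all assumptions.

```python
import math
from statistics import median


def neg_log10(p):
    """Minus log10 of a p-value."""
    return -math.log10(p)


def flank_deviation(p_values, index, half_window=5):
    """How far a marker's -log10(p) sits above the median of its flanking markers.

    p_values is a list of p-values ordered by chromosomal position; the window
    spans up to half_window markers on each side and excludes the marker itself.
    """
    lo = max(0, index - half_window)
    hi = min(len(p_values), index + half_window + 1)
    flanks = [neg_log10(p) for i, p in enumerate(p_values[lo:hi], start=lo) if i != index]
    if not flanks:
        return 0.0  # no neighbours available to define a local pattern
    return neg_log10(p_values[index]) - median(flanks)
```

An imputed marker with a large deviation stands alone above its neighbours, which is the pattern the caption associates with discordant regions; such markers could be given lower priority for follow-up genotyping.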

#	Section	Preview
0	Background	Genome-wide association studies (GWAS) are a promising tool for the identification of genetic…
1	Background	ranging from hundreds of thousands to millions of typed markers [5]. This diversity in panels of…
2	Background	To overcome these issues, genotyping imputation algorithms were developed. These methods use…
3	Background	Here, we present a comprehensive comparative analysis of the data generated by the multipoint…
4	Background	distinct cohorts selected to avoid population stratification, a very common source of bias in GWAS.…
5	Results — Characteristics of the examined datasets	The results discussed herein are based on data available for approximately 2000 individuals accessed…
6	Results — SNP selection quality criteria	Association studies using empiric or imputed frequencies are very sensitive to low quality markers.…
7	Results — SNP selection quality criteria	group of 66.000 (17%) markers showing a significant difference in the magnitude of the association…
8	Results — Imputed versus empirically genotyped markers: inflation of type-I error rates	Using the filtered dataset, the examined imputation algorithm, as previously described for allele…
9	Results — Imputed versus empirically genotyped markers: inflation of type-I error rates	To further analyse the nature of such type-I error inflation, we describe markers for whom their…
10	Results — Imputed versus empirically genotyped markers: inflation of type-I error rates	To further explore the relationship between association statistics derived from imputed and from…
11	Results — Imputed versus empirically genotyped markers: inflation of type-I error rates	The same analytic procedure was carried in polymorphic markers presented in the WTCCC hypertension…
12	Results — Characteristics of the false-positive signals	Next, we sought to examine characteristics of false-positive associations that could be used as…
13	Results — Characteristics of the false-positive signals	It is accepted that some chromosomal regions, due to a higher number of recombination events, have…
14	Results — Characteristics of the false-positive signals	when imputation methods were applied. Specifically, polymorphisms located at chromosomes 1, 3 and 15…
15	Results — Key indicators of a poor imputation performance on association statistics	Next, we carried out exploratory procedures to investigate key indicators of a poor imputation…
16	Results — Key indicators of a poor imputation performance on association statistics	To determine if the bias between association statistics could be predicted by common filtering…
17	Results — Key indicators of a poor imputation performance on association statistics	different thresholds of calling probabilities were not efficient predictors any further. When the…
18	Results — Key indicators of a poor imputation performance on association statistics	of markers, especially the ones showing MAF very close to 0,5, have an increased odds of being…
19	Results — Key indicators of a poor imputation performance on association statistics	The use of inconclusive or incomplete haplotypic information has long been considered a major source…

Citation	PMID	DOI	Status
Anderson CA. Evaluating the effects of imputation on the power, coverage, and cost efficiency of genome-wide SNP platforms. Am J Hum Genet. 2008;83(1):112-9. PMCID: PMC2443836	18589396	10.1016/j.ajhg.2008.06.008	—
Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7(10):781-91.	16983374	10.1038/nrg1916	—
Barrett JC, Cardon LR. Evaluating coverage of genome-wide association studies. Nat Genet. 2006;38(6):659-62.	16715099	10.1038/ng1801	—
de Bakker PI. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17(R2):R122-8. PMCID: PMC2782358	18852200	10.1093/hmg/ddn288	—
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661-78. PMCID: PMC2719288	17554300	10.1038/nature05911	—
Gonzalez JR. SNPassoc: an R package to perform whole genome association studies. Bioinformatics. 2007;23(5):644-5.	17267436	10.1093/bioinformatics/btm025	—
Marchini J. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39(7):906-13.	17572673	10.1038/ng2088	—
Newton-Cheh C. Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet. 2009. PMCID: PMC2891673	19430483	10.1038/ng.361	—
Nothnagel M. A comprehensive evaluation of SNP genotype imputation. Hum Genet. 2009;125(2):163-71.	19089453	10.1007/s00439-008-0606-5	—
Pei YF. Analyses and comparison of accuracy of different genotype imputation methods. PLoS One. 2008;3(10):e3551. PMCID: PMC2569208	18958166	10.1371/journal.pone.0003551	—
Servin B, Stephens M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 2007;3(7):e114. PMCID: PMC1934390	17676998	10.1371/journal.pgen.0030114	—
Wolfs MG. Type 2 Diabetes Mellitus: New Genetic Insights will Lead to New Therapeutics. Curr Genomics. 2009;10(2):110-8. PMCID: PMC2699827	19794883	10.2174/138920209787847023	—
Yu Z, Schaid DJ. Methods to impute missing genotypes for population data. Hum Genet. 2007;122(5):495-504.	17851696	10.1007/s00439-007-0427-y	—
Zhao Z. Imputation of missing genotypes: an empirical evaluation of IMPUTE. BMC Genet. 2008;9:85. PMCID: PMC2636842	19077279	10.1186/1471-2156-9-85	—

Title	Authors	Journal	Year	Link
Comparing Methods to Select Candidates for Re-Genotyping to Impute Higher-Density Genotype Data in a Japanese Black Cattle Population: A Case Study.	Ogawa S et al.	—	2023	→
False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy.	Zhang Z et al.	—	2021	→
Ascertainment bias from imputation methods evaluation in wheat.	Brandariz SP et al.	—	2016	→
Fine mapping of a quantitative trait locus for bovine milk fat composition on Bos taurus autosome 19.	Bouwman AC et al.	—	2014	→
Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.	Johnson EO et al.	—	2013	→
Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs.	Krithika S et al.	—	2012	→
