A new statistic to evaluate imputation reliability.
- Authors
- Lin, Peng; Hartz, Sarah M; Zhang, Zhehao; Saccone, Scott F; Wang, Jia; Tischfield, Jay A; Edenberg, Howard J; Kramer, John R; Goate, Alison M; Bierut, Laura J; Rice, John P; COGA Collaborators; COGEND Collaborators; GENEVA
- Year
- 2010
- Journal
- PloS one
- PMID
- 20300623
- DOI
- 10.1371/journal.pone.0009697
- PMCID
- PMC2837741
BACKGROUND: As the amount of data from genome-wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation cannot effectively address these problems.
METHODOLOGY/PRINCIPAL FINDINGS: We introduce a new statistic, the imputation quality score (IQS). In order to differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to minor allele frequency. Using a sample of subjects genotyped on the Illumina 1 M array, we extracted those SNPs that were also on the Illumina 550 K array and imputed them to the full set of the 1 M SNPs. As expected, the average IQS value drops dramatically with a decrease in minor allele frequency, indicating that IQS appropriately adjusts for minor allele frequency. We then evaluated whether IQS can filter poorly-imputed SNPs in situations where cases and controls are genotyped on different platforms. Randomly dividing the data into "cases" and "controls", we extracted the Illumina 550 K SNPs from the cases and imputed the remaining Illumina 1 M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (lambda = 1.15) and had 4016 false positives, reflecting imputation error. After filtering out SNPs with IQS < 0.9, the Q-Q plot was acceptable and there were no longer any false positives. We then evaluated the robustness of IQS computed independently on the two halves of the data. In both European Americans and African Americans the correlation was > 0.99, demonstrating that a database of IQS values from common imputations could be used as an effective filter to combine data genotyped on different platforms.
CONCLUSIONS/SIGNIFICANCE: IQS effectively differentiates well-imputed and poorly-imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and when datasets are genotyped on different platforms.
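The abstract describes IQS only as concordance "adjusted for chance" and does not give the formula. The sketch below assumes the kappa-style form (observed agreement minus chance agreement, divided by one minus chance agreement), with chance agreement taken from the marginal genotype frequencies. The function name `iqs` and the 0/1/2 genotype coding are illustrative choices, not taken from the source.

```python
from collections import Counter


def iqs(imputed, genotyped):
    """Chance-adjusted concordance between imputed and genotyped calls.

    Assumes a kappa-style definition: (P_obs - P_chance) / (1 - P_chance).
    Calls can be any hashable genotype labels, e.g. 0, 1, 2.
    """
    if len(imputed) != len(genotyped) or not imputed:
        raise ValueError("need two equal-length, non-empty call lists")
    n = len(imputed)

    # Observed agreement: fraction of samples where the two calls match.
    p_obs = sum(i == g for i, g in zip(imputed, genotyped)) / n

    # Chance agreement: for each genotype class, multiply the two marginal
    # frequencies, then sum over classes.
    imputed_counts = Counter(imputed)
    genotyped_counts = Counter(genotyped)
    p_chance = sum(
        imputed_counts[c] * genotyped_counts[c] for c in imputed_counts
    ) / (n * n)

    if p_chance == 1.0:
        # Both call sets are monomorphic for the same genotype: 0/0 is undefined.
        return float("nan")
    return (p_obs - p_chance) / (1.0 - p_chance)


# A rare SNP where every sample is imputed as the major homozygote:
# raw concordance is 0.98, but the chance-adjusted score is 0.
truth = [0] * 98 + [1] * 2
calls = [0] * 100
print(iqs(calls, truth))
```

Under this assumed definition, a SNP scores 1 only when agreement exceeds what the genotype frequencies alone would produce, which is why high raw concordance on a rare SNP does not guarantee a high score.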
The means of IQS and imputation accuracy within each minor allele frequency interval. IQS adjusts for chance agreement. As the minor allele frequency approaches 0, the difference between IQS and imputation accuracy increases. The standard deviation is shown for every other point.
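A small worked example may help show why the two measures diverge at low minor allele frequency. It assumes IQS has the kappa-style form, with $P_o$ the observed proportion of matching calls and $P_c$ the agreement expected by chance from the marginal genotype frequencies; the numbers are invented for illustration. Take 100 subjects, 98 truly homozygous for the major allele and 2 heterozygous, and suppose imputation calls all 100 homozygous:

$$
P_o = \frac{98}{100} = 0.98, \qquad
P_c = (1.00)(0.98) + (0)(0.02) + (0)(0) = 0.98
$$

$$
\mathrm{IQS} = \frac{P_o - P_c}{1 - P_c} = \frac{0.98 - 0.98}{1 - 0.98} = 0
$$

Imputation accuracy is 98%, yet the chance-adjusted score is 0 because the imputation carries no information about who holds the minor allele.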
The Q-Q plots based on randomly dividing data into cases and controls. Samples were divided randomly into cases and controls. (A) All Illumina 1 M SNPs are directly genotyped, indicating there is no population stratification or other non-random factors in cases and controls. (B) Cases were genotyped on the Illumina 550 K array and the remaining Illumina 1 M SNPs were imputed. (C) An IQS filter (IQS>0.9) was applied, retaining 92% of the SNPs. (D) An imputation accuracy filter (>0.99) was applied, retaining 91% of the SNPs.
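The distortion in a Q-Q plot is commonly summarized by the genomic inflation factor lambda. The source reports lambda = 1.15 but does not say how it was computed; the sketch below assumes the conventional median-based definition for 1-degree-of-freedom tests. The function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda as median observed chi-square over its null median.

    Assumes each p-value comes from a 1-df test and lies in (0, 1].
    """
    std_normal = NormalDist()
    # A 1-df chi-square statistic is a squared standard normal, so a p-value p
    # maps back to chi2 = z**2 with z = Phi^{-1}(p / 2).
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic the null and give lambda close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

Values near 1 suggest the test statistics follow the null distribution; values well above 1 point to systematic inflation such as the imputation error described in panel (B).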
Evaluation of the robustness of IQS. The European American (A) and African American (B) datasets were each split in half, and Illumina 550 K SNPs were imputed to Illumina 1 M SNPs. IQS values for the two halves of the data were plotted against each other. SNPs with minor allele frequency less than 0.01 were excluded to avoid a zero in the denominator.
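The split-half comparison can be sketched as a correlation of per-SNP scores after the minor allele frequency exclusion. The code below is illustrative only: it assumes per-SNP IQS values are already available as dictionaries keyed by SNP identifier and uses a Pearson correlation, which the source does not specify. The names `pearson_r` and `split_half_correlation` and the example identifiers are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_correlation(iqs_half_a, iqs_half_b, maf, min_maf=0.01):
    """Correlate per-SNP IQS from two halves, skipping SNPs with MAF < min_maf."""
    # Keep SNPs scored in both halves; SNPs with unknown MAF are also skipped.
    shared = [
        snp for snp in iqs_half_a
        if snp in iqs_half_b and maf.get(snp, 0.0) >= min_maf
    ]
    return pearson_r(
        [iqs_half_a[snp] for snp in shared],
        [iqs_half_b[snp] for snp in shared],
    )


# Made-up values: snp4 disagrees between halves but is excluded by its MAF.
half_a = {"snp1": 0.2, "snp2": 0.4, "snp3": 0.6, "snp4": 0.0}
half_b = {"snp1": 0.2, "snp2": 0.4, "snp3": 0.6, "snp4": 1.0}
freqs = {"snp1": 0.20, "snp2": 0.10, "snp3": 0.30, "snp4": 0.005}
print(split_half_correlation(half_a, half_b, freqs))
```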
A database of IQS can be used to filter poorly-imputed SNPs. The set of hard-to-impute SNPs compiled from one dataset can be used to filter the imputed data in another dataset. (A) Cases were European Americans genotyped on the Illumina 550 K array and the remaining Illumina 1 M SNPs were imputed. Controls were European Americans genotyped on the Illumina 1 M array. The Q-Q plot is shown for the 790,965 available SNPs. (B) An IQS filter (IQS>0.9) was applied, retaining 92% of the SNPs. IQS was calculated from an independent dataset. (C) A similar Q-Q plot for African Americans. Cases were genotyped on the Illumina 550 K array and the remaining Illumina 1 M SNPs were imputed. Controls were genotyped on the Illumina 1 M array. The Q-Q plot is shown for the 836,993 available SNPs. (D) An IQS filter (IQS>0.9) was applied, retaining 78% of the SNPs. IQS was calculated from an independent dataset.
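Applying a precomputed IQS database as a filter amounts to a lookup against a threshold. The sketch below is a minimal illustration, assuming the database is a mapping from SNP identifier to an IQS value computed on an independent dataset. The function name `filter_by_iqs`, the identifiers, and the choice to drop SNPs missing from the database are assumptions of this example, not details from the source.

```python
def filter_by_iqs(snp_ids, iqs_database, threshold=0.9):
    """Return the SNPs whose reference IQS exceeds the threshold.

    SNPs absent from the database are dropped, a conservative choice
    made for this sketch.
    """
    return [
        snp for snp in snp_ids
        if iqs_database.get(snp, float("-inf")) > threshold
    ]


# Made-up reference scores from an independent imputation run.
reference_iqs = {"snp1": 0.97, "snp2": 0.42, "snp3": 0.91}
imputed_snps = ["snp1", "snp2", "snp3", "snp4"]
print(filter_by_iqs(imputed_snps, reference_iqs))
```

Because the scores come from a separate dataset, the filter can be applied before any association test is run on the combined data.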
| Name | Type |
|---|---|
| AA | cohort |
| AA sample | cohort |
| Affymetrix 5.0 | drug |
| Affymetrix 5.0 array local | drug |
| Affymetrix 6.0 | drug |
| Affymetrix array local | cohort |
| Affymetrix GeneChip Mapping 500 K Array Set local | drug |
| African | cohort |
| African American | cohort |
| cases | cohort |
| Center for Inherited Disease Research | cohort |
| CEPH | cohort |
| CEU | cohort |
| COGEND | cohort |
| Collaborative Study on the Genetics of Alcoholism (COGA) | cohort |
| common human diseases local | phenotype |
| controls | cohort |
| Database of IQS scores local | drug |
| EA | cohort |
| European ancestry | cohort |
| false positive rate | phenotype |
| false positive SNPs local | variant |
| Family study of cocaine dependence | cohort |
| First group local | cohort |
| genetic variants | cohort |
| GENEVA consortium | cohort |
| GENEVA project local | cohort |
| genome wide significant SNPs local | variant |
| HapMap | cohort |
| HapMap controls local | drug |
| HapMap Phase II CEU population local | cohort |
| HapMap Phase II release 22 | cohort |
| Hard-to-impute SNPs local | variant |
| Illumina 1 M local | drug |
| Illumina 1 M array local | drug |
| Illumina 550 K local | drug |
| Illumina 550 K array local | drug |
| Illumina array local | cohort |
| Illumina Human 1 M array local | drug |
| Illumina HumanHap 550 K Array set local | drug |
| imputation accuracy | drug |
| imputation accuracy filter local | drug |
| Imputation efficiency local | phenotype |
| imputation reliability local | phenotype |
| Impute2 | drug |
| Imputed SNP local | variant |
| imputed SNPs | variant |
| International Hapmap Project | cohort |
| IQS local | drug |
| IQS local | phenotype |
| IQS filter local | drug |
| Johns Hopkins University | cohort |
| minor allele frequency local | phenotype |
| National Institute of Mental Health Center for Collaborative Genetic Studies on Mental Disorders local | cohort |
| NCBI Build 36 dbSNP b126 local | cohort |
| NIMH GAIN samples local | cohort |
| other available SNPs local | variant |
| population stratification | phenotype |
| rare variant | cohort |
| SAGE | cohort |
| Second group local | cohort |
| SNP | cohort |
| SNP microarrays | drug |
| Study of Addiction: Genetics and Environment | cohort |
| Type I error local | phenotype |
| uncommon SNP local | variant |
| uncommon SNPs local | variant |
| Wellcome Trust local | cohort |
| Yoruba | cohort |
| YRI reference panel local | cohort |
External
| Title | Authors | Year |
|---|---|---|
| Association of genetic variation with age at diagnosis in type 1 diabetes. | Vollenbrock CE et al. | 2026 |
| Adjustment for Genotype Imputation Uncertainty Corrects for Inflated Type I Error in Family-Based Association Testing. | Day TRC et al. | 2025 |
| A low-coverage skim-sequencing and imputation pipeline for genomic selection. | Sthapit SR et al. | 2025 |
| A primer on sequencing and genotype imputation in cattle. | Rowan TN | 2025 |
| Assessing Genotype Imputation Methods for Low-Coverage Sequencing Data in Populations With Differing Relatedness and Inbreeding Levels. | Vi T et al. | 2025 |
| Benchmarking Imputed Low Coverage Genomes in a Human Population Genetics Context. | Purnomo GA et al. | 2025 |
| Evaluation of Low-Coverage Sequencing Strategies for Whole-Genome Imputation in Pacific Abalone *Haliotis discus hannai*. | Fei C et al. | 2025 |
| Genetic regulation of TERT splicing affects cancer risk by altering cellular longevity and replicative potential. | Florez-Vargas O et al. | 2025 |
| Genotype imputation from low-coverage WGS using haplotype reference panels in cultivated strawberry. | Koorevaar T et al. | 2025 |
| Imputation disparities driven by recent selection and their impact on disease risk estimation in East and Southeast Asian populations. | Li D et al. | 2025 |
| STICI: Split-Transformer with integrated convolutions for genotype imputation. | Mowlaei ME et al. | 2025 |
| A deep learning approach to prediction of blood group antigens from genomic data. | Moslemi C et al. | 2024 |
| Genotype imputation in human genomic studies. | Berdnikova AA et al. | 2024 |
| How local reference panels improve imputation in French populations. | Herzig AF et al. | 2024 |
| Imputation accuracy across global human populations. | Cahoon JL et al. | 2024 |
| Deep Learning Methods for Omics Data Imputation. | Huang L et al. | 2023 |
| Genetic prediction of 33 blood group phenotypes using an existing genotype dataset. | Moslemi C et al. | 2023 |
| A comparative analysis of current phasing and imputation software. | De Marino A et al. | 2022 |
| A data harmonization pipeline to leverage external controls and boost power in GWAS. | Chen D et al. | 2022 |
| An autoencoder-based deep learning method for genotype imputation. | Song M et al. | 2022 |
| A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. | Baldrighi GN et al. | 2022 |
| Best practices for analyzing imputed genotypes from low-pass sequencing in dogs. | Buckley RM et al. | 2022 |
| Genotype imputation and polygenic score estimation in northwestern Russian population. | Kolosov N et al. | 2022 |
| MagicalRsq: Machine-learning-based genotype imputation quality calibration. | Sun Q et al. | 2022 |
| Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data. | Stahl K et al. | 2021 |
| Investigating the accuracy of imputing autosomal variants in Nellore cattle using the ARS-UCD1.2 assembly of the bovine genome. | Hermisdorff IDC et al. | 2020 |
| Quality Control Measures and Validation in Gene Association Studies: Lessons for Acute Illness. | Cohen M et al. | 2020 |
| A multi-breed reference panel and additional rare variants maximize imputation accuracy in cattle. | Rowan TN et al. | 2019 |
| Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees. | Ullah E et al. | 2019 |
| Evaluation of vitamin D biosynthesis and pathway target genes reveals UGT2A1/2 and EGFR polymorphisms associated with epithelial ovarian cancer in African American Women. | Grant DJ et al. | 2019 |
| Linkage disequilibrium and effective population size in Gir cattle selected for yearling weight. | Toro Ospina AM et al. | 2019 |
| Meta-Analysis of Genome-Wide Association Studies Identifies Three Loci Associated With Stiffness Index of the Calcaneus. | Lu HF et al. | 2019 |
| Revisit Population-based and Family-based Genotype Imputation. | Liu CT et al. | 2019 |
| The African Descent and Glaucoma Evaluation Study (ADAGES) III: Contribution of Genotype to Glaucoma Phenotype in African Americans: Study Design and Baseline Data. | Zangwill LM et al. | 2019 |
| Genome-Wide Association Study of Heavy Smoking and Daily/Nondaily Smoking in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). | Saccone NL et al. | 2018 |
| Genotype imputation performance of three reference panels using African ancestry individuals. | Vergara C et al. | 2018 |
| Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population. | Ye S et al. | 2018 |
| Failure to replicate thrombomodulin genetic variant predictors of venous thromboembolism in African Americans. | Folsom AR et al. | 2017 |
| Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure. | Kabisch M et al. | 2017 |
| Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy. | Ahmad M et al. | 2017 |
| Siccuracy: An R-package for executing genotype imputation strategy simulations with AlphaImpute | Edwards SM | 2017 |
| Empirical determination of breed-of-origin of alleles in three-breed cross pigs. | Sevillano CA et al. | 2016 |
| Family-based approaches: design, imputation, analysis, and beyond. | Wijsman EM | 2016 |
| Genome-wide association study of antidepressant response: involvement of the inorganic cation transmembrane transporter activity pathway. | Cocchi E et al. | 2016 |
| Imputing rare variants in families using a two-stage approach. | Lent S et al. | 2016 |
| Accuracy of imputation using the most common sires as reference population in layer chickens. | Heidaritabar M et al. | 2015 |
| Evaluating the ovarian cancer gonadotropin hypothesis: a candidate gene study. | Lee AW et al. | 2015 |
| First genome-wide association study in an Australian aboriginal population provides insights into genetic risk factors for body mass index and type 2 diabetes. | Anderson D et al. | 2015 |
| Tailored selection of study individuals to be sequenced in order to improve the accuracy of genotype imputation. | Peil B et al. | 2015 |
| When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments? | Ramnarine S et al. | 2015 |
| Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. | Calus MP et al. | 2014 |
| Genotypic discrepancies arising from imputation. | Hinrichs AL et al. | 2014 |
| Impact of pre-imputation SNP-filtering on genotype imputation results. | Roshyara NR et al. | 2014 |
| Imputation and quality control steps for combining multiple genome-wide datasets. | Verma SS et al. | 2014 |
| Imputation in families using a heuristic phasing approach. | Blackburn AN et al. | 2014 |
| Predicting HLA genotypes using unphased and flanking single-nucleotide polymorphisms in Han Chinese population. | Hsieh AR et al. | 2014 |
| Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond. | Blue EM et al. | 2014 |
| A K(ATP) channel gene effect on sleep duration: from genome-wide association studies to function in Drosophila. | Allebrandt KV et al. | 2013 |
| Dosage transmission disequilibrium test (dTDT) for linkage and association detection. | Zhang Z et al. | 2013 |
| Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels. | Gualdrón Duarte JL et al. | 2013 |
| Imputation-based genomic coverage assessments of current human genotyping arrays. | Nelson SC et al. | 2013 |
| MaCH-admix: genotype imputation for admixed populations. | Liu EY et al. | 2013 |
| Meta-analysis methods for genome-wide association studies and beyond. | Evangelou E et al. | 2013 |
| Assessment of genotype imputation performance using 1000 Genomes in African American studies. | Hancock DB et al. | 2012 |
| A ν-support vector regression based approach for predicting imputation quality. | Huang YH et al. | 2012 |
| Genotype imputation of Metabochip SNPs using a study-specific reference panel of ~4,000 haplotypes in African Americans from the Women's Health Initiative. | Liu EY et al. | 2012 |
| Imputation of genotypes with low-density chips and its effect on reliability of direct genomic values in Dutch Holstein cattle. | Mulder HA et al. | 2012 |
| Copy number variation accuracy in genome-wide association studies. | Lin P et al. | 2011 |
| Rare variant association analysis methods for complex traits. | Asimit J et al. | 2010 |