Assessment of genotype imputation performance using 1000 Genomes in African American studies.
- Authors
- Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Bierut, Laura J; Saccone, Nancy L; Page, Grier P; Johnson, Eric O
- Year
- 2012
- Journal
- PloS one
- PMID
- 23226329
- DOI
- 10.1371/journal.pone.0050610
- PMCID
- PMC3511547
Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels.
While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
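The difference between concordance and IQS can be illustrated with a minimal sketch. It assumes best-guess genotype calls coded 0/1/2 for a single masked SNP and uses a Cohen's kappa style adjustment, (observed agreement - chance agreement) / (1 - chance agreement), as one reading of "concordance adjusted for chance agreement". The function names `concordance` and `iqs` and the toy data are hypothetical and are not taken from the study's own code.

```python
from collections import Counter


def concordance(true_g, imputed_g):
    """Fraction of individuals whose imputed genotype equals the true genotype."""
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_g, imputed_g))
    return matches / len(true_g)


def iqs(true_g, imputed_g):
    """Kappa-style chance-adjusted agreement (illustrative, not the authors' code)."""
    n = len(true_g)
    p_obs = concordance(true_g, imputed_g)
    true_counts = Counter(true_g)
    imp_counts = Counter(imputed_g)
    # Chance agreement: product of the marginal genotype frequencies, summed over classes.
    p_chance = sum(true_counts[g] * imp_counts[g] for g in true_counts) / (n * n)
    if p_chance >= 1.0:
        # Both call sets are monomorphic for the same genotype: the score is undefined.
        return float("nan")
    return (p_obs - p_chance) / (1.0 - p_chance)


# Toy low-MAF SNP (assumed data): 2 heterozygotes among 100 people,
# and an imputation that calls everyone homozygous for the major allele.
true_calls = [0] * 98 + [1] * 2
imputed_calls = [0] * 100
print(concordance(true_calls, imputed_calls))
print(iqs(true_calls, imputed_calls))
```

In this toy case the concordance is 0.98 even though no minor allele carrier was recovered, while the chance-adjusted score is 0, which is why a chance-adjusted measure is more informative for low MAF SNPs.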
Concordance resulting from four different imputation programs and three different 1000 Genomes (February 2012 release) reference panels. Concordance rates were based on masking 2% of the genotyped SNPs on chromosome 22 and comparing imputed and true genotypes. The number of subjects corresponding to each reference panel is shown in parentheses.
Imputation quality score (IQS) resulting from four different imputation programs and three different 1000 Genomes (February 2012) reference panels. IQS results were based on masking 2% of the genotyped SNPs and adjusting the concordance rate for chance agreement between imputed and true genotypes. The number of subjects corresponding to each reference panel is shown in parentheses.
Average r2hat values resulting from four different imputation programs and three different 1000 Genomes (February 2012) reference panels. r2hat values were averaged across all imputed SNPs on chromosome 22. The number of subjects corresponding to each reference panel is shown in parentheses.
Average r2hat, based on imputation using IMPUTE2, across the minor allele frequency (MAF) spectrum. Imputation was conducted for all SNPs available on the YRI+CEU+ASW (N = 234, in red), AFR+EUR (N = 625, in green), or the ALL (N = 1,092, in blue) reference panel from 1000 Genomes. Imputed polymorphic SNPs were divided into MAF intervals of 1%, and their average r2hat values were calculated within each interval.
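The binning described in this caption can be sketched as follows. This is a minimal illustration, assuming each imputed SNP is available as a `(maf, r2hat)` pair with MAF expressed as a fraction. The function name `mean_r2hat_by_maf_bin`, the half-open bin boundaries, and the example values are assumptions, since the source does not state how interval edges were handled.

```python
import math
from collections import defaultdict


def mean_r2hat_by_maf_bin(snps):
    """Average r2hat within 1% MAF intervals.

    snps: iterable of (maf, r2hat) pairs, with maf as a fraction in [0, 0.5].
    Returns {lower bound of the bin in percent: mean r2hat}.
    Bins are treated as [k%, (k+1)%), which is an assumed convention.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for maf, r2hat in snps:
        if maf <= 0:
            # Monomorphic SNPs are skipped; the caption refers to polymorphic SNPs only.
            continue
        bin_start = math.floor(maf * 100)
        totals[bin_start] += r2hat
        counts[bin_start] += 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


# Made-up example values, not results from the study.
example = [(0.005, 0.2), (0.015, 0.4), (0.018, 0.6), (0.254, 0.9), (0.0, 0.1)]
for bin_start, mean_r2 in mean_r2hat_by_maf_bin(example).items():
    print(f"MAF {bin_start}%-{bin_start + 1}%: mean r2hat = {mean_r2:.2f}")
```

Because floating-point multiplication can place a MAF that sits exactly on a boundary in the neighbouring bin, a production version would likely bin on allele counts or on rounded percentages instead.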
In this knowledge base
External
| Title | Authors | Journal | Year | Link |
|---|---|---|---|---|
| A reanalysis of a genome-wide association study on breast cancer in Asian populations using the SG10K_Health reference panel for imputation: a multi-Centre case-control analysis. | Chang X et al. | - | 2026 | - |
| Data Harmonization Guidelines to Combine Multi-platform Genomic Data from Admixed Populations and Boost Power in Genome-Wide Association Studies. | Croock D et al. | - | 2024 | - |
| A data harmonization pipeline to leverage external controls and boost power in GWAS. | Chen D et al. | - | 2022 | - |
| Recovering High-Quality Host Genomes from Gut Metagenomic Data through Genotype Imputation. | Marcos S et al. | - | 2022 | - |
| Genetic Risk Stratification: A Paradigm Shift in Prevention of Coronary Artery Disease. | Roberts R et al. | - | 2021 | - |
| A large-scale genome-wide association study meta-analysis of cannabis use disorder. | Johnson EC et al. | - | 2020 | - |
| Genome wide association analysis on semen volume and milk yield using different strategies of imputation to whole genome sequence in French dairy goats. | Talouarn E et al. | - | 2020 | - |
| The Global Durum Wheat Panel (GDP): An International Platform to Identify and Exchange Beneficial Alleles. | Mazzucotelli E et al. | - | 2020 | - |
| A multi-breed reference panel and additional rare variants maximize imputation accuracy in cattle. | Rowan TN et al. | - | 2019 | - |
| Evaluating the Accuracy of Imputation Methods in a Five-Way Admixed Population. | Schurz H et al. | - | 2019 | - |
| Protocols, Methods, and Tools for Genome-Wide Association Studies (GWAS) of Dental Traits. | Agler CS et al. | - | 2019 | - |
| Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools. | Sariya S et al. | - | 2019 | - |
| The Genomic Impact of European Colonization of the Americas. | Ongaro L et al. | - | 2019 | - |
| Genome-Wide Association Studies of Cancer in Diverse Populations. | Park SL et al. | - | 2018 | - |
| Genotype imputation performance of three reference panels using African ancestry individuals. | Vergara C et al. | - | 2018 | - |
| Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. | Walters RK et al. | - | 2018 | - |
| Whole Exome Sequencing of Patients from Multicase Families with Systemic Lupus Erythematosus Identifies Multiple Rare Variants. | Delgado-Vega AM et al. | - | 2018 | - |
| Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy. | Ahmad M et al. | - | 2017 | - |
| Genome-Wide Association Study in an Amerindian Ancestry Population Reveals Novel Systemic Lupus Erythematosus Risk Loci and the Role of European Admixture. | Alarcón-Riquelme ME et al. | - | 2016 | - |
| Leveraging electronic health records to study pleiotropic effects on bipolar disorder and medical comorbidities. | Prieto ML et al. | - | 2016 | - |
| A multiancestry study identifies novel genetic associations with CHRNA5 methylation in human brain and risk of nicotine dependence. | Hancock DB et al. | - | 2015 | - |
| Associations of common variants in the BST2 region with HIV-1 acquisition in African American and European American people who inject drugs. | Hancock DB et al. | - | 2015 | - |
| Cis-Expression Quantitative Trait Loci Mapping Reveals Replicable Associations with Heroin Addiction in OPRM1. | Hancock DB et al. | - | 2015 | - |
| Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken. | Ni G et al. | - | 2015 | - |
| Integrated genomic analyses in bronchopulmonary dysplasia. | Ambalavanan N et al. | - | 2015 | - |
| Molgenis-impute: imputation pipeline in a box. | Kanterakis A et al. | - | 2015 | - |
| Systematic assessment of imputation performance using the 1000 Genomes reference panels. | Liu Q et al. | - | 2015 | - |
| When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments? | Ramnarine S et al. | - | 2015 | - |
| Effective filtering strategies to improve data quality from population-based whole exome sequencing studies. | Carson AR et al. | - | 2014 | - |
| Imputation and quality control steps for combining multiple genome-wide datasets. | Verma SS et al. | - | 2014 | - |
| Local and global ancestry inference and applications to genetic association analysis for admixed populations. | Thornton TA et al. | - | 2014 | - |
| On the performance of multiple imputation based on chained equations in tackling missing data of the African α3.7-globin deletion in a malaria association study. | Sepúlveda N et al. | - | 2014 | - |
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | Johnson EO et al. | - | 2013 | - |
| Next generation sequencing and rare genetic variants: from human population studies to medical genetics. | Matullo G et al. | - | 2013 | - |
| Variants in the ATP-binding cassette transporter (ABCA7), apolipoprotein E ϵ4, and the risk of late-onset Alzheimer disease in African Americans. | Reitz C et al. | - | 2013 | - |