A comparison of approaches to account for uncertainty in analysis of imputed genotypes.
- Authors
- Zheng, Jin; Li, Yun; Abecasis, Gonçalo R; Scheet, Paul
- Year
- 2011
- Journal
- Genetic epidemiology
- PMID
- 21254217
- DOI
- 10.1002/gepi.20552
- PMCID
- PMC3143715
The availability of extensively genotyped reference samples, such as "The HapMap" and 1,000 Genomes Project reference panels, together with advances in statistical methodology, has allowed for the imputation of genotypes at single nucleotide polymorphism (SNP) markers that are untyped in a cohort or case-control study. These imputation procedures facilitate the interpretation and meta-analyses of genome-wide association studies. A natural question when implementing these procedures concerns how best to take into account uncertainty in imputed genotypes. Here we compare the performance of the following three strategies: least-squares regression on the "best-guess" imputed genotype; regression on the expected genotype score or "dosage"; and mixture regression models that more fully incorporate posterior probabilities of genotypes at untyped SNPs. Using simulation, we considered a range of sample sizes, minor allele frequencies, and imputation accuracies to compare the performance of the different methods under various genetic models. The mixture models performed the best in the setting of a large genetic effect and low imputation accuracies. However, for most realistic settings, we find that regressing the phenotype on the estimated allelic or genotypic dosage provides an attractive compromise between accuracy and computational tractability.
Example of posterior probability summaries. Here we present a didactic illustration of the three summaries of the full posterior probabilities for imputed genotypes. From the set of Reference Haplotypes, the missing genotype (denoted with two "?" symbols) in the sample genotypes can be inferred. Based on the reference, the first sample haplotype would carry a C at the missing position, since all three similar haplotypes in the reference set have a C there. For the second sample haplotype, three of the four similar haplotypes in the reference set carry a C and one carries a T at that position. Therefore, the "expected" dosage would be 1.75, and the only "possible" genotypes, based completely on the reference, would be C/C and C/T, with the expected probabilities given.
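The arithmetic behind this illustration can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the two sample haplotypes are imputed independently, and the function name `genotype_posterior` and the per-haplotype probabilities (1.0 and 0.75 for carrying a C) are read off the caption rather than taken from any software.

```python
from itertools import product

def genotype_posterior(p_c_hap1, p_c_hap2):
    """Posterior over the number of C alleles (0, 1 or 2).

    Assumes (for illustration only) that the two haplotypes are
    imputed independently of each other.
    """
    post = {0: 0.0, 1: 0.0, 2: 0.0}
    for a1, a2 in product((0, 1), repeat=2):  # 1 = C allele, 0 = T allele
        p1 = p_c_hap1 if a1 else 1.0 - p_c_hap1
        p2 = p_c_hap2 if a2 else 1.0 - p_c_hap2
        post[a1 + a2] += p1 * p2
    return post

# Haplotype 1: all similar reference haplotypes carry C.
# Haplotype 2: three of four similar reference haplotypes carry C.
posterior = genotype_posterior(1.0, 0.75)

# Summary 1: "best-guess" genotype = most probable C-allele count.
best_guess = max(posterior, key=posterior.get)

# Summary 2: expected allele count ("dosage").
dosage = sum(count * prob for count, prob in posterior.items())

# Summary 3: the full posterior, as used by mixture models.
print(best_guess, dosage, posterior)
```

Under these assumptions the best guess is C/C (two copies of C), the dosage is 1.75, and the posterior puts probability 0.75 on C/C and 0.25 on C/T, matching the values quoted in the caption.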
Power vs. accuracy and allele frequency for large sample size and small effects. For each summary and the true genotypes, both an additive (solid line) and dominant (dotted line) model were analyzed. (A) and (C) are based on data simulated with an additive effect; (B) and (D) are based on data simulated under a model of complete dominance. Power was computed at a fixed type-I error rate (α) of 5 × 10⁻⁵. The sample size was 1,000. TOP: Power is plotted against R², a measure of imputation accuracy. BOTTOM: Power is plotted against allele frequency. (A) Power vs. R² with an additive effect; (B) power vs. R² under complete dominance; (C) power vs. frequency of minor allele with an additive effect; and (D) power vs. frequency of dominant allele under complete dominance.
Power vs. accuracy and allele frequency for small sample size and large effects. Power was computed at a fixed type-I error rate (α) of 5 × 10⁻⁵. The sample size was 50. For each summary and the true genotypes, both an additive (solid line) and dominant (dotted line) model were analyzed. (A) and (C) are based on data simulated with an additive effect; (B) and (D) are based on data simulated under a model of complete dominance. TOP: Power is plotted against R², a measure of imputation accuracy. BOTTOM: Power is plotted against allele frequency: (A) power vs. R² with an additive effect; (B) power vs. R² under complete dominance; (C) power vs. frequency of minor allele with an additive effect; and (D) power vs. frequency of dominant allele under complete dominance.
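As a rough illustration of how the additive and dominant analyses in these captions could be run on dosages, the sketch below regresses a phenotype on each coding by ordinary least squares. It rests on several assumptions that are not taken from the paper: the dominant dosage is taken to be the posterior probability of carrying at least one copy of the allele, the helper names (`additive_dosage`, `dominant_dosage`, `ols_slope`) are hypothetical, and the posterior probabilities and phenotype values are made-up toy numbers.

```python
import math

def additive_dosage(post):
    """Expected allele count from a posterior (p0, p1, p2)."""
    p0, p1, p2 = post
    return p1 + 2.0 * p2

def dominant_dosage(post):
    """Assumed dominant coding: probability of carrying >= 1 copy."""
    p0, p1, p2 = post
    return p1 + p2

def ols_slope(x, y):
    """Simple least-squares regression of y on x.

    Returns the slope estimate and its standard error.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Toy data (invented for illustration): genotype posteriors
# (P(0 copies), P(1 copy), P(2 copies)) and quantitative phenotypes.
posteriors = [
    (0.90, 0.10, 0.00),
    (0.20, 0.70, 0.10),
    (0.00, 0.25, 0.75),
    (0.60, 0.40, 0.00),
    (0.05, 0.15, 0.80),
]
phenotype = [0.1, 1.2, 1.9, 0.3, 2.2]

add_fit = ols_slope([additive_dosage(p) for p in posteriors], phenotype)
dom_fit = ols_slope([dominant_dosage(p) for p in posteriors], phenotype)
print("additive slope, SE:", add_fit)
print("dominant slope, SE:", dom_fit)
```

In a power study, a test statistic built from the slope and its standard error would be compared against the threshold implied by the chosen type-I error rate, and the fraction of simulated data sets exceeding it would estimate power.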
| Name | Type |
|---|---|
| 1000 Genomes Project | cohort |
| 50 individuals subset local | cohort |
| 521 markers local | variant |
| 538 individuals local | cohort |
| additive genetic model local | phenotype |
| Beagle | drug |
| BIMBAM local | drug |
| CEU | cohort |
| Cohort_1000 local | cohort |
| Cohort_50 local | cohort |
| common variants | cohort |
| dominant allele local | variant |
| dominant genetic model local | phenotype |
| eQTL mapping study local | phenotype |
| European population | cohort |
| Figure 2D local | drug |
| Figure 3D local | drug |
| Full cohort of 1,000 individuals local | cohort |
| full sample | cohort |
| FUSION | cohort |
| genetic dosage local | drug |
| genetic variants | cohort |
| genome-wide association studies | cohort |
| genotype local | other |
| GWA study | cohort |
| HapMap | cohort |
| HapMap CEU | cohort |
| homozygote local | other |
| Illumina 317K microarray local | drug |
| Illumina HumanHap300 Beadchip | drug |
| imputation | drug |
| imputation accuracy | drug |
| imputation algorithms | drug |
| Impute2 | drug |
| imputed genotype local | variant |
| Imputed genotype local | variant |
| Imputed genotypes local | variant |
| International Hapmap Project | cohort |
| large sample of 1,000 individuals local | cohort |
| LDLR | gene |
| MaCH | drug |
| marker densities local | phenotype |
| minor allele frequency local | drug |
| mixture model local | drug |
| Phase II HapMap International HapMap Consortium local | cohort |
| phenotype | phenotype |
| quantitative trait data local | phenotype |
| reference haplotypes local | cohort |
| reference panel | cohort |
| reference populations local | cohort |
| regression model local | drug |
| rs6511720 local | variant |
| sample size 50 local | cohort |
| simulated cohort local | cohort |
| simulated phenotype | phenotype |
| simulated phenotypes local | phenotype |
| single nucleotide polymorphism | variant |
| SNP | cohort |
| study cohort | cohort |
| tagSNP | variant |
| tag SNPs | cohort |
| trait | phenotype |
| True genotype local | variant |
| type 2 diabetes | phenotype |
| Wellcome Trust case control consortium | cohort |
| YRI | cohort |
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | 2013 | 23334152 |
External
| Title | Authors | Journal | Year | Link |
|---|---|---|---|---|
| A low-coverage skim-sequencing and imputation pipeline for genomic selection. | Sthapit SR et al. | - | 2025 | - |
| The genetic architecture of biological age in nine human organ systems. | Wen J et al. | - | 2024 | - |
| Variable parallelism in the genomic basis of age at maturity across spatial scales in Atlantic Salmon. | Kess T et al. | - | 2024 | - |
| Comparison of multiple imputation and other methods for the analysis of imputed genotypes. | Auer PL et al. | - | 2023 | - |
| Polygenic risk scores for prediction of cancer-associated venous thromboembolism in the UK Biobank cohort study. | Guman NAM et al. | - | 2023 | - |
| Polymorphic short tandem repeats make widespread contributions to blood and serum traits. | Margoliash J et al. | - | 2023 | - |
| Probing the diabetes and colorectal cancer relationship using gene - environment interaction analyses. | Dimou N et al. | - | 2023 | - |
| Beyond GWAS of Colorectal Cancer: Evidence of Interaction with Alcohol Consumption and Putative Causal Variant for the 10q24.2 Region. | Jordahl KM et al. | - | 2022 | - |
| Considering hormone-sensitive cancers as a single disease in the UK biobank reveals shared aetiology. | Ahmed M et al. | - | 2022 | - |
| Efficient approaches for large-scale GWAS with genotype uncertainty. | Jørsboe E et al. | - | 2022 | - |
| Iam hiQ-a novel pair of accuracy indices for imputed genotypes. | Rosenberger A et al. | - | 2022 | - |
| Transethnic analysis of psoriasis susceptibility in South Asians and Europeans enhances fine-mapping in the MHC and genomewide. | Stuart PE et al. | - | 2022 | - |
| A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data. | Zhang M et al. | - | 2021 | - |
| Impact of pre- and post-variant filtration strategies on imputation. | Charon C et al. | - | 2021 | - |
| Eagle: multi-locus association mapping on a genome-wide scale made routine. | George AW et al. | - | 2020 | - |
| Identification of 31 loci for mammographic density phenotypes and their associations with breast cancer risk. | Sieh W et al. | - | 2020 | - |
| The impact of adjusting for baseline in pharmacogenomic genome-wide association studies of quantitative change. | Oni-Orisan A et al. | - | 2020 | - |
| Revisit Population-based and Family-based Genotype Imputation. | Liu CT et al. | - | 2019 | - |
| A Large Multiethnic Genome-Wide Association Study of Adult Body Mass Index Identifies Novel Loci. | Hoffmann TJ et al. | - | 2018 | - |
| Genetic Variants Associated with Circulating Fibroblast Growth Factor 23. | Robinson-Cohen C et al. | - | 2018 | - |
| Genome-wide analysis of polymorphism × sodium interaction effect on blood pressure identifies a novel 3'-BCL11B gene desert locus. | Hachiya T et al. | - | 2018 | - |
| A Powerful Gene-Based Test Accommodating Common and Low-Frequency Variants to Detect Both Main Effects and Gene-Gene Interaction Effects in Case-Control Studies. | Chung RH et al. | - | 2017 | - |
| A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM. | Beck A et al. | - | 2017 | - |
| Causal relationship between obesity and serum testosterone status in men: A bi-directional mendelian randomization analysis. | Eriksson J et al. | - | 2017 | - |
| Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle. | Pausch H et al. | - | 2017 | - |
| Genome-wide association study of prostate-specific antigen levels identifies novel loci independent of prostate cancer. | Hoffmann TJ et al. | - | 2017 | - |
| Inbred Strain Variant Database (ISVdb): A Repository for Probabilistically Informed Sequence Differences Among the Collaborative Cross Strains and Their Founders. | Oreper D et al. | - | 2017 | - |
| Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation. | Brouard JS et al. | - | 2017 | - |
| Performance Gains in Genome-Wide Association Studies for Longitudinal Traits via Modeling Time-varied effects. | Ning C et al. | - | 2017 | - |
| A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle. | Pausch H et al. | - | 2016 | - |
| Fine-Mapping of Common Genetic Variants Associated with Colorectal Tumor Risk Identified Potential Functional Variants. | Du M et al. | - | 2016 | - |
| The projack: a resampling approach to correct for ranking bias in high-throughput studies. | Zhou YH et al. | - | 2016 | - |
| A large multiethnic genome-wide association study of prostate cancer identifies novel risk variants and substantial ethnic differences. | Hoffmann TJ et al. | - | 2015 | - |
| A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle | Pausch H et al. | - | 2015 | - |
| Associations of common variants in the BST2 region with HIV-1 acquisition in African American and European American people who inject drugs. | Hancock DB et al. | - | 2015 | - |
| Fine-mapping additive and dominant SNP effects using group-LASSO and fractional resample model averaging. | Sabourin J et al. | - | 2015 | - |
| Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass. | Ramstein GP et al. | - | 2015 | - |
| Genotype-Based Score Test for Association Testing in Families. | Uh HW et al. | - | 2015 | - |
| Interaction association analysis of imputed SNPs in case-control and follow-up studies. | Subirana I et al. | - | 2015 | - |
| Jackknife-based gene-gene interaction tests for untyped SNPs. | Song M | - | 2015 | - |
| Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors. | Caswell JL et al. | - | 2015 | - |
| Polygenic risk, stressful life events and depressive symptoms in older adults: a polygenic score analysis. | Musliner KL et al. | - | 2015 | - |
| Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data. | Torkamaneh D et al. | - | 2015 | - |
| SNP imputation bias reduces effect size determination. | Khankhanian P et al. | - | 2015 | - |
| When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments? | Ramnarine S et al. | - | 2015 | - |
| A general efficient and flexible approach for genome-wide association analyses of imputed genotypes in family-based designs. | Cobat A et al. | - | 2014 | - |
| Association studies with imputed variants using expectation-maximization likelihood-ratio tests. | Huang KC et al. | - | 2014 | - |
| fcGENE: a versatile tool for processing and transforming SNP datasets. | Roshyara NR et al. | - | 2014 | - |
| Gene-based rare allele analysis identified a risk gene of Alzheimer's disease. | Kim JH et al. | - | 2014 | - |
| Genome-wide association and admixture analysis of glaucoma in the Women's Health Initiative. | Hoffmann TJ et al. | - | 2014 | - |
| Genome-wide association study identifies a new SMAD7 risk variant associated with colorectal cancer risk in East Asians. | Zhang B et al. | - | 2014 | - |
| Harmonization of study and reference data by PhaseLift: saving time when imputing study data. | Gorski M et al. | - | 2014 | - |
| Impact of pre-imputation SNP-filtering on genotype imputation results. | Roshyara NR et al. | - | 2014 | - |
| Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'. | Deelen P et al. | - | 2014 | - |
| Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. | Saad M et al. | - | 2014 | - |
| Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans. | Du M et al. | - | 2014 | - |
| A comprehensive SNP and indel imputability database. | Duan Q et al. | - | 2013 | - |
| A generalized Kruskal-Wallis test incorporating group uncertainty with application to genetic association studies. | Acar EF et al. | - | 2013 | - |
| A genome-wide association study identifies 2 susceptibility Loci for Crohn's disease in a Japanese population. | Yamazaki K et al. | - | 2013 | - |
| APOA5 genotype influences the association between 25-hydroxyvitamin D and high density lipoprotein cholesterol. | Vimaleswaran KS et al. | - | 2013 | - |
| Causal relationship between obesity and vitamin D status: bi-directional Mendelian randomization analysis of multiple cohorts. | Vimaleswaran KS et al. | - | 2013 | - |
| Common and rare von Willebrand factor (VWF) coding variants, VWF levels, and factor VIII levels in African Americans: the NHLBI Exome Sequencing Project. | Johnsen JM et al. | - | 2013 | - |
| Genetic association analysis and meta-analysis of imputed SNPs in longitudinal studies. | Subirana I et al. | - | 2013 | - |
| Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels. | Gualdrón Duarte JL et al. | - | 2013 | - |
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | Johnson EO et al. | - | 2013 | - |
| Methods of tagSNP selection and other variables affecting imputation accuracy in swine. | Badke YM et al. | - | 2013 | - |
| Optimal methods for using posterior probabilities in association testing. | Liu K et al. | - | 2013 | - |
| Testing for rare variant associations in the presence of missing data. | Auer PL et al. | - | 2013 | - |
| The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification. | Minică CC et al. | - | 2013 | - |
| 1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data. | Huang J et al. | - | 2012 | - |
| Assessment of genotype imputation performance using 1000 Genomes in African American studies. | Hancock DB et al. | - | 2012 | - |
| Association of genetic variants for colorectal cancer differs by subtypes of polyps in the colorectum. | Zhang B et al. | - | 2012 | - |
| Computational tools for discovery and interpretation of expression quantitative trait loci. | Wright FA et al. | - | 2012 | - |
| Correction for population stratification in random forest analysis. | Zhao Y et al. | - | 2012 | - |
| Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. | Paternoster L et al. | - | 2012 | - |
| Incorporating genotype uncertainties into the genotypic TDT for main effects and gene-environment interactions. | Taub MA et al. | - | 2012 | - |
| Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability. | Rönnegård L et al. | - | 2012 | - |
| Reprioritizing genetic associations in hit regions using LASSO-based resample model averaging. | Valdar W et al. | - | 2012 | - |
| SEQCHIP: a powerful method to integrate sequence and genotype data for the detection of rare variant associations. | Liu DJ et al. | - | 2012 | - |
| Assessing the impact of non-differential genotyping errors on rare variant tests of association. | Powers S et al. | - | 2011 | - |
| Association study of Nogo-related genes with schizophrenia in a Japanese case-control sample. | Jitoku D et al. | - | 2011 | - |
| Detecting major genetic loci controlling phenotypic variability in experimental crosses. | Rönnegård L et al. | - | 2011 | - |
| Genotype imputation with thousands of genomes. | Howie B et al. | - | 2011 | - |