Results, Scenario B: Modeling considerations

Source: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

A full representation of Scenario B would include all HapMap SNPs that are polymorphic in the CEU panel. There were 138,592 such SNPs in our dataset: 44,875 in set U2 and the remaining 93,717 in set U1. This data structure is problematic for most imputation methods because their modeling strategies assume a single reference panel in which most genotypes have been observed (i.e., some version of Scenario A). If the data from both reference panels in Scenario B were combined into a single panel, many reference SNPs (those in U1) would be missing a large proportion of their genotypes, which could substantially decrease imputation accuracy in the study sample. Ad hoc modifications of these approaches are also unattractive. For example, such methods could first impute SNPs in U1 in the diploid reference panel and then combine the observed and imputed genotypes to impute SNPs in U1 and U2 in the study sample. However, failing to account for the uncertainty in the imputed reference genotypes would probably lead to overconfident and lower-quality