Chunk #81 — Discussion — Modeling issues in imputation datasets

Source: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.
Embedded: yes

Text

In our view, these disparate observations point to a single underlying cause: joint modeling of untyped SNPs is generally ineffective, and it grows progressively worse as the space of missing genotypes expands. BEAGLE was competitive in our analyses, so its modeling strategy may have some merit, but it is also possible that BEAGLE's success came in spite of the joint modeling framework, not because of it. A better alternative might be to embed the same clustering model in a framework like the ones used by fastPHASE or IMPUTE v2. We suggest that further scrutiny be applied before a full joint model is used in general applications. Comparisons like ours, and others [13], are necessarily restricted to artificially small datasets, but we have shown that these “toy” datasets can mask problems that might occur in more realistic settings, which will often include larger amounts of missing data. In practice, the accuracy levels and running times achieved by BEAGLE in our study may represent best-case scenarios rather than standard results.