To obtain posterior probabilities and imputed genotypes (Fig. 1), we used the software package fastPHASE [Scheet and Stephens, 2006]. For each simulated region, we fit the LD model to the reference chromosomes only and then applied the fitted model to the pseudo-individuals in the simulated cohort. For convenience, we set the number of haplotype clusters, K, to 20. We assessed imputation accuracy with the square of the Pearson correlation coefficient between the true and best-guess genotypes (R²), which is more informative about power at different allele frequencies than a simple genotype imputation error rate. Across our simulations, the median R² was 0.90 and the mean was 0.75.
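As an illustration of the accuracy measure described above, the following minimal sketch computes the squared Pearson correlation between true and best-guess genotypes at a single marker. It is not the code used in this study. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and that the best guess is the genotype with the highest posterior probability; the function names and the toy posteriors are hypothetical.

```python
import math


def best_guess_genotypes(posteriors):
    """Return the most probable genotype (0, 1 or 2) for each individual.

    `posteriors` is a list of 3-element sequences giving
    P(genotype = 0), P(genotype = 1), P(genotype = 2).
    Assumption: ties are broken in favour of the lower genotype code.
    """
    return [max(range(3), key=lambda g: probs[g]) for probs in posteriors]


def imputation_r2(true_genotypes, guessed_genotypes):
    """Squared Pearson correlation between true and best-guess genotypes."""
    n = len(true_genotypes)
    if n == 0 or n != len(guessed_genotypes):
        raise ValueError("genotype lists must be non-empty and of equal length")

    mean_t = sum(true_genotypes) / n
    mean_g = sum(guessed_genotypes) / n

    # Sums of squares and cross-products about the means.
    cov = sum((t - mean_t) * (g - mean_g)
              for t, g in zip(true_genotypes, guessed_genotypes))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_g = sum((g - mean_g) ** 2 for g in guessed_genotypes)

    # The correlation is undefined when either vector is constant,
    # e.g. a rare variant for which every best guess is 0.
    if var_t == 0 or var_g == 0:
        return math.nan
    return cov ** 2 / (var_t * var_g)


# Toy data for one marker (hypothetical values, for illustration only).
true_genotypes = [0, 1, 2, 1, 0, 2]
posteriors = [
    (0.90, 0.09, 0.01),
    (0.10, 0.80, 0.10),
    (0.02, 0.18, 0.80),
    (0.20, 0.70, 0.10),
    (0.85, 0.14, 0.01),
    (0.05, 0.60, 0.35),  # true genotype is 2, best guess is 1
]

guessed = best_guess_genotypes(posteriors)
r2 = imputation_r2(true_genotypes, guessed)
print(f"best guesses: {guessed}, R^2 = {r2:.3f}")
```

In this toy example one of six genotypes is miscalled. Unlike a raw error rate, the squared correlation depends on how much genotype variation the best guesses recover, which is why it tracks power more closely at low allele frequencies.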