To compare imputation accuracy fairly across BEAGLE, MACH, and PLINK, we needed a summary statistic that could be applied consistently to each program's output. We measured imputation accuracy as genotype concordance, defined as the proportion of imputed genotypes that matched the experimentally determined genotypes, where the discrete imputed call was the genotype whose posterior probability exceeded a user-defined threshold. To measure imputation yield, we first calibrated the posterior probability threshold for each combination of program and reference panel to achieve a concordance of 0.90, equivalent to an error rate of 10% (the “fixed error rate” approach). Using this calibrated threshold, we then assessed imputation yield and coverage after filtering the imputation results with a test of Hardy-Weinberg equilibrium to screen for genotype-specific imputation failure. This procedure is analogous to estimating power after controlling the false positive error rate in classical hypothesis testing.
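The calibration step can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the grid of candidate thresholds, the toy data, and the definition of yield as the fraction of genotypes that receive a discrete call are all assumptions made for illustration. Genotypes are coded 0, 1, or 2, and each imputed genotype is represented by its three posterior probabilities. The Hardy-Weinberg filtering step is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Posterior = Sequence[float]  # posterior probabilities for genotypes 0, 1, 2


def call_genotypes(posteriors: Sequence[Posterior],
                   threshold: float) -> List[Optional[int]]:
    """Return the most probable genotype if its posterior exceeds the
    threshold, otherwise None (no call)."""
    calls: List[Optional[int]] = []
    for probs in posteriors:
        best = max(range(len(probs)), key=lambda g: probs[g])
        calls.append(best if probs[best] > threshold else None)
    return calls


def concordance_and_yield(posteriors: Sequence[Posterior],
                          truth: Sequence[int],
                          threshold: float) -> Tuple[Optional[float], float]:
    """Concordance among called genotypes, and yield (assumed here to be
    the fraction of genotypes that received a call)."""
    if not truth:
        return None, 0.0
    calls = call_genotypes(posteriors, threshold)
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    yield_fraction = len(called) / len(truth)
    if not called:
        return None, yield_fraction
    concordance = sum(c == t for c, t in called) / len(called)
    return concordance, yield_fraction


def calibrate_threshold(posteriors: Sequence[Posterior],
                        truth: Sequence[int],
                        target: float = 0.90
                        ) -> Optional[Tuple[float, float, float]]:
    """Scan a hypothetical grid (0.00 to 0.99 in steps of 0.01) and return
    (threshold, concordance, yield) for the smallest threshold whose
    concordance reaches the target, or None if no threshold does."""
    for step in range(100):
        threshold = step / 100
        concordance, yield_fraction = concordance_and_yield(
            posteriors, truth, threshold)
        if concordance is not None and concordance >= target:
            return threshold, concordance, yield_fraction
    return None


# Toy data (fabricated for illustration only, not study results).
toy_posteriors = [
    (0.98, 0.01, 0.01),
    (0.05, 0.90, 0.05),
    (0.10, 0.15, 0.75),
    (0.55, 0.40, 0.05),
    (0.34, 0.33, 0.33),
    (0.02, 0.08, 0.90),
]
toy_truth = [0, 1, 2, 1, 2, 2]

if __name__ == "__main__":
    print(calibrate_threshold(toy_posteriors, toy_truth, target=0.90))
```

Choosing the smallest qualifying threshold keeps as many calls as possible at the fixed error rate, which is what makes yield comparable across programs and reference panels. Concordance need not increase monotonically with the threshold, so a real calibration might prefer a finer grid or an interpolation rather than this simple scan.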