A subtle feature of Figure 2 is that not all of the IMPUTE2 curves increase monotonically with khap: some of the black curves peak at intermediate values of this parameter, then steadily decay as khap grows to its maximum value of 2,020. This trend is clearest in the YRI panel, but it is also visible in other panels. We expect few problems of statistical computation (e.g., failure of the MCMC algorithm to converge) in our leave-one-out experiments, so we assume that this result reflects a real feature of the method. Our interpretation is that, when there is significant population structure in the panel, restricting the reference set via khap imposes a more appropriate prior distribution on the haplotype copying probabilities than using the full panel does. Tuning this prior by changing khap has only a small effect on mean accuracy, which implies that our imputation method is largely robust to stratified reference data even without the surrogate family approximation. At the same time, this result suggests that choosing custom reference panels may have benefits beyond just speeding up the computation, which is consistent with the conclusions of Pasaniuc et al. (2010).
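
To make the "prior on copying probabilities" interpretation concrete, the following toy sketch shows how restricting the reference set to khap haplotypes changes the prior weight that a copying model places on each panel haplotype. It is an illustration under stated assumptions, not the IMPUTE2 implementation: the function names (`hamming`, `copying_prior`), the Hamming-distance selection rule, the uniform weights over the selected haplotypes, and the six-site toy panel are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def copying_prior(
    target: Sequence[int],
    panel: Sequence[Sequence[int]],
    k_hap: int,
) -> List[float]:
    """Return a prior over panel haplotypes for one target haplotype.

    Assumption for illustration: the k_hap panel haplotypes closest to the
    target (by Hamming distance) share the prior mass equally, and every
    other panel haplotype gets prior weight zero.
    """
    if not 1 <= k_hap <= len(panel):
        raise ValueError("k_hap must be between 1 and the panel size")
    # sorted() is stable, so ties are broken by panel order.
    order = sorted(range(len(panel)), key=lambda i: hamming(target, panel[i]))
    chosen = set(order[:k_hap])
    return [1.0 / k_hap if i in chosen else 0.0 for i in range(len(panel))]


# Toy structured panel: two haplotypes from each of two diverged groups.
panel = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
]
target = [0, 0, 0, 0, 1, 0]  # resembles the first group

restricted = copying_prior(target, panel, k_hap=2)
full_panel = copying_prior(target, panel, k_hap=len(panel))
print(restricted)
print(full_panel)
```

With khap equal to the panel size, the prior in this sketch spreads equally over both groups; with a smaller khap, it concentrates on the haplotypes that resemble the target, which is the sense in which a restricted reference set can act as a more appropriate prior for a structured panel.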