Another question to consider when applying our framework is whether the optimal number of surrogate family haplotypes will change with different reference datasets. Judging from our experience in a variety of studies, we suggest the rule of thumb that khap should be set to the number of reference haplotypes that have broadly similar ancestry to the study population. For example, the broad ancestral groupings in HapMap 3 (Europe, East Asia, Africa) each include 500–800 haplotypes, and we found that khap = 500 worked well with this resource. Imputation accuracy is not highly sensitive to this variable, regardless of other factors like chunk size and local recombination rate, so it should not usually be necessary to optimize khap empirically. As reference sets grow and we further develop our approximation, we anticipate that it will be possible to achieve high accuracy with even lower values of khap.
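The rule of thumb above can be written as a short helper. The following Python sketch is illustrative only and is not part of our software: the function name `suggest_k_hap`, the `cap` argument, and the per-group haplotype counts are hypothetical, and the counts are placeholders rather than the actual HapMap 3 numbers. It returns the number of reference haplotypes in the ancestry group that matches the study population, optionally limited by a user-chosen ceiling.

```python
from typing import Mapping, Optional


def suggest_k_hap(
    panel_haplotypes: Mapping[str, int],
    study_ancestry: str,
    cap: Optional[int] = None,
) -> int:
    """Suggest a starting value of k_hap from the rule of thumb.

    panel_haplotypes maps each broad ancestry group in the reference
    panel to its number of haplotypes. study_ancestry names the group
    judged most similar to the study population. cap is an optional
    ceiling (an assumption of this sketch, e.g. a computational budget).
    """
    if study_ancestry not in panel_haplotypes:
        raise KeyError(f"no reference group named {study_ancestry!r}")
    n_similar = panel_haplotypes[study_ancestry]
    if n_similar <= 0:
        raise ValueError("group must contain at least one haplotype")
    # Rule of thumb: k_hap equals the number of similar-ancestry haplotypes.
    return n_similar if cap is None else min(n_similar, cap)


# Placeholder counts for three hypothetical ancestry groups.
example_panel = {"group_a": 600, "group_b": 700, "group_c": 800}
k_hap = suggest_k_hap(example_panel, "group_a", cap=500)
print(k_hap)
```

Because accuracy is not highly sensitive to khap, a value obtained in this way is intended as a reasonable starting point rather than a tuned optimum.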