The discrepancy between simulations and real data could reflect increased similarity across haplotypes inferred from the 1000 Genomes Project phase 1 data, because genotypes were called and phased from 4x sequencing data by a procedure that aggregated information across samples (Supplementary Note, Supplementary Table 6). Other possible explanations include nonuniform error rates in base-calling and read alignment across the genome, or simulation parameters that do not perfectly model aspects of the empirical data, such as variance in coverage across samples and loci, although our experiments suggest that these are unlikely to be the primary explanation (Supplementary Note).