For the European sub-cohort, we computed s.e.m. over samples as before; for the other three sub-cohorts, we computed s.e.m. over 25 SNP blocks because of the small numbers of trios. When used to compare methods, these standard errors are conservative: true variation among samples and across the genome causes the methods' errors to be correlated. We therefore assessed the statistical significance of differences in performance between pairs of methods by performing one-sided binomial tests across samples or SNP blocks, as appropriate.
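
As an illustration only, a paired comparison of this kind can be sketched as a sign test: for each unit (a sample or a SNP block), record which of two methods has the lower error, then compute the upper-tail binomial probability of that many wins under a null of p = 0.5. The function names, the dropping of ties, and the input values below are assumptions of this sketch and are not specified in the text above.

```python
import math
from statistics import stdev


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def one_sided_binomial_test(errors_a, errors_b):
    """P-value for 'method A has lower error than method B'.

    errors_a and errors_b are paired per-unit error rates (one entry per
    sample or per SNP block). Under the null, each unit is equally likely
    to favor either method. Ties are dropped (an assumption of this sketch).
    """
    wins = sum(a < b for a, b in zip(errors_a, errors_b))
    losses = sum(a > b for a, b in zip(errors_a, errors_b))
    n = wins + losses
    if n == 0:
        return 1.0  # no informative units
    # Upper tail P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical per-block error rates (%) for two methods; not real data.
method_a = [1.0, 1.2, 0.9, 1.1, 1.0]
method_b = [1.3, 1.4, 1.2, 1.5, 1.1]
p_value = one_sided_binomial_test(method_a, method_b)  # 5 wins of 5: 1/32
```

Because the test uses only the sign of each paired difference, variation that is shared between methods within a unit cancels out, which is why it can detect differences that the per-method s.e.m. values would understate.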