We used an ad hoc method for initial variant filtering, designed to identify variants that had been filtered out by many of the submitting studies. For each site and each cohort, we labelled the site as “called” in that study if the putative genotype calls made by bcftools from the genotype likelihoods (GLs) showed more than one allele in that cohort, and as “not called” if they showed no variation. We also used the haplotype sets provided by each study to determine whether that study's own internal calling pipeline had filtered out each site. To choose a threshold on the number of times a site had been filtered out, we stratified the sites by their called status against their filtered status (Supplementary Figure 5) and measured the Ts/Tv ratio of the SNPs in each stratum. SNPs in the cells above the red line in the figure were removed; these are the cells whose sites had been filtered out by more than 4 studies or whose Ts/Tv ratio was less than 1.7.
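The cell-level rule described above can be illustrated with a minimal sketch. It is not the pipeline used in the study. It assumes, hypothetically, that each cell is indexed by the number of studies in which a site was called and the number of studies that filtered it out, and that each SNP is given as a tuple `(n_called, n_filtered, ref, alt)`. The function names are illustrative; only the two thresholds (4 studies, Ts/Tv of 1.7) come from the text.

```python
from collections import defaultdict

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_ts_tv_per_cell(sites):
    """Count transitions and transversions in each (n_called, n_filtered) cell.

    `sites` is an iterable of (n_called, n_filtered, ref, alt) tuples.
    This cell indexing is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_transitions, n_transversions]
    for n_called, n_filtered, ref, alt in sites:
        index = 0 if (ref, alt) in TRANSITIONS else 1
        counts[(n_called, n_filtered)][index] += 1
    return counts


def cells_to_remove(counts, max_filtered=4, min_ts_tv=1.7):
    """Return cells filtered out by more than `max_filtered` studies
    or whose Ts/Tv ratio is below `min_ts_tv`."""
    removed = set()
    for (n_called, n_filtered), (ts, tv) in counts.items():
        # A cell with no transversions has an unbounded ratio; treat it as passing.
        ts_tv = ts / tv if tv > 0 else float("inf")
        if n_filtered > max_filtered or ts_tv < min_ts_tv:
            removed.add((n_called, n_filtered))
    return removed


def filter_sites(sites, max_filtered=4, min_ts_tv=1.7):
    """Keep only the sites whose cell passes both thresholds."""
    sites = list(sites)
    removed = cells_to_remove(count_ts_tv_per_cell(sites), max_filtered, min_ts_tv)
    return [s for s in sites if (s[0], s[1]) not in removed]
```

Note that the decision is made per cell rather than per SNP: a transition SNP is still removed if it falls in a cell whose overall Ts/Tv ratio is below the threshold.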