To remove non-random missingness (SNP failures that correlate with individual datasets and experimental error rather than occurring at random), a chi-squared test was carried out on per-SNP missingness between datasets. The SNPs with the top 5% of the resultant p-values were removed. In addition, to avoid possible inflation of HBD by samples with non-European ancestry, only samples that clustered tightly with other European samples in the principal components analysis were included (Supplementary Figure 8). Consequently, 1472 cases and 5380 controls were used in the Beagle analysis. Beagle was run using the default settings; memory requirements dictated that the control cohorts be run as separate datasets. HBD burden was calculated as the proportion of individuals in each dataset with one or more regions of HBD above each of a number of size thresholds. P-values for these thresholds were obtained using Fisher's exact test.
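
As an illustration of the burden comparison at a single size threshold, the following is a minimal sketch rather than the code used in the study. It assumes that each individual is summarised by the length of their longest HBD region, that the comparison is between cases and controls, and that the test is two-sided; none of these details is stated in the text. The function names `hbd_burden` and `fisher_exact_two_sided` and the toy lengths are hypothetical.

```python
from math import comb


def hbd_burden(longest_hbd_lengths, threshold):
    """Return (carriers, non_carriers) for one dataset.

    A carrier is an individual whose longest HBD region exceeds the
    threshold. Lengths and threshold must share the same (unspecified) unit.
    """
    carriers = sum(1 for length in longest_hbd_lengths if length > threshold)
    return carriers, len(longest_hbd_lengths) - carriers


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x carriers in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


# Toy data (invented for illustration): longest HBD region per individual.
case_lengths = [5.2, 3.1, 0.4, 2.8]
control_lengths = [0.3, 2.5, 0.9, 0.1]
threshold = 2.0  # hypothetical size threshold

case_carriers, case_rest = hbd_burden(case_lengths, threshold)
ctrl_carriers, ctrl_rest = hbd_burden(control_lengths, threshold)
p_value = fisher_exact_two_sided(case_carriers, case_rest, ctrl_carriers, ctrl_rest)
print(case_carriers / len(case_lengths), ctrl_carriers / len(control_lengths), p_value)
```

Repeating the call for each size threshold would give one p-value per threshold, matching the structure described above. In practice an established routine such as `scipy.stats.fisher_exact` would normally be preferred over a hand-written test.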