We ran a pairwise identity-by-descent (IBD) analysis within and across the combined dataset to detect duplicate and related individuals from the resulting IBD probabilities Z0, Z1 and Z2, where Zk is the probability that a pair of subjects shares k alleles identical by descent, estimated from genome-wide SNP data. A pair was flagged as identical twins or duplicates if 0≤Z0≤0.1, 0≤Z1≤0.1 and 0.9≤Z2≤1.1. Pairs were considered full siblings if 0.17≤Z0≤0.33, 0.4≤Z1≤0.6 and 0.17≤Z2≤0.33. Half siblings or avunculars were defined as having 0.4≤Z1≤0.6 and 0≤Z2≤0.1. Some of the flagged duplicates were expected, having been genotyped in multiple datasets and hence sharing the same cohort identifiers; in these cases, one member of each pair was randomly chosen for removal from the dataset. Where a pair showed a pairwise genotype concordance rate >0.999 but was not an expected duplicate, both individuals were removed. Related individuals (full siblings, half siblings/avunculars) were not removed from the final datasets. In the HumanHap dataset, 107 individuals were removed because they were duplicates or had been flagged for removal in the genotyping step, leaving 6,787 subjects. In addition, 8