genotyped SNPs with one another. It is likely that the causal variants and the SNPs have different properties, so LD among SNPs is only a guide to LD between causal variants and SNPs. One way in which the causal variants may differ from the SNPs is in MAF. To investigate how the difference between Gjk and Ajk depends on the number of SNPs used and the MAF of the causal variants, we randomly sampled five sets of SNPs (50K, 100K, …, 250K, where K = 1,000) in the adult dataset and ten sets of SNPs (50K, 100K, …, 500K) in the adolescent dataset. For each SNP set, we randomly split the SNPs into two groups, the first representing genotyped SNPs and the second representing causal variants, and estimated genetic relationships using all of the SNPs in the first group (Ajk) and using only the SNPs with MAF ≤ θ in the second group (a proxy for Gjk), where θ = 0.1, 0.2, 0.3, 0.4 or 0.5. We calibrated the prediction error by regressing Gjk on Ajk. We established empirically that the regression coefficient β = 1 − (c + 1/N)/var(Ajk) (Fig. 1), where N is the number of SNPs used to calculate Ajk and the term in