Following Okbay et al.³⁷, we compared the signs of the within-family estimates with the signs of the estimates from a GWAS meta-analysis that we re-ran after removing the sibling samples (N = 1,070,751). We compared the observed fraction of concordant signs against the three theoretical benchmarks shown in Fig. 2, which were calculated using posterior distributions for the GWAS effect sizes obtained from our Bayesian statistical framework. Treating each benchmark as a null hypothesis, we conducted one-sided binomial tests in which the alternative hypothesis was that the observed sign concordance falls short of the benchmark. We conducted these tests for sets of approximately independent SNPs selected at the P value thresholds 5 × 10⁻⁸, 5 × 10⁻⁵ and 5 × 10⁻³ (Supplementary Table 20 and Fig. 2).
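The sign-concordance test described above can be illustrated with a minimal Python sketch. It is not the code used in the analysis. It assumes that each benchmark can be summarized as a single expected concordance fraction (here the hypothetical argument `p`), whereas the benchmarks in the text are derived from posterior distributions. The function names `sign_concordance` and `binom_lower_tail` and all numeric inputs are illustrative assumptions.

```python
import math


def sign_concordance(within_family_betas, gwas_betas):
    """Return (number of concordant signs, number of SNPs compared).

    SNPs with an estimate of exactly zero in either set are skipped,
    because their sign is undefined (an assumption of this sketch).
    """
    pairs = [
        (w, g)
        for w, g in zip(within_family_betas, gwas_betas)
        if w != 0 and g != 0
    ]
    concordant = sum(1 for w, g in pairs if (w > 0) == (g > 0))
    return concordant, len(pairs)


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p).

    This is the one-sided P value for the alternative that the true
    concordance rate is below the benchmark p. Terms are summed in
    log space so that large n does not underflow.
    """
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0  # k < n here, so X = n > k with certainty
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


# Toy effect estimates (hypothetical values, not data from the study).
wf_betas = [0.012, -0.004, 0.020, -0.015, 0.003]
gwas_betas = [0.010, -0.006, -0.018, -0.011, 0.005]
benchmark = 0.9  # hypothetical expected concordance under one benchmark

concordant, n_snps = sign_concordance(wf_betas, gwas_betas)
p_value = binom_lower_tail(concordant, n_snps, benchmark)
```

A small `p_value` would indicate that the observed concordance falls short of the benchmark. In practice the test would be repeated for each benchmark and for each set of approximately independent SNPs defined by a P value threshold.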