Variant-level P values were generated using a two-sided Fisher's exact test. Three distinct genetic models were studied for binary traits: allelic (A versus B allele), dominant (AA + AB versus BB) and recessive (AA versus AB + BB), where A denotes the alternative allele and B denotes the reference allele. For quantitative traits, we used linear regression (correcting for age, sex and age × sex) and replaced the allelic model with a genotypic (AA versus AB versus BB) test. For ExWAS analysis, we used a significance cut-off of P ≤ 2 × 10⁻⁹. To support the use of this threshold in this study, we performed an n-of-1 permutation of the dominant-model ExWAS for both binary and quantitative traits. Only 18 of 38.7 billion permuted tests had P ≤ 2 × 10⁻⁹, and 58 of 38.7 billion permuted tests had P values below a more liberal cut-off of 1 × 10⁻⁸ (Supplementary Tables 18, 19). At this conservative P ≤ 2 × 10⁻⁹ threshold, the expected number of ExWAS PheWAS false positives is 18 out of the 46,947 observed significant associations.
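
As an illustration of how the three binary-trait models collapse genotype counts into 2 × 2 tables before testing, the following is a minimal sketch using only the Python standard library. It is not the pipeline used in this study: the function names (`fisher_exact_two_sided`, `model_tables`) and the example genotype counts are hypothetical, and the two-sided P value follows the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

# Significance cut-off quoted in the text.
SIGNIFICANCE_THRESHOLD = 2e-9


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the hypergeometric probabilities of every table
    (with the same margins) that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point error.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def model_tables(cases, controls):
    """Build one 2x2 table per genetic model from (AA, AB, BB) genotype counts.

    A is the alternative allele and B the reference allele, as in the text.
    Each table is returned as (a, b, c, d) with cases in the first row.
    """
    aa1, ab1, bb1 = cases
    aa0, ab0, bb0 = controls
    return {
        # Allelic: count A alleles versus B alleles.
        "allelic": (2 * aa1 + ab1, ab1 + 2 * bb1, 2 * aa0 + ab0, ab0 + 2 * bb0),
        # Dominant: AA + AB versus BB.
        "dominant": (aa1 + ab1, bb1, aa0 + ab0, bb0),
        # Recessive: AA versus AB + BB.
        "recessive": (aa1, ab1 + bb1, aa0, ab0 + bb0),
    }


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB), for illustration only.
    cases, controls = (2, 40, 958), (5, 300, 49695)
    for model, table in model_tables(cases, controls).items():
        p = fisher_exact_two_sided(*table)
        print(model, table, p, p <= SIGNIFICANCE_THRESHOLD)
```

The sketch covers only the binary-trait tests; the quantitative-trait analysis described above uses linear regression with covariates and is not reproduced here.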