one of each pair of individuals with estimated genetic relatedness ≥0.05 and retained 53,991 unrelated individuals for analysis. Individual-level ICD-9 codes were not available in dbGaP but had been classified into 22 common diseases (Supplementary Table 4). Disease status was coded as 0 (unaffected) or 1 (affected). We added a further trait, “disease count” (the number of diseases affecting each individual), as a crude measure of each individual's general health status. We then performed a genome-wide association analysis for each of the 23 phenotypes, with age, gender, and the first 20 PCs fitted as covariates. The MHC region has often been removed from analyses in previous studies, mainly because of its complicated LD structure. In this study, we did not remove this region because we used a set of near-independent SNPs as instruments after LD clumping.
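The preprocessing steps described above can be illustrated with a minimal sketch. This is not the software used in the study. The function names (`drop_related`, `disease_count`, `assoc_linear`), the greedy pair-removal rule, and the use of ordinary least squares for the per-SNP test are all assumptions made for illustration. The text specifies only the relatedness threshold of 0.05, the 0/1 disease coding, and the covariates.

```python
import numpy as np

def drop_related(grm, threshold=0.05):
    """Return a boolean mask of individuals to keep.

    Assumption: a simple greedy rule that keeps the first member of each
    pair whose estimated relatedness is >= threshold and drops the second.
    """
    n = grm.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            if keep[j] and grm[i, j] >= threshold:
                keep[j] = False
    return keep

def disease_count(status):
    """Sum the 0/1 disease indicators per individual (rows = individuals)."""
    return np.asarray(status).sum(axis=1)

def assoc_linear(y, snp, covars):
    """Regress a phenotype on one SNP plus covariates.

    Assumption: ordinary least squares; returns the SNP effect estimate
    and its standard error. covars would hold age, gender, and the PCs.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), snp, covars]).astype(float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])
```

In practice the association step would be run once per SNP and per phenotype (the 22 binary disease traits plus the disease count), with the covariate matrix holding age, gender, and the first 20 PCs.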