We used PLINK 1.90 (Purcell et al., 2007) to conduct a genome-wide association analysis for each substance use disorder phenotype, using logistic regression adjusted for age, sex, and the top five principal components. The genome-based restricted maximum likelihood (GREML) method implemented in GCTA was used to estimate the percentage of variance in each trait explained by common SNPs (Lee et al., 2012; Yang et al., 2010). Linkage disequilibrium score regression (LDSC), which is minimally biased by sample overlap, was used to calculate genetic correlations from summary statistics (Bulik-Sullivan et al., 2015). Summary statistics for other published traits were obtained from LD Hub, hosted by the Broad Institute (http://ldsc.broadinstitute.org). Genetic correlations between OUD and other traits within the Partners Biobank were calculated with GCTA, and those between OUD and external traits from LD Hub were calculated with LDSC.
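For readers who want to reproduce a comparable workflow, the three analysis steps described above can be sketched as command lines assembled in Python. This is a minimal illustration, not the exact invocation used in this study: all file names, the covariate column names (`AGE`, `SEX`, `PC1` to `PC5`), and the executable names (`plink`, `gcta64`, `ldsc.py`) are assumptions, and options such as disease prevalence or quality-control filters are omitted.

```python
import shlex


def plink_logistic_cmd(bfile, pheno, covar, out, n_pcs=5):
    """Build a PLINK 1.9 logistic-regression GWAS command.

    Assumes (hypothetically) that the covariate file has columns
    named AGE, SEX, and PC1..PCn.
    """
    covar_names = ["AGE", "SEX"] + [f"PC{i}" for i in range(1, n_pcs + 1)]
    return (
        ["plink", "--bfile", bfile, "--pheno", pheno, "--covar", covar]
        + ["--covar-name"] + covar_names
        # hide-covar keeps only the per-SNP rows in the output table
        + ["--logistic", "hide-covar", "--out", out]
    )


def gcta_greml_cmd(grm, pheno, qcovar, covar, out):
    """Build a GCTA GREML command estimating SNP-based variance explained.

    qcovar holds quantitative covariates (e.g. age, PCs);
    covar holds categorical covariates (e.g. sex).
    """
    return [
        "gcta64", "--reml",
        "--grm", grm,
        "--pheno", pheno,
        "--qcovar", qcovar,
        "--covar", covar,
        "--out", out,
    ]


def ldsc_rg_cmd(sumstats_a, sumstats_b, ld_dir, out):
    """Build an LDSC genetic-correlation command for two munged sumstats files.

    ld_dir is a directory of per-chromosome LD scores, used here both as
    regression reference and as weights.
    """
    return [
        "ldsc.py",
        "--rg", f"{sumstats_a},{sumstats_b}",
        "--ref-ld-chr", ld_dir,
        "--w-ld-chr", ld_dir,
        "--out", out,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    commands = [
        plink_logistic_cmd("genotypes", "oud.pheno", "covars.txt", "oud_gwas"),
        gcta_greml_cmd("genotypes_grm", "oud.pheno", "qcovars.txt",
                       "sex.txt", "oud_greml"),
        ldsc_rg_cmd("oud.sumstats.gz", "trait.sumstats.gz",
                    "eur_w_ld_chr/", "oud_trait_rg"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

The functions only build argument lists and run nothing, so the commands can be inspected before being passed to `subprocess.run`. A within-cohort genetic correlation in GCTA would use its bivariate mode (`--reml-bivar`) in place of `--reml`.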