The Yale-Penn sample [16,66] includes 11,332 genotyped and phenotyped individuals recruited across three phases (Yale-Penn 1, Yale-Penn 2, and Yale-Penn 3), defined by the time of recruitment and the genotyping array used. All cohorts were ascertained through recruitment at substance use treatment centers or through targeted advertisements for genetic studies of cocaine, opioid, and alcohol dependence. The resulting sample is highly enriched for problematic substance use and also includes control subjects and relatives. All participants were assessed with the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA) [67]. Analyses of Yale-Penn 1 and 2 have been published previously [66] and were included in the discovery sample of the present study. Here, we used data from Yale-Penn 3 [16] for replication analyses and as the target sample for polygenic risk score analyses; Yale-Penn 3 is independent of our discovery GWASs. After standard quality control, Yale-Penn 3 comprises 3,026 genotyped and phenotyped Americans of European (EUR; N=1,986) and African (AFR; N=1,040) ancestry. Genotyping was performed in the Gelernter lab at Yale University using the Illumina Multi-Ethnic Global Array (1,779,819 markers). Genotypes were then imputed with Minimac3 [68] and the Haplotype Reference Consortium reference panel [69] on the Michigan Imputation Server (https://imputationserver.sph.umich.edu).