After creating polygenic scores in each validation sample, we used these scores to predict the alcohol problems outcomes by fitting a series of linear models in R version 3.2.1 (R Core Team, 2015), using the lme4 package (Bates et al., 2015) for the mixed models. For each discovery-validation pair, the score at the best-fitting p-value threshold selected by PRSice was entered into a regression model with the alcohol problems measure as the dependent variable, along with the same covariates used in the corresponding discovery GWAS. All polygenic scores were approximately normally distributed with a mean of zero and ranges between 0.0003 and 0.126 units. To account for non-independent observations in family-based samples, we used a linear mixed model framework with family as a random effect. We fit linear models (lm and lmer functions) to predict the alcohol problems factor score in ALSPAC and the alcohol dependence symptom counts in FT12, COGA, and IASPSAD, with symptom counts log(x+1)-transformed to adjust for their zero-inflated distributions. To account for multiple testing, we performed a Bonferroni p-value correction by dividing α=.05 by the number of