Chunk #14 — Measures — Analytic strategy.

Source: Clinical, environmental, and genetic risk factors for substance use disorders: characterizing combined effects across multiple cohorts.
Embedded: yes

Text

We estimated a series of nested logistic regression models using the pooled data: (1) a baseline model (sex, age, and cohort), (2) a genetic risk model (baseline + PGS), (3) a clinical/environmental risk model (baseline + CERI), and (4) a combined risk model (baseline + PGS + CERI). Because COGA and FT12 included a large number of related individuals, we adjusted for familial clustering using cluster-robust standard errors [56]. To assess the predictive accuracy of each model, we computed the difference in pseudo-R² (ΔPseudo-R²) [57] between the baseline model and each of the other models. Finally, we calculated the discriminatory power of the combined model using the area under the curve (AUC) from a receiver operating characteristic (ROC) curve. We included a variety of robustness checks to ensure that no single cohort in the IDA was unduly influencing the results. Our analytic strategy was preregistered on the Open Science Framework (https://osf.io/etbw8). Deviations from the preregistration are described in the supplementary information (section 6).
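The two summary metrics described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes McFadden's pseudo-R² as a stand-in, since the specific pseudo-R² definition in [57] is not given here, and it uses made-up predicted probabilities in place of fitted values from the baseline and combined models. The names `mcfadden_r2`, `auc`, `p_baseline`, and `p_combined` are hypothetical. Model fitting and the cluster-robust standard errors are not shown.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """McFadden's pseudo-R2 (an assumed choice): 1 - LL(model) / LL(intercept-only)."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


def auc(y, p):
    """AUC as the share of case-control pairs in which the case has the
    higher predicted probability (ties count as half)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: outcome (1 = affected) and predicted probabilities
# standing in for a baseline model and a combined model.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_baseline = [0.4, 0.5, 0.4, 0.5, 0.5, 0.6, 0.5, 0.6]
p_combined = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Delta pseudo-R2: gain of the combined model over the baseline model.
delta_r2 = mcfadden_r2(y, p_combined) - mcfadden_r2(y, p_baseline)
auc_combined = auc(y, p_combined)

print(f"Delta pseudo-R2: {delta_r2:.3f}")
print(f"AUC (combined):  {auc_combined:.4f}")
```

In a real analysis the predicted probabilities would come from the fitted logistic models, and the same ΔPseudo-R² calculation would be repeated for the genetic risk and clinical/environmental risk models against the baseline.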