Future research may address some of the limitations of our study. Prior work has demonstrated that ICD codes have a low sensitivity for current tobacco use, but may have a reasonable specificity for this common behavior.67 Our results appeared to be robust to moderate levels of misclassification, particularly in controls, as detected by the pairing with self-reported questionnaire data. Our results also appeared to be robust to moderate levels of cross-cohort heterogeneity, including potential differences in diagnostic practices and different levels of misdiagnosis of control populations across sites. Although studies that systematically evaluate the effect of removing potentially misclassified individuals are needed, we chose not to remove them in this study because not all individuals had concomitant survey data available. This questionnaire data, along with other forms of EHR data (e.g., clinical notes), may help capture additional phenotypes, including the response to treatment or the ability to successfully quit smoking without formal treatment. We have highlighted potential differences of traits ascertained by ICD codes as a limitation of our study. rg results revealed high levels of association between TUD and