Our work substantially extends the current literature on behavioral phenotypes extracted from EHR data for use in GWAS and other genetic studies. The most common approach to defining a harmful alcohol use phenotype has been a diagnosis of AUD based on the Diagnostic and Statistical Manual of Mental Disorders (DSM). Although DSM diagnoses of AUD have clinical utility, they are time-consuming and costly to obtain, which limits their availability for large-scale genetic studies. The use of ICD administrative codes as a proxy for clinical diagnosis is common for many medical conditions (12), but ICD codes are insensitive and non-specific measures of complex behaviors such as harmful alcohol use (4, 26). Importantly, our results suggest that these codes are applied differentially by race, with EAs being substantially less likely to have the code applied than AAs [even though AAs have, until recently, had a lower frequency of AUD in the general population (25)]. These limitations may explain why longitudinal AUDIT-C metrics were more strongly associated with the ADH1B variants than were diagnostic codes, and why including the diagnostic codes only modestly improved the association. The large size of our sample made it possible to demonstrate these effects in both populations.