Our phenotypic analysis consisted of three consecutive steps: data reduction, cluster analysis, and heritability estimation. As described in detail in the Supplementary Materials, data on 68 key clinical variables from 9,965 subjects were analyzed to derive subtypes of cocaine use and related behaviors. Briefly, we used multiple correspondence analysis (MCA) [Abdi and Valentin, 2007; Murtagh, 2007; LeRoux and Rouanet, 2009], a non-parametric method, to reduce the large number of variables to a small number of dimensions. Cluster analysis was then applied to the retained dimensions to group similar subjects into clusters. To estimate the heritability of each cluster, we used logistic regression to compute the likelihood of each subject's membership in that cluster. These log likelihood values for 9,436 European Americans (EAs) and African Americans (AAs), including 2,268 individuals from 957 multi-member families, were then analyzed together with pedigree information using the Sequential Oligogenic Linkage Analysis Routines (SOLAR) program [Almasy and Blangero, 1998] to estimate the heritability of each cluster-derived trait, with sex, age, and race as covariates.
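To make the first two steps and the membership score concrete, the following is a minimal NumPy sketch on synthetic data. It is not the authors' pipeline and rests on several assumptions. MCA is computed as a correspondence analysis of the one-hot indicator matrix. K-means is used as the clustering method, because the specific method is only described in the Supplementary Materials. The logistic regression uses the retained MCA dimensions as predictors, with a small ridge penalty so the fit stays finite when clusters are linearly separable. The "log likelihood of membership" is read as the log of the predicted membership probability. All function names and the simulated data are hypothetical. The SOLAR variance-components step is not reproduced here.

```python
import numpy as np


def indicator_matrix(codes: np.ndarray) -> np.ndarray:
    """One-hot (complete disjunctive) coding of an n x p matrix of category codes."""
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])  # only categories actually observed
        blocks.append((codes[:, j][:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)


def mca_row_coordinates(codes: np.ndarray, n_dims: int) -> np.ndarray:
    """Subject (row) principal coordinates from a CA of the indicator matrix."""
    Z = indicator_matrix(codes)
    P = Z / Z.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # category masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * sigma) / np.sqrt(r)[:, None]
    return F[:, :n_dims]  # keep the leading dimensions only


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means (assumed clustering method); returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def log_membership(X: np.ndarray, y: np.ndarray, l2: float = 1.0, n_iter: int = 50) -> np.ndarray:
    """Ridge-penalized logistic regression by Newton's method; returns log P(member)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    penalty = l2 * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    for _ in range(n_iter):
        eta = A @ beta
        p = np.exp(-np.logaddexp(0.0, -eta))  # numerically stable sigmoid
        grad = A.T @ (y - p) - penalty @ beta
        hess = A.T @ (A * (p * (1.0 - p))[:, None]) + penalty
        beta += np.linalg.solve(hess, grad)
    return -np.logaddexp(0.0, -(A @ beta))  # log sigmoid(eta)


# Synthetic example: two latent subgroups answering 12 three-level items.
rng = np.random.default_rng(42)
n, p = 300, 12
group = rng.integers(0, 2, size=n)
prob_cat0 = np.where(group[:, None] == 0, 0.8, 0.2)
codes = np.where(rng.random((n, p)) < prob_cat0, 0, rng.integers(1, 3, size=(n, p)))

dims = mca_row_coordinates(codes, n_dims=2)
labels = kmeans(dims, k=2)
y = (labels == 0).astype(float)
score = log_membership(dims, y)  # per-subject trait that would go to SOLAR
print(dims.shape, np.bincount(labels), score[:3].round(3))
```

In this sketch, each subject's `score` is the quantitative, cluster-derived trait. In the study, the analogous values were combined with pedigree information in SOLAR to estimate heritability.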