For each trait, we conducted a genome-wide association test using the hurdle model [34] because of the discrete nature and excess zero values associated with the symptom counts. The hurdle regression model assumes the observed data are generated from two processes: one generates zero and the other generates positive values. Since our interest is in the severity of symptomatology, the p-values from the positive component of the hurdle model were used for further analysis. For each phenotype of addiction, the estimated p-values are summarized using QQ-plots in Fig. 3. The diagonal straight lines have the slope 1 and intercept 0. When the curve of the p-values deviate far away from the diagonal line, it indicates that there are many SNPs significantly associated with the corresponding phenotype trait. By examining the 4 plots in Fig. 3, we obtain the following two findings: (1) For the nicotine symptoms, the p-values fall on the diagonal line and this indicates that there is no SNP associated with nicotine symptoms; (2) For the symptoms of the other three substances, all the p-values deviate from the