In each dataset, the primary analysis tested each SNP for association with heavy versus light smoking, defined by cigarettes per day, using logistic regression models. Genotypes were coded additively as the number of non-reference alleles, where the reference allele was defined as the major allele in the European ancestry population in dbSNP [Sherry, et al. 2001]; consistency of allelic coding was confirmed by comparing allele labels and allele frequencies across all datasets within each population. Age (as a continuous variable) and gender were included as covariates. Secondary analyses of the 4-level cigarettes per day trait used linear regression models with the same covariates, under the assumption that the trait has a simple linear relationship with the predictors.
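As an illustration of the models described above, the following minimal sketch fits a single-SNP logistic regression (heavy versus light smoking) and a linear regression (4-level trait) with additive genotype coding, age, and gender as predictors. It is not the analysis code used in this study. All data are simulated; the effect sizes, the 0 to 3 coding of the 4-level trait, and the helper names `additive_dosage`, `fit_logistic`, and `fit_linear` are assumptions made only for the example.

```python
import numpy as np

def additive_dosage(allele_pairs, reference_allele):
    """Additive coding: count of non-reference alleles (0, 1, or 2)."""
    return np.array(
        [sum(a != reference_allele for a in pair) for pair in allele_pairs],
        dtype=float,
    )

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Logistic regression by Newton-Raphson.

    Returns (coefficients, standard errors); the intercept comes first.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))  # inverse Fisher information
    return beta, np.sqrt(np.diag(cov))

def fit_linear(X, y):
    """Ordinary least squares; the intercept comes first."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Additive coding on three toy genotypes with reference allele "A".
toy_dosage = additive_dosage([("A", "A"), ("A", "G"), ("G", "G")], "A")

# Simulated dataset (all parameter values below are arbitrary assumptions).
rng = np.random.default_rng(0)
n = 2000
dosage = rng.binomial(2, 0.3, size=n).astype(float)  # one SNP
age = rng.normal(45.0, 10.0, size=n)                 # continuous covariate
gender = rng.binomial(1, 0.5, size=n).astype(float)  # binary covariate
covariates = np.column_stack([dosage, age, gender])

# Primary analysis: heavy (1) versus light (0) smoking, logistic model.
logit = -1.0 + 0.5 * dosage + 0.02 * (age - 45.0) + 0.2 * gender
heavy = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
beta_logit, se_logit = fit_logistic(covariates, heavy)

# Secondary analysis: 4-level trait coded 0..3 (assumed), linear model.
latent = 1.5 + 0.3 * dosage + rng.normal(0.0, 1.0, size=n)
cpd_level = np.clip(np.round(latent), 0, 3)
beta_lin = fit_linear(covariates, cpd_level)

print("per-allele odds ratio:", np.exp(beta_logit[1]))
print("per-allele change in 4-level trait:", beta_lin[1])
```

In this sketch the coefficient at index 1 is the per-allele effect of the SNP, adjusted for age and gender; in a genome-wide setting the same fit would be repeated for each SNP in each dataset.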