In each dataset, associations between the loci and the traits were evaluated using logistic regression. Our primary analysis model coded genotypes additively as the number of copies of the minor allele, with the minor allele defined according to the HapMap CEU reference population. This allele is referred to as the “coded allele” (C) and the major allele as the “reference allele” (R). To confirm the appropriateness of the additive model, we also evaluated, for each locus, a 2-degree-of-freedom model that included the additive term and a heterozygote deviation term. Analyses of the 4-level CPD trait used generalized logistic regression to obtain a separate effect estimate (beta coefficient) for each category, with the lowest smoking category as the referent. All of these association analyses included sex and age as covariates. In addition, the lung cancer and COPD analyses included categorical cigarettes-per-day as an unordered covariate.
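
To make the genotype coding concrete, the following Python sketch shows one way the predictors for the additive model and for the 2-degree-of-freedom model could be constructed. It is illustrative only: the function names, the two-character genotype string format, and the 0/1 coding of sex are assumptions, not details of the original analysis, and the sketch builds predictor rows without fitting any regression.

```python
from typing import List, NamedTuple


class GenotypeTerms(NamedTuple):
    additive: int       # copies of the coded (minor) allele: 0, 1, or 2
    het_deviation: int  # 1 for heterozygotes, 0 for either homozygote


def code_genotype(genotype: str, coded_allele: str) -> GenotypeTerms:
    """Return the additive and heterozygote-deviation terms for one genotype.

    `genotype` is a two-character string such as "AG" (a hypothetical
    input format chosen for this sketch); the locus is assumed biallelic.
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    additive = sum(1 for allele in genotype if allele == coded_allele)
    # Exactly one copy of the coded allele means the genotype is heterozygous.
    het_deviation = 1 if additive == 1 else 0
    return GenotypeTerms(additive, het_deviation)


def design_row(genotype: str, coded_allele: str, sex: int, age: float,
               two_df: bool = False) -> List[float]:
    """Build one row of predictors: genotype term(s), then sex and age.

    With two_df=False the row holds only the additive term (primary model);
    with two_df=True it also holds the heterozygote deviation term.
    """
    terms = code_genotype(genotype, coded_allele)
    row: List[float] = [terms.additive]
    if two_df:
        row.append(terms.het_deviation)
    row.extend([sex, age])
    return row
```

Under this coding, a heterozygote deviation coefficient close to zero in the 2-degree-of-freedom model is consistent with the additive model, whereas a clearly nonzero coefficient indicates that heterozygotes depart from the midpoint of the two homozygotes on the log-odds scale.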