Our final data cleaning step is to perform preliminary association tests and then examine QQ, Manhattan signal, regional association and genotype cluster plots. For case-control studies, we use logistic regression with likelihood ratio tests, restricted to unrelated subjects whose samples pass the quality filters. We first select which of the following covariates to include in the model: age, sex, recruitment center and the first several eigenvectors from the PCA. These candidate covariates are analyzed in models that exclude genotype, and those with significant effects are included in the final model. We then add the genotype (coded for an additive model) for each SNP in turn and test for SNP effects with a likelihood ratio test. We recommend examining cluster plots for the ‘top hits’ (most significant SNPs) in an association study and flagging results for any SNPs that show poor clustering. Examples of QQ and cluster plots are illustrated for the Addiction study in Figure S15. Another check on the quality of top hits is to examine Manhattan signal and regional association plots of association test p-value versus chromosomal