Chunk #5 — Materials and Methods — Data cleaning

Source: Genome-wide search for replicable risk gene regions in alcohol and nicotine co-dependence.
Embedded: yes

Text

Before statistical analysis, we stringently cleaned the phenotype data and then the genotype data [Zuo et al., 2011a; Zuo et al., In revision]. After cleaning, 805,814 markers in EAs and 895,714 markers in AAs were retained for association analysis. The cleaned data were of high quality, as evidenced by the following: (1) Each of the two samples was highly homogeneous; that is, EAs and AAs were well differentiated. (2) The observed p-values for the associations agreed closely with the expected p-values within both EAs and AAs (see QQ plots in Supplemental Figure S1). (3) The genomic inflation factor (GIF) computed from these p-values was low: 1.04 in EAs, 1.02 in AAs, and 1.04 in the meta-analysis.
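
The text does not state how the GIF and the QQ plots were derived from the p-values. As an illustration only, the sketch below uses the common median-based definition: each p-value is converted to a 1-df chi-square statistic, and the observed median is divided by the median expected under the null (about 0.455). The function names `chi2_1df_from_p`, `genomic_inflation_factor`, and `qq_points` are hypothetical, and the assumption that the p-values come from 1-df tests is ours, not the authors'.

```python
from math import log10
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def chi2_1df_from_p(p):
    """Convert a p-value (0 < p <= 1) to the matching 1-df chi-square statistic."""
    # Work in the lower tail (p / 2) to keep precision for very small p-values.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation_factor(p_values):
    """Median observed chi-square divided by the null median (assumed definition)."""
    observed_median = median(chi2_1df_from_p(p) for p in p_values)
    expected_median = chi2_1df_from_p(0.5)  # about 0.455 for 1 df
    return observed_median / expected_median


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot."""
    observed = sorted(p_values)
    n = len(observed)
    # Expected quantiles of a uniform(0, 1) distribution under the null.
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-log10(e), -log10(o)) for e, o in zip(expected, observed)]


if __name__ == "__main__":
    # Toy input: evenly spaced p-values, which mimic a perfectly null distribution.
    toy = [(i + 0.5) / 1001 for i in range(1001)]
    print(round(genomic_inflation_factor(toy), 3))
```

Under this definition, a value close to 1 (such as the reported 1.02 to 1.04) indicates little inflation of the test statistics, and QQ points lying near the diagonal indicate the same agreement between observed and expected p-values.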