Chunk #15 — Methods — Statistical Analyses

Source: The aggregate effect of dopamine genes on dependence symptoms among cocaine users: cross-validation of a candidate system scoring approach.
Embedded: yes

Text

Our sample was split randomly into training and testing samples of equal size. Although both the psychometric properties of substance problems and their endorsement rates may vary substantially between sexes (Nichol et al., 2007) and races (Harford et al., 2009), the random split produced demographically equivalent sub-samples (see Table 1). Given the heterogeneity within the sample, all substance dependence symptom counts were residualized over sex, age, primary study source, and ancestry. Age was coded in quartiles as three dummy codes corresponding to ≤34, 35-39, and 40-44, with 45+ as the reference group, as has been done in previous SAGE analyses (Bierut et al., 2010). Primary study source was coded as two dummy codes corresponding to COGA and COGEND, with FSCD as the reference group. To account for ethnic heterogeneity in the sample, ancestry was estimated with the procedure described by Price and colleagues (2006), in the form of principal components derived from genome-wide data for the entire SAGE sample. This yielded two major principal components, corresponding to European vs. African ancestry (PC1) and Hispanic vs. non-Hispanic ancestry (PC2) (Bierut et al., 2010). Both PC1 and PC2 were included as covariates over which symptom counts were residualized.
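As an illustration only, the split-half and residualization steps described above might be sketched as follows. The text does not report the software or the regression model used, so ordinary least squares via NumPy is an assumption here, as are all function and variable names and the synthetic data, which stand in for the SAGE variables and are not real. Whether residualization was carried out in the full sample or within each sub-sample is also not stated; the sketch residualizes the full sample before splitting.

```python
import numpy as np

def split_half(n, rng):
    """Randomly split indices 0..n-1 into two halves (hypothetical helper)."""
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:]

def age_dummies(age):
    """Three dummy codes for <=34, 35-39, 40-44; 45+ is the reference group."""
    age = np.asarray(age)
    return np.column_stack([
        age <= 34,
        (age >= 35) & (age <= 39),
        (age >= 40) & (age <= 44),
    ]).astype(float)

def residualize(y, covariates):
    """Residuals of y after an OLS fit on the covariates plus an intercept.

    OLS is an assumption; the source does not name the model it used.
    """
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Synthetic stand-in data (not SAGE data).
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, size=n).astype(float)
age = rng.integers(18, 66, size=n)
study = rng.integers(0, 3, size=n)        # 0 = FSCD (reference), 1 = COGA, 2 = COGEND
pc1 = rng.normal(size=n)                  # stand-in for ancestry PC1
pc2 = rng.normal(size=n)                  # stand-in for ancestry PC2
symptoms = rng.poisson(3, size=n).astype(float)

covariates = np.column_stack([
    sex,
    age_dummies(age),
    (study == 1).astype(float),           # COGA dummy
    (study == 2).astype(float),           # COGEND dummy
    pc1,
    pc2,
])

resid = residualize(symptoms, covariates)
train_idx, test_idx = split_half(n, rng)
train_resid, test_resid = resid[train_idx], resid[test_idx]
```

By construction, OLS residuals are uncorrelated with every covariate in the model, so any association later found between the residualized symptom counts and a genetic score cannot be a linear effect of sex, age group, study source, or the two ancestry components.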