Chunk #13 — Methods — Genetic association and heterogeneity analyses

Source: In search of causal variants: refining disease association signals using cross-population contrasts.
Embedded: yes

Text

Our primary single SNP association analyses of case-control status use logistic regression models, implemented using SAS (Cary, NC). To analyze the combined samples from two different population groups, terms are included in the base model to denote population source ($s$) and to correct for any necessary covariates (e.g. gender), denoted by variables $x_i$, $i = 1, \ldots, n$. In general, the non-genetic base model is then:

$$\ln\left(\frac{P}{1-P}\right) = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_n x_n + \beta_1 s,$$

where $P$ is the probability of being a case and $s$ is sample race/ethnicity. In our particular application to the cocaine dependence data, we included two covariates ($n = 2$), gender (0 = male, 1 = female) and year of birth, and the two populations are EA ($s = 0$) and AA ($s = 1$) from self-report. Genotype status $G$ at each marker is modeled log-additively (multiplicatively) and coded as the number of copies of the minor allele in the European-American sample; this coding choice is arbitrary but allows for consistent reporting. The full model includes both genotype ($\beta_2 G$) and genotype-by-population ($\beta_3 G s$) terms:

$$\ln\left(\frac{P}{1-P}\right) = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_n x_n + \beta_1 s + \beta_2 G + \beta_3 G s$$

and we test for significance of genetic effect by
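The structure of the base and full models defined above can be illustrated with a minimal Python sketch. It is not the authors' SAS implementation and performs no model fitting or significance testing. It only builds the predictor vector for one subject and evaluates the log-odds, which shows that under the full model the per-allele log-odds effect is β2 in the EA sample (s = 0) and β2 + β3 in the AA sample (s = 1). The function names, the coefficient values, and the centering of year of birth on a reference year are all assumptions made for illustration.

```python
import math


def design_row(covariates, s, g=None):
    """Predictor vector for one subject.

    Base model:  [1, x1, ..., xn, s]
    Full model:  [1, x1, ..., xn, s, G, G*s]  (used when g is given)
    """
    row = [1.0] + [float(x) for x in covariates] + [float(s)]
    if g is not None:
        row += [float(g), float(g) * float(s)]
    return row


def log_odds(coefs, row):
    """Linear predictor ln(P / (1 - P)) for one subject."""
    if len(coefs) != len(row):
        raise ValueError("coefficient and predictor vectors differ in length")
    return sum(c * x for c, x in zip(coefs, row))


def case_probability(coefs, row):
    """Probability P of being a case, from the inverse logit."""
    return 1.0 / (1.0 + math.exp(-log_odds(coefs, row)))


def per_allele_log_odds(coefs, covariates, s):
    """Change in log-odds for one extra minor-allele copy in population s."""
    with_allele = log_odds(coefs, design_row(covariates, s, g=1))
    without_allele = log_odds(coefs, design_row(covariates, s, g=0))
    return with_allele - without_allele


# Hypothetical coefficients, NOT estimates from the paper, ordered as
# [alpha0, alpha1 (gender), alpha2 (year of birth), beta1, beta2, beta3].
FULL_COEFS = [-0.5, 0.2, 0.01, 0.3, 0.4, -0.25]

# Hypothetical subject: female (1), born 5 years after an assumed reference year.
covs = [1, 5.0]

ea_effect = per_allele_log_odds(FULL_COEFS, covs, s=0)  # equals beta2
aa_effect = per_allele_log_odds(FULL_COEFS, covs, s=1)  # equals beta2 + beta3
p_case = case_probability(FULL_COEFS, design_row(covs, s=1, g=2))
```

Because the covariate and population terms cancel in the difference, the per-allele effect within each population depends only on β2 and β3, so β3 captures how the genotype effect differs between the two samples.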