Chunk #5 — Population Stratification

Source: Critical Issues in the Inclusion of Genetic and Epigenetic Information in Prevention and Intervention Trials.
Embedded: yes

Text

included as covariates in subsequent analyses to account for population stratification (Chang et al., 2015). In practice, to reduce the computational burden and avoid using redundant information from SNPs in linkage disequilibrium (i.e., non-randomly associated), investigators frequently select a random subset of SNPs from across the genome and estimate stratification with the PCA approach. Although these SNPs may or may not be a priori identified ancestry informative markers, it has been shown that "randomly" selected SNPs perform equally well (Montana & Pritchard, 2004). Several investigators (Choudhry et al., 2006; Sankararaman, Sridhar, Kimmel, & Halperin, 2008) have noted that global (genome-wide) ancestry can differ from local ancestry (at a gene or LD block), especially in admixed populations (e.g., African Americans). Consequently, methods have been developed to estimate local ancestry, that is, the proportion of ancestry at each SNP attributable to known reference populations. This approach can be applied in a genome-wide context (WinPOP/LAMP; Pasaniuc, Sankararaman, Kimmel, & Halperin, 2009; Pasaniuc et al., 2011; Sankararaman et al., 2008) to estimate the proportions of local ancestry in each region; those estimates can then be used to account for stratification in subsequent tests of specific candidate regions. Importantly, Keller (2014) suggests that although including these