Chunk #4 — Population Stratification

Source: Critical Issues in the Inclusion of Genetic and Epigenetic Information in Prevention and Intervention Trials.
Embedded: yes

Text

even in samples thought to be homogeneous (Burton et al., 2007; Freedman et al., 2004). Several methods, including genomic control (Devlin & Roeder, 1999) and STRUCTURE (Pritchard & Rosenberg, 1999; Pritchard & Donnelly, 2001), are available to deal with this issue, either by correcting the test statistic for the average level of stratification or by grouping population subsets a priori. At the onset of the GWAS era, the wealth of genome-wide data gave rise to additional approaches that rely on the correlation structure of genetic information to identify cryptic population structure. EIGENSTRAT (Price et al., 2006) is one such approach: it uses genome-wide data to infer principal components (PCs) of population membership. Each subject is assigned a score on each principal component, representing their membership in a given population cluster. These PC scores can then be included as covariates in all subsequent analyses to account for population stratification, a process that is automated in PLINK 2 (Chang et al., 2015). In practice, to reduce the computational intensity and avoid using redundant information from SNPs in linkage disequilibrium (i.e., non-randomly associated), investigators frequently select a subset
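
The principal-component adjustment described above can be illustrated with a minimal sketch. It is not the EIGENSTRAT or PLINK 2 implementation: the function names genotype_pcs and snp_effect are hypothetical, the genotypes are simulated, and the per-SNP scaling by sqrt(p(1 - p)) is an assumed, commonly used normalization. The example infers PC scores from a subjects-by-SNPs matrix of allele counts and then compares a SNP's regression coefficient with and without the PC scores as covariates.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Return per-subject PC scores from an (n_subjects, n_snps) matrix
    of minor-allele counts coded 0/1/2 (hypothetical helper)."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # pooled allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by sqrt(p(1 - p)); this scaling is an assumption.
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def snp_effect(phenotype, snp, covariates=None):
    """OLS coefficient on the SNP, optionally adjusted for covariates."""
    columns = [np.ones(len(snp)), np.asarray(snp, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return beta[1]


# Simulated data: two populations with independently drawn allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
])
pcs = genotype_pcs(geno, n_components=2)

# The phenotype differs by population only; the test SNP has no causal effect
# but differs sharply in frequency between the two populations.
population = np.repeat([0.0, 1.0], n_per_pop)
phenotype = 2.0 * population + rng.normal(0.0, 0.1, 2 * n_per_pop)
test_snp = np.concatenate([
    rng.binomial(2, 0.1, n_per_pop),
    rng.binomial(2, 0.9, n_per_pop),
]).astype(float)

naive = snp_effect(phenotype, test_snp)                      # confounded estimate
adjusted = snp_effect(phenotype, test_snp, covariates=pcs)   # PC-adjusted estimate
```

In this simulation the unadjusted coefficient reflects the frequency difference between populations rather than any effect of the SNP, and including the PC scores as covariates should shrink it toward zero. Real analyses would typically run the PC step on a subset of approximately independent SNPs rather than on all markers.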