Methodological Considerations: Inferring Population Structure

Source: Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations.

For cohorts with diverse ancestral backgrounds, we can estimate population structure from genome-wide data. Currently, the most common tool for estimating continuous population structure is principal component analysis (PCA); other approaches are listed in Supplemental Methods II. PCA is a statistical method that reduces high-dimensional data (e.g., thousands of variants measured across the genome) to a small number of orthogonal axes, the principal components (PCs). Each successive PC captures the largest remaining fraction of variance in the data. The spread of samples along these axes provides a visual guide to substructure; when each data point represents an individual's genotypes across many markers, the PCs reveal population structure. PCs can be computed within the cohort itself. Alternatively, they can be estimated from an external reference panel (e.g., the 1000 Genomes Project, 1KGP; Sudmant et al., 2015), and the GWAS sample can then be projected onto the reference PC axes so that its members can be compared with the ancestries of known reference populations (Peterson et al., 2017b). However, the latter approach can be limited by the number and diversity of populations represented in the reference panel, highlighting the need