Investigating population stratification and admixture using eigenanalysis of dense genotypes.
- Authors: Shriner, D
- Year: 2011
- Journal: Heredity
- PMID: 21448230
- DOI: 10.1038/hdy.2011.26
- PMCID: PMC3128175
Principal components analysis of genetic data is used to avoid inflation in type I error rates in association testing due to population stratification by covariate adjustment using the top eigenvectors and to estimate cluster or group membership independent of self-reported or ethnic identities. Eigendecomposition transforms correlated variables into an equal number of uncorrelated variables. Numerous stopping rules have been developed to identify which principal components should be retained. Recent developments in random matrix theory have led to a formal hypothesis test of the top eigenvalue, providing another way to achieve dimension reduction. In this study, I compare Velicer's minimum average partial test to a test on the basis of Tracy-Widom distribution as implemented in EIGENSOFT, the most widely used implementation of principal components analysis in genome-wide association analysis. By computer simulation of vicariance on the basis of coalescent theory, EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation. Velicer's minimum average partial test is shown to have both smaller bias and smaller variance, often with a mean squared error of 0, in estimating the number of principal components to retain. Velicer's minimum average partial test is implemented in R code and is suitable for genome-wide genotype data with or without population labels.
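As a rough illustration of the minimum average partial idea described in the abstract, the following Python sketch removes the top m principal components from a correlation matrix one at a time and retains the m that minimizes the average squared partial correlation. It is not the author's R implementation. The function name `velicer_map`, the `max_components` cap, the choice to correlate individuals after standardizing each SNP, and the toy two-population simulation (independent allele frequencies rather than a coalescent model) are all assumptions made for this example.

```python
import numpy as np


def velicer_map(corr, max_components=10):
    """Sketch of Velicer's minimum average partial (MAP) test.

    Returns (m, avg_sq), where avg_sq[k] is the mean squared off-diagonal
    partial correlation after removing the top k components and m is the
    index of its minimum (0 means no component is retained).
    """
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    max_components = min(max_components, p - 2)

    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    # Loadings: eigenvector k scaled by sqrt(eigenvalue k).
    loadings = eigvecs[:, order] * np.sqrt(eigvals)

    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = np.full(max_components + 1, np.inf)
    avg_sq[0] = np.mean(corr[off_diag] ** 2)

    for m in range(1, max_components + 1):
        a = loadings[:, :m]
        partial_cov = corr - a @ a.T           # covariance left after m components
        resid_var = np.diag(partial_cov)
        if np.any(resid_var <= 1e-12):         # stop before dividing by ~0
            break
        scale = np.sqrt(resid_var)
        partial_corr = partial_cov / np.outer(scale, scale)
        avg_sq[m] = np.mean(partial_corr[off_diag] ** 2)

    return int(np.argmin(avg_sq)), avg_sq


# Toy data (assumption): two strongly differentiated populations with
# independently drawn allele frequencies, 20 diploid individuals each.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 20, 2000
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = rng.uniform(0.05, 0.95, n_snps)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)

geno = geno[:, geno.std(axis=0) > 0]                 # drop monomorphic SNPs
z = (geno - geno.mean(axis=0)) / geno.std(axis=0)    # standardize each SNP
corr = np.corrcoef(z)                                # individual-by-individual correlations

n_keep, avg_sq = velicer_map(corr)
print("components retained:", n_keep)
```

With two populations this strongly differentiated, the minimum should fall at one component, which matches the intuition that a single axis separates two groups. Real genotype data with weaker differentiation or admixture may behave differently.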
Genealogical representation of the coalescent simulations. (a) Two populations with a single divergence event 2tNe generations ago. (b) Three populations with the first divergence event 2t1Ne generations ago and the second divergence event 2t2Ne generations ago.
Representative projections of simulated data for two populations. (a–c) The divergence event between populations A (red circles) and B (blue circles) occurred 0 generations ago. (d–f) The divergence event occurred 2Ne generations ago. (a, d) Analysis of populations A and B. (b, e) Analysis of admixed individuals (gray circles) with average individual admixture proportions 78.2% population A and 21.8% population B. (c, f) Combined analysis of admixed individuals, population A and population B.
Representative projections of simulated data for three populations. (a–c) The divergence event between populations B (blue circles) and C (black circles) occurred 0.0002Ne generations ago and the divergence of population A (red circles) occurred 0.002Ne generations ago. (d–f) The divergence event between populations B and C occurred 2Ne generations ago and the divergence of population A occurred 20Ne generations ago. (a, d) Analysis of populations A, B and C. (b, e) Analysis of admixed individuals (gray circles) with average individual admixture proportions 10% population A, 45% population B and 45% population C. (c, f) Combined analysis of admixed individuals, population A, population B and population C.
Top 16 principal components for the Howard University Family Study data using EIGENSOFT. All 16 principal components are statistically significant according to Tracy–Widom statistics. The bottom right panel shows the scree plot.
Top 16 principal components for the Howard University Family Study data using Velicer's minimum average partial test. Only the top principal component is statistically significant. The bottom right panel shows the scree plot.