Chunk #89 — Results — An Estimate for the Data Size Needed for Significance

Source: Population structure and eigenanalysis.
Embedded: yes

Text

and STRUCTURE on a “level playing field.” Our PCA methods return a leading eigenvector, while running STRUCTURE with K = 2 clusters returns, for each individual, the probability of belonging to cluster 1. We used a nonparametric approach: we applied a probit transform to the output of both PCA and STRUCTURE, and then ran an ANOVA on each transformed output. (The probit transform uses order statistics (ranks) to map the observations into points appropriate if the underlying distribution is standard normal. See, for example, [33].) This amounts to carrying out an unsupervised analysis and then checking whether the recovered “structure” reflects the truth.
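The comparison described above can be illustrated with a minimal sketch, which is not the authors' code. It assumes one common convention for the rank-based probit transform, mapping the observation with rank r out of n to the standard normal quantile at (r - 0.5)/n; the text does not specify the exact offset. The function names `rank_probit` and `anova_f`, the population labels, and the toy numbers are all hypothetical.

```python
from statistics import NormalDist, mean


def rank_probit(values):
    """Map observations to normal scores using only their ranks.

    Assumed convention: rank r (1-based, ties averaged) out of n is sent
    to the standard normal quantile at (r - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def anova_f(scores, labels):
    """One-way ANOVA F statistic of scores grouped by known labels."""
    groups = {}
    for s, g in zip(scores, labels):
        groups.setdefault(g, []).append(s)
    grand = mean(scores)
    n, k = len(scores), len(groups)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((s - mean(v)) ** 2 for v in groups.values() for s in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: six individuals from two known populations.
labels = ["A", "A", "A", "B", "B", "B"]
pca_coords = [-0.9, -0.7, -0.8, 0.6, 0.8, 0.7]            # leading eigenvector
structure_probs = [0.02, 0.10, 0.05, 0.95, 0.90, 0.99]    # P(cluster 1), K = 2

f_pca = anova_f(rank_probit(pca_coords), labels)
f_structure = anova_f(rank_probit(structure_probs), labels)
print(f_pca, f_structure)
```

Because the transform depends only on ranks, an eigenvector coordinate and a cluster-membership probability end up on the same scale before the ANOVA. In this toy example both methods separate the two groups equally well in rank terms, so the two F statistics coincide.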