Chunk #25 — Results

Source: Population structure and eigenanalysis.
Embedded: yes
Text

Cavalli-Sforza [15, pp. 39–42] gives an explanation of why PCA can be expected to reveal population structure. We give a different explanation, oriented towards analysis at the individual level. If e[1] is the principal eigenvector of the matrix X, then e[1] maximizes a sum of squares, the quantity S of Equation 4, over all vectors of a fixed norm. The second eigenvector e[2] maximizes the same expression subject to the constraint that e[1] and e[2] are orthogonal, and so on. Why would we expect this to reveal population structure? Suppose that our sample contains just two populations and that each is homogeneous. Choose a vector whose coordinates are constant and positive for samples from one population and constant and negative for samples from the other, with the two constants chosen so that the coordinates sum to zero. Then, since alleles within a population will tend to agree more than in the sample as a whole, the quantity S of Equation 4 will tend to be large for this vector. This is exactly the quantity that we maximize as a function of the vector e. More generally, if we have