paperKB
coga / coga-kb
Help
Sign in

Chunk #19 — METHODS — NUMBER OF DIMENSIONS

Source
Discovering genetic ancestry using spectral graph theory.
Embedded
yes

Text

In a more realistic situation the between-cluster similarity will not be exactly 0 and all components of the graph will be connected. Hence only the trivial eigenvalue (ν0) will be zero corresponding to the null eigenvector u0=(d11/2,…,d61/2)t. Nevertheless, if the clusters are fairly distinct, we can still use the eigenvalues of the graph Laplacian to determine the number of significant dimensions. Heuristically, choose the number of eigenvectors d+1 such that the eigengaps δi = |νi − νi−1| are small for i≤d but the eigengap δd+1 is large. One can justify such an approach with an argument from perturbation analysis [Stewart, 1990]. The idea is that the matrix ℒ for the genetic data is a perturbed version of the ideal matrix for d disconnected clusters. If the perturbation is not too large and the “non-null” eigengap δd is large, the first d eigenvectors will be close to the ideal indicator vectors and a spectral clustering algorithm will separate the individual clusters well. The question then becomes: How do we decide whether an eigengap is significant (non-null)?