paperKB
coga / coga-kb
Help
Sign in

Chunk #17 — METHODS — NUMBER OF DIMENSIONS

Source
Discovering genetic ancestry using spectral graph theory.
Embedded
yes

Text

The graph Laplacian has several properties that make it useful for cluster analysis. Both its eigenvalues and eigenvectors reflect the connectivity of the data. Consider, for example, the normalized graph Laplacian where the sample consists of d distinct clusters. Sort the eigenvalues 0 = ν0 ≤ ν1 ≤ ⋯ ≤ νN−1 of ℒ in ascending order. The matrix ℒ has several key properties [Chung, 1992]: (i) The number d of eigenvalues equal to 0 is the number of connected components S1, …, Sd of the graph. (ii) The first positive eigenvalue νd reflects the cohesiveness of the individual components; the larger the eigenvalue νd the more cohesive the clusters. (iii) The eigenspace of 0 (i.e., the vectors corresponding to eigenvalues equal to 0) is spanned by the rescaled indicator vectors D1/21Sk, where 1Sk = 1 if i ∈ Sk, and 1Sk = 0 otherwise. It follows from (iii) that for the ideal case where we have d completely separate populations (and the node degrees are similar), individuals from the same population map into the same point in an embedding defined