paperKB
coga / coga-kb
Help
Sign in

Chunk #14 — METHODS — SPECTRAL CLUSTERING

Source
Discovering genetic ancestry using spectral graph theory.
Embedded
yes

Text

In graph-theoretic language, the goal of clustering is to find a partition of the graph so that the connections between different groups have low weight and the connections within a group have high weight. For two disjoint sets A and B of a graph, the cut across the groups is defined as cut(A, B) = ∑i∈A,j∈B wij. Finding the partition with the minimum cut is a well-studied problem; however, the minimum cut criterion favors separating individual vertices or “outliers” from the rest of the graph [Shi and Malik, 1997]. The normalized cut approach by Shi and Malik circumvents this problem by incorporating the volume or weight of the edges of a set into a normalized cost function Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol(A) = ∑i∈A di and vol(B) = ∑i∈B di. This cost function is large when the set A or B is small. Our Spectral-GEM algorithm (below) exploits the fact that the eigenvectors of the graph Laplacian provide an approximate solution to the Ncut minimization problem. Smartpca [Patterson et al., 2006] and standard GEM [Luca et al., 2008], on the other hand, are biased toward embeddings that favor small, but tight, clusters in the data.