Chunk #20 — METHODS — NUMBER OF DIMENSIONS

Source
Discovering genetic ancestry using spectral graph theory.

Text

In this work, we propose a practical scheme, based on the eigengap heuristic and hypothesis testing, for estimating the number of significant eigenvectors for genetic ancestry. By simulation, we generate homogeneous data without population structure and study the distribution of the eigengaps of the normalized graph Laplacian. Because there is only one population, the first eigengap δ1 is large. We are interested in the first null eigengap, specifically the difference δ2 = |ν2 − ν1| between the first and second eigenvalues. If the data are homogeneous, this difference is relatively small. Based on our simulation results, we approximate the upper bound for the null eigengap by the 99th percentile of its sampling distribution, expressed as a function of N and L. We choose the number of dimensions d in the eigenvector representation according to d = max{i : δi > −0.00016 + 2.7/N + 2.3/L} − 1.
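The selection rule can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the eigenvalues are supplied in ascending order starting from ν0, that the eigengaps are δi = |νi − νi−1| (consistent with δ2 = |ν2 − ν1| above), and that N and L denote the sample size and the number of markers, which the passage does not define. The function names `eigengap_threshold` and `choose_dimension` are hypothetical, and returning 0 when no eigengap exceeds the threshold is a convention chosen here.

```python
import numpy as np


def eigengap_threshold(n, l):
    """Approximate upper bound for a null eigengap, as given in the text."""
    return -0.00016 + 2.7 / n + 2.3 / l


def choose_dimension(eigenvalues, n, l):
    """Return d = max{i : delta_i > threshold} - 1.

    Assumption: `eigenvalues` holds nu_0, nu_1, ... in ascending order,
    so that delta_i = |nu_i - nu_{i-1}| for i = 1, 2, ...
    """
    nu = np.asarray(eigenvalues, dtype=float)
    if nu.size < 2:
        raise ValueError("need at least two eigenvalues to form an eigengap")

    # gaps[k] is delta_{k+1}, the gap between nu_{k+1} and nu_k.
    gaps = np.abs(np.diff(nu))
    significant = np.flatnonzero(gaps > eigengap_threshold(n, l))

    if significant.size == 0:
        # Convention chosen for this sketch: no significant gap means d = 0.
        return 0

    largest_i = int(significant[-1]) + 1  # convert 0-based index to i
    return largest_i - 1


if __name__ == "__main__":
    # Made-up eigenvalues for illustration only, not data from the paper.
    nu = [0.0, 0.9, 0.95, 0.951, 0.952]
    print(choose_dimension(nu, n=1000, l=10000))
```

With these made-up eigenvalues the threshold is about 0.00277, so δ1 and δ2 are significant and the sketch returns d = 1. If only δ1 exceeded the threshold, as expected for homogeneous data, it would return d = 0.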