Chunk #41 — DISCUSSION

Source
Discovering genetic ancestry using spectral graph theory.
Embedded
yes

Text

Successful implementation of Spectral-GEM benefits from careful choice of the SNP panel. A thoughtful choice of SNPs leads to more robust discovery of eigenvectors that are more interpretable. In the analysis of POPRES described in Results we use fewer than 5% of the available SNPs, but we believe we are retaining essentially all of the available information about ancestry. In the process of choosing the SNPs for ancestry analysis we suggest a number of edits. First, we remove any SNPs with missingness rate greater than 0.2%. This edit removes artificial correlations between individuals due to imputed missing values. Second, we reduce the panel to include only tag SNPs. Including SNPs in LD leads to the discovery of axes that describe local LD structure rather than true axes of ancestry. For instance, using all of the SNPs, we found d = 16, a representation that includes eight more dimensions than reported in our analysis. Through experience we have found that applying an initial screen that selects a grid of SNPs separated by 10 Kb approximates a tag SNP selection fairly well. Next
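
The two screens described above (dropping SNPs with missingness above 0.2%, then keeping a grid of SNPs spaced about 10 Kb apart) can be illustrated with a minimal sketch. It is not the authors' code. It assumes genotypes are stored as an individuals-by-SNPs NumPy array coded 0/1/2 with NaN for missing calls, that all SNPs lie on one chromosome with base-pair positions sorted in ascending order, and that a greedy left-to-right pass is an acceptable way to build the grid. The function names filter_by_missingness, thin_by_spacing, and select_panel are hypothetical.

```python
import numpy as np


def filter_by_missingness(genotypes, max_missing=0.002):
    """Return a boolean mask of SNPs whose missingness rate is at most max_missing.

    genotypes: (n_individuals, n_snps) float array, NaN marks a missing call
    (assumed encoding). The default 0.002 corresponds to the 0.2% threshold.
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    return missing_rate <= max_missing


def thin_by_spacing(positions, min_gap=10_000):
    """Return a boolean mask keeping SNPs at least min_gap bp apart.

    positions: ascending base-pair positions on a single chromosome (assumed).
    Greedy rule (an assumption): keep a SNP if it lies at least min_gap bp
    beyond the last SNP that was kept.
    """
    keep = np.zeros(len(positions), dtype=bool)
    last_kept = None
    for i, pos in enumerate(positions):
        if last_kept is None or pos - last_kept >= min_gap:
            keep[i] = True
            last_kept = pos
    return keep


def select_panel(genotypes, positions, max_missing=0.002, min_gap=10_000):
    """Apply the missingness screen, then the spacing screen.

    Returns the column indices (into the original array) of retained SNPs.
    """
    positions = np.asarray(positions)
    idx = np.flatnonzero(filter_by_missingness(genotypes, max_missing))
    keep = thin_by_spacing(positions[idx], min_gap)
    return idx[keep]


# Toy data: 3 individuals, 5 SNPs; the second SNP has one missing call.
toy_positions = np.array([1_000, 4_000, 12_000, 15_000, 30_000])
toy_genotypes = np.array(
    [
        [0, 1, 2, 0, 1],
        [1, np.nan, 0, 2, 1],
        [2, 0, 1, 1, 0],
    ],
    dtype=float,
)
toy_panel = select_panel(toy_genotypes, toy_positions)
print(toy_panel)
```

On the toy data the second SNP fails the missingness screen and the fourth falls within 10 Kb of a SNP already kept, so the retained columns are the first, third, and fifth. A spacing grid of this kind only approximates tag SNP selection, as the text notes. It does not measure LD directly.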