
Chunk #1 — INTRODUCTION

Source
Discovering genetic ancestry using spectral graph theory.

Text

Genetic ancestry can be estimated from allele counts of individuals genotyped at a large number of SNPs. A dimension reduction tool known as principal component analysis (PCA) [Cavalli-Sforza et al., 1994; Patterson et al., 2006; Price et al., 2006], or principal component maps (PC maps), summarizes the genetic similarity between subjects at a large number of SNPs using continuous axes of genetic variation. These axes are inferred from the dominant eigenvectors of a data-based similarity matrix and define a “spectral” embedding, also known as an eigenmap, of the original data. Typically a small number of ancestry dimensions is sufficient to describe the key variation. For instance, in Europe, eigenvectors displayed in two dimensions often reflect the geographical distribution of populations [Heath et al., 2008; Novembre et al., 2008]. The number of dimensions required to capture the key features in the data varies with the nature of the structure. If the sample consists of k distinct subpopulations, typically k−1 axes will be required to differentiate these subpopulations. If a population has a gradient or cline, then an axis is required for this feature.
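
The k−1 rule can be illustrated with a small simulation. The following is a minimal sketch, not the procedure of the source paper. It assumes three well-separated simulated subpopulations, and a common normalization in which each SNP column is centered and scaled by sqrt(2p(1−p)). The function names `simulate_subpopulations` and `pca_embedding`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_subpopulations(k=3, n_per_pop=50, n_snps=2000, seed=0):
    """Simulate allele counts (0, 1, 2) for k subpopulations (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumption: each subpopulation has its own independent allele frequencies.
    freqs = rng.uniform(0.1, 0.9, size=(k, n_snps))
    labels = np.repeat(np.arange(k), n_per_pop)
    # Each individual's genotype is Binomial(2, p) at each SNP.
    genotypes = rng.binomial(2, freqs[labels])
    return genotypes, labels


def pca_embedding(genotypes, n_axes):
    """Return all eigenvalues (descending) and the top n_axes eigenvectors."""
    g = np.asarray(genotypes, dtype=float)
    p_hat = g.mean(axis=0) / 2.0
    # Drop monomorphic SNPs, which would cause division by zero.
    keep = (p_hat > 0) & (p_hat < 1)
    g, p_hat = g[:, keep], p_hat[keep]
    # Center each SNP and scale by its estimated binomial standard deviation.
    x = (g - 2.0 * p_hat) / np.sqrt(2.0 * p_hat * (1.0 - p_hat))
    # Data-based similarity matrix between individuals.
    similarity = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(similarity)  # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order[:n_axes]]


genotypes, labels = simulate_subpopulations()
eigvals, axes = pca_embedding(genotypes, n_axes=2)
print("leading eigenvalues:", eigvals[:4])
```

With k = 3 simulated subpopulations, the first two eigenvalues should stand well above the remainder, so the two columns of `axes` give the spectral embedding that separates the groups. Real data with weaker differentiation would show a less clear gap.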