Materials and Methods: Algorithm to select a set of SNPs for population structure inference

Source: Population substructure and control selection in genome-wide association studies.

To optimize the principal components analysis of population structure, we identified a set of SNPs with low background LD, i.e., SNPs whose pairwise r² LD statistic [23] is less than a given threshold (e.g., 0.004) for every pair within a given physical distance d (e.g., 500 kb). Our algorithm modifies the greedy search algorithm of Carlson et al. [24], which selects the minimum number of SNPs (called tagSNPs) needed to monitor every remaining non-tagSNP above a threshold level of correlation (measured by r²). For our purposes, the SNP selection algorithm differs in that it identifies the maximum number of mutually “independent” SNPs for the inference of population structure.
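The selection rule can be illustrated with a minimal sketch. It is not the authors' implementation; several details are assumptions. It assumes a genotype matrix of 0/1/2 allele dosages (individuals in rows, SNPs in columns) and approximates the r² LD statistic by the squared Pearson correlation of genotype dosages. It also assumes that all positions lie on one chromosome. Two SNPs conflict if they are within distance d and their r² is not below the threshold. Instead of Carlson's "tag the most SNPs" step, it repeatedly keeps the remaining SNP with the fewest conflicts and discards that SNP's conflicting neighbours. This greedy rule is a common heuristic for maximum independent sets. The function names `r2_matrix` and `select_independent_snps` are hypothetical.

```python
import numpy as np


def r2_matrix(genotypes):
    """Squared Pearson correlation between SNP columns of a dosage matrix.

    Assumption: genotype-dosage r^2 is used as a proxy for the LD r^2 statistic.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                      # center each SNP column
    norms = np.sqrt((g * g).sum(axis=0))
    norms[norms == 0] = np.inf                  # monomorphic SNPs: r^2 = 0 with all SNPs
    z = g / norms
    r = z.T @ z                                 # SNP-by-SNP correlation matrix
    return r * r


def select_independent_snps(genotypes, positions, r2_max=0.004, d=500_000):
    """Greedily select SNPs so that no pair within distance d has r^2 >= r2_max.

    Hypothetical sketch: positions are assumed to be on a single chromosome.
    """
    r2 = r2_matrix(genotypes)
    pos = np.asarray(positions)
    close = np.abs(pos[:, None] - pos[None, :]) <= d
    conflict = close & (r2 >= r2_max)           # pairs that are NOT "independent"
    np.fill_diagonal(conflict, False)

    remaining = set(range(len(pos)))
    selected = []
    while remaining:
        idx = np.array(sorted(remaining))
        # number of conflicts each remaining SNP has with other remaining SNPs
        degrees = conflict[np.ix_(idx, idx)].sum(axis=1)
        best = int(idx[np.argmin(degrees)])     # ties go to the smallest index
        selected.append(best)
        remaining.discard(best)
        # drop every SNP in LD with the one just kept
        remaining -= {int(j) for j in np.flatnonzero(conflict[best])}
    return sorted(selected)


# Toy data: SNPs 0 and 1 are identical (r^2 = 1); SNPs 2 and 3 are uncorrelated with all others.
G = [[0, 0, 0, 0],
     [0, 0, 2, 2],
     [2, 2, 0, 2],
     [2, 2, 2, 0]]
print(select_independent_snps(G, [0, 1000, 2000, 3000]))        # [0, 2, 3]
print(select_independent_snps(G, [0, 10_000_000, 2000, 3000]))  # [0, 1, 2, 3]
```

In the second call, SNP 1 lies more than d away from SNP 0, so the pair is not tested and both SNPs are kept. The greedy rule returns a maximal set, meaning no further SNP can be added without violating the threshold. It does not guarantee the largest possible set.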