Chunk #11 — Materials and Methods — Algorithm to select a set of SNPs for population structure inference

Source: Population substructure and control selection in genome-wide association studies.

Text

The algorithm selects a panel of SNPs for population structure inference by iterating over the following three steps. First, each SNP in the selection pool is taken in turn as the reference SNP, and all SNPs in the pool that lie within distance d of it and have an r² LD measure with it above a given threshold are grouped into a bin. Second, the bin with the smallest size is identified, and its reference SNP is added to the list of structure inference SNPs; if two or more bins tie for the smallest size, one of them is picked at random. Third, the selection pool is updated by removing every SNP in the bin identified in the second step. The process terminates when no SNP is left in the selection pool.
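The three steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. All SNPs are assumed to lie on one chromosome, with base-pair positions in `positions`. r² is approximated by the squared Pearson correlation of 0/1/2 genotype dosages, computed by the hypothetical helper `genotype_r2`. The names `select_structure_snps`, `d`, and `r2_threshold` are illustrative only.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def genotype_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype-dosage vectors.

    Assumption: a stand-in for the r^2 LD measure. Returns 0.0 when either
    SNP is monomorphic (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def select_structure_snps(
    positions: Sequence[int],
    genotypes: Sequence[Sequence[float]],
    d: int,
    r2_threshold: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Greedy smallest-bin selection; returns indices of the selected SNPs."""
    if rng is None:
        rng = random.Random()
    pool: Set[int] = set(range(len(positions)))
    selected: List[int] = []
    while pool:
        # Step 1: bin each reference SNP with the pool SNPs that lie within
        # distance d and have r^2 above the threshold. The reference SNP is
        # always placed in its own bin, so every round shrinks the pool.
        bins: Dict[int, Set[int]] = {}
        for ref in pool:
            bins[ref] = {ref} | {
                j for j in pool
                if abs(positions[j] - positions[ref]) <= d
                and genotype_r2(genotypes[ref], genotypes[j]) > r2_threshold
            }
        # Step 2: find the smallest bin size, break ties at random, and keep
        # the chosen bin's reference SNP.
        min_size = min(len(b) for b in bins.values())
        tied = sorted(ref for ref, b in bins.items() if len(b) == min_size)
        chosen = rng.choice(tied)
        selected.append(chosen)
        # Step 3: remove every SNP in the chosen bin from the pool.
        pool -= bins[chosen]
    return selected
```

For example, `select_structure_snps([0, 10, 20, 1000], genotypes, d=50, r2_threshold=0.8)` keeps one SNP from each group of nearby, highly correlated SNPs. For clarity, this sketch recomputes every bin in each round, which costs quadratic time per round. A genome-wide version would more likely sort SNPs by position and compare only those within the window d.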