Chunk #15 — Methods — Genetic based population assignment

Source: The utility of empirically assigning ancestry groups in cross-population genetic studies of addiction.
Embedded: yes

Text

Using all 10 ancestry PCs, we first calculated the median and variance of each 1KGP population and then the Mahalanobis distance (30) from each 1KGP sample to each of the 26 populations (Figure 1a). We chose Mahalanobis distance, a common approach for detecting outliers, to assign the best population match in multivariate space because it accounts for both the distance to each group's center and that group's variance. Reference population outliers (> 4 SD from the population median, n = 61) were then removed (Figure 1b), and the procedure was repeated for all 1KGP samples. Each S4S sample was then assigned to the 1KGP population at the minimum Mahalanobis distance. Finally, the S4S samples were collapsed into the super-populations of their assigned populations (Figure 1c).
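The assignment procedure described above can be illustrated with a minimal sketch in Python using NumPy. This is not the authors' code. The function names, the synthetic data, and the three-population subset are hypothetical. The sketch also makes two assumptions that the text does not confirm: the group spread is summarized by the full covariance matrix of the PCs, and a reference sample counts as an outlier when its Mahalanobis distance to its own labelled population exceeds 4.

```python
import numpy as np

# Partial map for illustration only; a full map would cover all 26 1KGP populations.
SUPER_POP = {"YRI": "AFR", "CEU": "EUR", "CHB": "EAS"}

def fit_reference(pcs, labels):
    """Per-population centre (median) and inverse covariance of the PCs."""
    params = {}
    for pop in np.unique(labels):
        x = pcs[labels == pop]
        centre = np.median(x, axis=0)
        inv_cov = np.linalg.pinv(np.cov(x, rowvar=False))
        params[str(pop)] = (centre, inv_cov)
    return params

def mahalanobis_to_all(pcs, params):
    """Return (population names, n_samples x n_populations distance matrix)."""
    pops = sorted(params)
    dist = np.empty((pcs.shape[0], len(pops)))
    for j, pop in enumerate(pops):
        centre, inv_cov = params[pop]
        diff = pcs - centre
        sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        dist[:, j] = np.sqrt(np.maximum(sq, 0.0))  # guard against rounding below zero
    return pops, dist

def assign_populations(ref_pcs, ref_labels, target_pcs, threshold=4.0):
    """Drop reference outliers, refit, then assign targets to the nearest population."""
    ref_labels = np.asarray(ref_labels)
    params = fit_reference(ref_pcs, ref_labels)
    pops, dist = mahalanobis_to_all(ref_pcs, params)
    col = np.array([pops.index(str(lab)) for lab in ref_labels])
    own = dist[np.arange(len(ref_labels)), col]
    keep = own <= threshold  # assumed outlier rule, see lead-in
    params = fit_reference(ref_pcs[keep], ref_labels[keep])
    pops, dist = mahalanobis_to_all(target_pcs, params)
    return np.array(pops)[dist.argmin(axis=1)], keep

# Synthetic demonstration: three well-separated clusters in 10 PCs.
rng = np.random.default_rng(0)
offsets = {"YRI": 0.0, "CEU": 10.0, "CHB": -10.0}
ref_pcs = np.vstack([off + rng.normal(size=(100, 10)) for off in offsets.values()])
ref_labels = np.repeat(list(offsets), 100)
target_pcs = np.vstack([off + rng.normal(size=(1, 10)) for off in offsets.values()])

assigned, keep = assign_populations(ref_pcs, ref_labels, target_pcs)
assigned_super = [SUPER_POP[str(p)] for p in assigned]
print(assigned, assigned_super, int(keep.sum()))
```

On Gaussian toy data with 10 dimensions, a distance cutoff of 4 removes a noticeable fraction of reference samples, far more than the 61 outliers reported for 1KGP, so the paper's exact outlier definition may differ from the one assumed in this sketch.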