Materials and Methods: An Additional 250 Icelandic Samples

Source
The impact of divergence time on the nature of population structure: an example from Iceland.

Text

We randomly selected 250 samples from the 35,457 samples that were genotyped on the Illumina 300 K chip. Of these 250 samples, five overlapped the previous set of 877 samples; these were retained in the set of 250 additional samples but excluded from the set of original samples, which was thereby reduced to 872 samples. We ran PCA on the combined set of 1,122 samples (Figure 2B) and used the 872 original samples to compute the average value of PC1 and PC2 for each region r. For each of the 250 additional samples, we computed the Euclidean distance between (PC1, PC2) for that sample and the average value of (PC1, PC2) for region r, and defined our prediction of regional ancestry as the value of r minimizing that distance. We defined true ancestry as the region in which the greatest number of ancestors five generations back was born. We compared predicted ancestry with true ancestry, both for the set of 250 samples and for a subset of 98 samples with majority ancestry from a single region. Given the low number of ancestors
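The prediction rule described above (assign each additional sample to the region whose average (PC1, PC2) is nearest in Euclidean distance, then compare with the region contributing the most ancestors) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the region labels "N" and "S", and all coordinates are hypothetical, and the tie-breaking rule in `true_region` is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter, defaultdict
from math import dist


def region_centroids(reference):
    """Average (PC1, PC2) per region.

    reference: iterable of (region, pc1, pc2) tuples for the reference samples.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for region, pc1, pc2 in reference:
        acc = sums[region]
        acc[0] += pc1
        acc[1] += pc2
        acc[2] += 1
    return {r: (s1 / n, s2 / n) for r, (s1, s2, n) in sums.items()}


def predict_region(sample_pc, centroids):
    """Region r minimizing the Euclidean distance to its centroid."""
    return min(centroids, key=lambda r: dist(sample_pc, centroids[r]))


def true_region(ancestor_birth_regions):
    """Region in which the most ancestors were born.

    Assumption: ties go to the region seen first in the list.
    """
    return Counter(ancestor_birth_regions).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted region equals the true region."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up toy data: two regions, two reference samples each.
reference = [("N", 0.0, 0.0), ("N", 2.0, 0.0), ("S", 10.0, 10.0), ("S", 12.0, 10.0)]
centroids = region_centroids(reference)

# Two made-up test samples with their PC coordinates and ancestor birth regions.
test_pcs = [(2.0, 1.0), (9.0, 9.0)]
test_ancestors = [["N", "N", "S"], ["S", "S", "S"]]

predicted = [predict_region(pc, centroids) for pc in test_pcs]
actual = [true_region(a) for a in test_ancestors]
```

In the setting described in the text, the centroids would come from the 872 original samples and the predictions would be made for the 250 additional samples; the sketch only shows the shape of the computation.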