Chunk #13 — METHODS — Principal components analysis

Source
Genes mirror geography within Europe.

Text

of smartpca that provide the SNP eigenvectors). SNPs falling within a 4-megabase region of a peak were excluded from the final PCA. Peaks were initially defined as the top 0.01% of SNPs correlating with a PC, for each of the top 6 PCs of the preliminary analysis. In this initial analysis, PCs 1 and 2 did not appear to be artefacts of long-range linkage disequilibrium, but we still removed the regions around the top PC-correlated SNPs. This approach is conservative, in the sense that we potentially remove more SNPs than necessary and hence might limit our ability to detect subtle patterns. The procedure removed SNPs in regions such as the lactase region (2q21), the MHC region, and the inversion regions 8p23 and 17q21.31, among others. The final PCA used 197,146 SNPs. The patterns of structure observed in PCs 1 and 2 were robust to further removal of chromosomal regions correlated with the PCs, suggesting that the observed patterns are representative of genome-wide differentiation.
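
The peak-based filtering described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the array `loadings` is a hypothetical stand-in for the per-SNP values from smartpca, the function names are invented for illustration, and ranking by absolute value is an assumption. The text does not say whether "a 4 megabase region of a peak" means 2 Mb on each side or 4 Mb on each side, so the sketch assumes a window centred on the peak with a 2 Mb half-width and exposes it as a parameter.

```python
import numpy as np


def find_peaks(loadings, top_fraction=1e-4):
    """Return sorted indices of peak SNPs.

    loadings: array of shape (n_snps, n_pcs), one column per PC
              (hypothetical stand-in for SNP-PC correlations).
    top_fraction: fraction of SNPs taken per PC (0.01% = 1e-4).
    """
    n_snps, n_pcs = loadings.shape
    # Take at least one SNP per PC, even for very small inputs.
    k = max(1, int(np.ceil(n_snps * top_fraction)))
    peaks = set()
    for pc in range(n_pcs):
        # Rank SNPs by absolute loading, largest first (assumption).
        order = np.argsort(-np.abs(loadings[:, pc]))
        peaks.update(order[:k].tolist())
    return np.array(sorted(peaks), dtype=int)


def exclusion_mask(chrom, pos, peak_idx, half_window=2_000_000):
    """Return a boolean mask that is True for SNPs to exclude.

    A SNP is excluded if it lies on the same chromosome as a peak SNP
    and within half_window base pairs of it (assumed window definition).
    """
    chrom = np.asarray(chrom)
    pos = np.asarray(pos, dtype=np.int64)
    exclude = np.zeros(len(pos), dtype=bool)
    for i in peak_idx:
        near = np.abs(pos - pos[i]) <= half_window
        exclude |= (chrom == chrom[i]) & near
    return exclude


# Toy data: six SNPs on two chromosomes, one PC.
chrom = np.array([1, 1, 1, 1, 2, 2])
pos = np.array([1_000_000, 2_500_000, 3_000_000,
                9_000_000, 3_000_000, 8_000_000])
loadings = np.array([[0.01], [0.02], [0.90], [0.03], [0.02], [0.01]])

peaks = find_peaks(loadings)
keep = ~exclusion_mask(chrom, pos, peaks)
print("peak SNP indices:", peaks)
print("SNPs kept for the final PCA:", np.flatnonzero(keep))
```

In the toy data, the third SNP carries the largest loading, so it and its neighbours on chromosome 1 are dropped while the distant SNP and all of chromosome 2 are kept. With six PCs, `loadings` would have six columns and the peak sets would be merged before masking, matching the per-PC selection described in the text.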