
Chunk #12 — METHODS — Principal components analysis

Source: Genes mirror geography within Europe.

Text

To avoid artefacts due to patterns of linkage disequilibrium [3], we filtered autosomal SNPs using two approaches in combination. First, before running PCA we used the PLINK [28] software to exclude SNPs with pairwise genotypic r² greater than 80% within sliding windows of 50 SNPs (with a 5-SNP increment between windows). Second, we took an iterative approach: we ran an initial PCA and removed chromosomal regions that appeared to reflect exceptional long-range linkage disequilibrium rather than genome-wide patterns of structure. These regions can be detected by plotting the correlation between individual PC scores and genotypes along the genome and identifying sharp, concentrated peaks in correlation. (Alternatively, we could have plotted the magnitude of the elements of the SNP-based eigenvectors from the PCA; we used the correlation-based approach because much of this work was done before the release of recent versions of smartpca that provide the SNP eigenvectors.) SNPs falling within a 4-megabase region of a peak were excluded from the final PCA. Initially, peaks were defined by taking the top 0.01% of SNPs correlating with a PC for each
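
The second, correlation-based filter can be illustrated with a minimal Python sketch. It is not the original analysis code, and it rests on several assumptions: the function names `abs_pc_correlation` and `snps_to_exclude` are hypothetical, the correlation is taken to be the absolute Pearson correlation computed on complete genotype data, the 4-megabase region is read as 2 Mb on either side of a peak SNP (`half_width_bp`), and the top-fraction threshold is applied to a single PC at a time. The toy genotypes and positions are invented purely for illustration.

```python
from math import ceil, sqrt


def abs_pc_correlation(genotypes, pc_scores):
    """Absolute Pearson correlation of each SNP with one PC.

    genotypes: one list of allele counts per SNP (all individuals, no missing data).
    pc_scores: the individuals' scores on a single PC, in the same order.
    """
    n = len(pc_scores)
    s_mean = sum(pc_scores) / n
    s_dev = [s - s_mean for s in pc_scores]
    s_ss = sum(d * d for d in s_dev)
    result = []
    for snp in genotypes:
        g_mean = sum(snp) / n
        g_dev = [g - g_mean for g in snp]
        g_ss = sum(d * d for d in g_dev)
        cov = sum(a * b for a, b in zip(g_dev, s_dev))
        denom = sqrt(g_ss * s_ss)
        # Monomorphic SNPs have zero variance; report 0 instead of dividing by zero.
        result.append(abs(cov) / denom if denom > 0 else 0.0)
    return result


def snps_to_exclude(chroms, positions_bp, abs_corr,
                    top_fraction=1e-4, half_width_bp=2_000_000):
    """Indices of SNPs within half_width_bp of a top-correlated (peak) SNP.

    top_fraction=1e-4 corresponds to the top 0.01% of SNPs; at least one
    peak is always taken so that small toy inputs still produce a result.
    The 2 Mb half-width is an assumed reading of "a 4 megabase region".
    """
    n_snps = len(abs_corr)
    n_peaks = max(1, ceil(top_fraction * n_snps))
    ranked = sorted(range(n_snps), key=abs_corr.__getitem__, reverse=True)
    peaks = ranked[:n_peaks]
    return [
        i for i in range(n_snps)
        if any(chroms[i] == chroms[p]
               and abs(positions_bp[i] - positions_bp[p]) <= half_width_bp
               for p in peaks)
    ]


# Toy example: 6 individuals, 6 SNPs (five on chromosome 1, one on chromosome 2).
pc1 = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
genotypes = [
    [0, 1, 2, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],  # monomorphic
    [0, 0, 0, 2, 2, 2],  # tracks PC1 closely, so it becomes the peak
    [2, 0, 1, 1, 2, 0],
    [1, 2, 0, 0, 1, 2],
    [2, 0, 1, 1, 2, 0],
]
chroms = [1, 1, 1, 1, 1, 2]
positions_bp = [1_000_000, 2_500_000, 4_000_000, 5_500_000, 7_000_000, 4_000_000]

corr = abs_pc_correlation(genotypes, pc1)
excluded = snps_to_exclude(chroms, positions_bp, corr)
print(excluded)  # -> [1, 2, 3]
```

In this toy run the peak SNP sits at 4.0 Mb on chromosome 1, so the two neighbours 1.5 Mb away are dropped with it, while the chromosome 2 SNP at the same coordinate is kept. In the iterative scheme described above, the PCA would then be rerun on the remaining SNPs.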