Chunk #46 — METHODS — UKBB, BBJ, PAGE and TWB analysis. — Discovery data:

Source: Improving polygenic prediction in ancestrally diverse populations.
Embedded: yes

Text

strand ambiguous; (ii) located on sex chromosomes or in long-range LD regions (chr6: 25–35 Mb; chr8: 7–13 Mb); (iii) call rate <0.98; and (iv) MAF <0.05. We performed LD pruning on the remaining variants in 1KG using PLINK (ref. 45) (--indep-pairwise 100 50 0.2), yielding 149,501 largely independent, high-quality common variants. We then conducted principal component analysis on these LD-pruned SNPs in 1KG samples, and projected UKBB samples onto the resulting PCs using the 1KG SNP loadings, with the scale appropriately adjusted. Using 1KG as the reference, we trained a random forest model to predict the five super-population labels (AFR, AMR, EAS, EUR, SAS) from the top 6 PCs, and applied the trained classifier to UKBB samples to predict the genetic ancestry of each UKBB participant. We retained UKBB samples that could be assigned to one of the super-populations with a predicted probability >90%. For each population in UKBB, we selected a set of unrelated individuals and performed sample-level quality control (QC) by removing individuals meeting any of the following criteria: (i) mismatch between self-reported and genetically inferred sex; (ii) missingness or heterozygosity outliers; and (iii) sex chromosome aneuploidy. For the validation and testing of PRS in the EUR population, we used non-British EUR samples that are unrelated
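
The ancestry-assignment steps described above (PCA in a reference panel, projection of target samples onto the reference loadings, a random forest trained on the top 6 PCs, and a 90% probability cutoff) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the simulated genotypes, the number of trees, and the plain standardization used for the projection are all assumptions made for illustration, and the scale adjustment mentioned in the text is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SUPER_POPS = ["AFR", "AMR", "EAS", "EUR", "SAS"]


def fit_reference_pca(ref_genotypes, n_pcs=6):
    """PCA on standardized reference genotypes (samples x SNPs).

    Returns the per-SNP mean and std plus the SNP loadings (SNPs x PCs).
    """
    mean = ref_genotypes.mean(axis=0)
    std = ref_genotypes.std(axis=0)
    std[std == 0] = 1.0  # guard against monomorphic SNPs
    z = (ref_genotypes - mean) / std
    # Rows of vt are the principal axes in SNP space (the loadings).
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_pcs].T


def project(genotypes, mean, std, loadings):
    """Project samples onto reference PCs using reference mean/std/loadings."""
    return ((genotypes - mean) / std) @ loadings


def assign_ancestry(ref_pcs, ref_labels, target_pcs, min_prob=0.9, seed=0):
    """Train a random forest on reference PCs and classify target samples.

    Returns the most probable label per target sample and a boolean mask
    marking samples whose top predicted probability exceeds min_prob.
    """
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(ref_pcs, ref_labels)
    proba = clf.predict_proba(target_pcs)  # columns follow clf.classes_
    labels = clf.classes_[proba.argmax(axis=1)]
    keep = proba.max(axis=1) > min_prob
    return labels, keep


def simulate(rng, freqs, n_per_pop):
    """Simulated (hypothetical) genotypes: binomial draws per population."""
    geno, labels = [], []
    for pop in SUPER_POPS:
        geno.append(rng.binomial(2, freqs[pop], size=(n_per_pop, freqs[pop].size)))
        labels += [pop] * n_per_pop
    return np.vstack(geno).astype(float), np.array(labels)


rng = np.random.default_rng(0)
n_snps = 300
# Toy allele frequencies, drawn independently per population (an assumption
# that exaggerates real differentiation so the toy example separates cleanly).
freqs = {pop: rng.uniform(0.05, 0.95, n_snps) for pop in SUPER_POPS}

ref_geno, ref_labels = simulate(rng, freqs, n_per_pop=40)        # stands in for 1KG
target_geno, target_labels = simulate(rng, freqs, n_per_pop=20)  # stands in for UKBB

mean, std, loadings = fit_reference_pca(ref_geno, n_pcs=6)
ref_pcs = project(ref_geno, mean, std, loadings)
target_pcs = project(target_geno, mean, std, loadings)

labels, keep = assign_ancestry(ref_pcs, ref_labels, target_pcs, min_prob=0.9)
print(f"retained {keep.sum()} of {keep.size} target samples")
```

Samples for which `keep` is `False` are the ones that would be dropped under the >90% rule. In real data, projecting target samples onto loadings estimated in a much smaller reference panel shrinks the projected PCs, which is presumably why the scale is adjusted in the text; the sketch ignores that effect.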