Chunk #42 — Methods — Samples and quality controls

Source: Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations.
Embedded: yes

Text

= 504) and African ancestry (AFR, N = 504). We excluded the African Caribbeans in Barbados (ACB) and Americans of African Ancestry in SW USA (ASW) populations from AFR, as well as all individuals of American ancestry (AMR), because of their complex admixture patterns. We then assigned each genotyped UKB participant to the closest ancestry based on the first three PCs, resulting in 463,795 EUR, 11,906 SAS, 2486 EAS and 9184 AFR individuals. To remove cryptic relatedness in the UKB, we used the GCTA software to calculate the genomic relationship matrix (GRM) [46] based on genotyped SNPs in each of the aforementioned populations. We removed one individual from each pair with an estimated relatedness larger than 0.05, which yielded a subset of unrelated individuals in each ancestry. For the European ancestry, we retained only participants who self-reported as British or Irish. After randomly sampling 10,000 individuals from the British subset, we created the discovery dataset using the remaining 313,284 individuals. For the target populations, we used an independent dataset of ~39,000 UKB individuals. These individuals included the 10,000 randomly sampled participants who identified themselves as British, 9979 participants of EUR who identified themselves as Irish, the 9448 participants of SAS, the 2257