Chunk #1 — Introduction

Source: Gene-based polygenic risk scores analysis of alcohol use disorder in African Americans.
Embedded: yes

Text

The performance of a PRS depends on a well-powered discovery GWAS, which is needed to select disease-associated variants accurately and estimate their effect sizes, and on a target dataset that is well matched to that GWAS. For admixed populations, assembling discovery GWAS with sample sizes comparable to those available for European ancestry (EA) populations (hundreds of thousands to >1 million) requires extensive and strategic data collection. Studies have shown that many disease-causing genes are shared among different populations [11–14]. Therefore, large-scale EA GWAS summary statistics can be leveraged to improve the performance of PRS in non-EA populations by increasing the overall discovery GWAS sample size. However, disease-associated variants may differ in allele frequency and effect size across populations, and linkage disequilibrium (LD) patterns also differ [12, 15–18]; that is, the target datasets are not matched to the discovery GWAS. Furthermore, in admixed populations such as AA, the proportion of African ancestry ranges from close to 0% to almost 100% and is distributed differently across the genome, making AA an extremely heterogeneous population. Therefore, different AA target datasets may also have different LD patterns and allele frequencies, and PRS results from one study cannot