Chunk #3 — Introduction

Source: Gene-based polygenic risk scores analysis of alcohol use disorder in African Americans.
Embedded: yes

Text

Most variants in the genome are likely unrelated to any particular condition, and including them in PRS calculations reduces performance by introducing noise. Ideally, only variants that act on disease-causing genes should be used in calculating PRS. However, most of these variants and genes remain to be discovered. If a variant located in a gene is nominally associated (e.g., P < 0.05) with a disease in both EA and non-EA populations and has the same direction of effect in both, then it is more likely to be a shared disease-causing variant, and that gene is more likely to be a shared disease-causing gene across populations. Therefore, using these variants to calculate PRS is expected to improve performance by excluding many variants in the genome that are unlikely to be related to the disease, thereby increasing the signal-to-noise ratio. Moreover, since these disease-causing variants are shared among different populations, the discovery GWAS and target datasets do not have to be well-matched, and the large-scale EA GWAS can be used to increase the overall discovery GWAS sample size. Based