Chunk #10 — Methods to assess genetic risk — Concepts and methods to interpret ranges of genetic prediction accuracy

Source: Predicting Polygenic Risk of Psychiatric Disorders.
Embedded: yes

Text

The key steps in computing PRS are determining which variants to include[2] and how to weight them (Table S1). One of the most widely adopted methods is genetic risk profiling in PLINK, a commonly used computational toolkit. In this approach, semi-independent SNPs are added in order from the most to the least significant association; each SNP's effect size is multiplied by the number of risk alleles an individual carries, and these products are summed across the genome. An important consideration is the choice of p-value threshold, which acts as a tuning parameter that balances signal against noise. This tradeoff arises because more stringent p-value thresholds retain a higher proportion of causal variants but fewer variants in total than more permissive thresholds. No single p-value threshold is optimal for a given discovery GWAS dataset; rather, the optimum varies with SNP overlap, genetic divergence[3], genetic correlation, and other differences between the discovery and target data. The standard PRS approach is to calculate several scores from SNPs meeting various p-value thresholds on a log scale ranging from genome-wide significant (p < 5e-8) to all independent SNPs (p < 1),
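
The thresholding and summation described above can be illustrated with a minimal sketch in Python. It is not PLINK's implementation. The SNP identifiers, effect sizes, p-values, genotypes, the intermediate thresholds, and the helper name polygenic_score are all hypothetical, chosen only for illustration. The sketch also assumes the SNPs have already been reduced to a semi-independent set.

```python
# Toy illustration of p-value-thresholded polygenic scoring.
# All identifiers and numbers below are hypothetical, not taken from the source.

# Discovery GWAS summary statistics: SNP -> (effect size, p-value).
# Assumed to be a semi-independent set already (no LD pruning is done here).
snps = {
    "rs1": (0.20, 1e-9),
    "rs2": (0.10, 1e-4),
    "rs3": (-0.05, 0.03),
    "rs4": (0.02, 0.60),
}

# Target sample: number of risk alleles (0, 1, or 2) carried at each SNP.
genotypes = {
    "ind1": {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1},
    "ind2": {"rs1": 0, "rs2": 2, "rs3": 2, "rs4": 2},
}

# Thresholds from genome-wide significant (5e-8) to all SNPs (1.0).
# The two intermediate values are arbitrary choices for this sketch.
thresholds = [5e-8, 1e-3, 0.05, 1.0]


def polygenic_score(dosages, snps, p_max):
    """Sum effect size * risk-allele count over SNPs with p-value < p_max."""
    return sum(
        beta * dosages[snp]
        for snp, (beta, p_value) in snps.items()
        if p_value < p_max
    )


# One score per individual per threshold.
scores = {
    ind: {t: polygenic_score(dosages, snps, t) for t in thresholds}
    for ind, dosages in genotypes.items()
}

for ind, by_threshold in scores.items():
    for t, score in by_threshold.items():
        print(f"{ind}  p < {t:g}  score = {score:.2f}")
```

Each permissive threshold adds SNPs to the sum, so a score can rise or fall as weaker associations enter. In practice, the threshold would be selected by how well each score predicts the outcome in the target data.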