Inclusion of variants discovered from diverse populations improves polygenic risk score transferability.
- Authors: Cavazos, Taylor B; Witte, John S
- Year: 2021
- Journal: HGG Advances
- PMID: 33564748
- DOI: 10.1016/j.xhgg.2020.100017
- PMCID: PMC7869832
The majority of polygenic risk scores (PRSs) have been developed and optimized in individuals of European ancestry and may have limited generalizability across other ancestral populations. Understanding the aspects of PRSs that contribute to this issue, and determining solutions, is complicated by disease-specific genetic architecture and limited knowledge of how causal variants and effect sizes are shared across populations. Motivated by these challenges, we undertook a simulation study to assess the relationship between ancestry and the potential bias in PRSs developed in European ancestry populations. Our simulations show that the magnitude of this bias increases with increasing divergence from European ancestry, and that this is attributable to population differences in linkage disequilibrium (LD) and allele frequencies of European-discovered variants, likely as a result of genetic drift. Importantly, we find that including in the PRS variants discovered in African ancestry individuals has the potential to achieve unbiased estimates of genetic risk across global populations and admixed individuals. We confirm our simulation findings in an analysis of hemoglobin A1c (HbA1c), asthma, and prostate cancer in the UK Biobank. Given the demonstrated improvement in PRS prediction accuracy, recruiting larger diverse cohorts will be crucial, and potentially even necessary, for enabling accurate and equitable genetic risk prediction across populations.
Accuracy of European-derived PRSs by proportion of total ancestry
Accuracy of PRSs, with variants and weights from a European GWAS, decreases linearly with increasing proportion of African ancestry. Variants and weights were extracted from a GWAS of 10,000 European cases and 10,000 European controls. PRS accuracy was computed as the Pearson's correlation between the true genetic risk and the GWAS-estimated risk score across 50 simulations in independent test populations of 5,000 Europeans, 5,000 Africans, and 5,000 admixed individuals. Admixed individuals were grouped based on their proportion of genome-wide European ancestry. Simulations assume 1,000 causal variants and a heritability of 0.5 to compute the true genetic risk. A p value of 0.01 and an LD r² cutoff of 0.2 were used to select variants for the estimated risk score.
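The accuracy metric described in this caption can be illustrated with a minimal sketch. It assumes a PRS is a weighted sum of allele dosages and that accuracy is the Pearson correlation between that score and the true genetic risk; the function names and the toy genotypes and effect sizes are hypothetical, not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) for one individual."""
    return sum(w * g for g, w in zip(dosages, weights))

def prs_accuracy(genotypes, gwas_weights, true_risk):
    """Correlation between GWAS-weighted scores and the true genetic risk."""
    scores = [polygenic_score(row, gwas_weights) for row in genotypes]
    return pearson(scores, true_risk)

# Toy data (illustrative only): 4 individuals, 3 variants.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
true_effects = [0.5, -0.2, 0.3]        # assumed causal effect sizes
estimated_weights = [0.4, -0.1, 0.35]  # assumed noisy GWAS estimates

true_risk = [polygenic_score(row, true_effects) for row in genotypes]
accuracy = prs_accuracy(genotypes, estimated_weights, true_risk)
print(round(accuracy, 3))
```

In the simulations described above, the same correlation would be computed separately within each test population and each ancestry-proportion bin.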
PRS construction approaches and performance in admixed individuals
Using significant variants from an African ancestry GWAS with population-specific weights results in less disparity in PRS accuracy across populations. PRSs were constructed using variants and weights selected from either a European or African population (10,000 cases, 10,000 controls each) or a fixed-effects meta-analysis of both. An additional local ancestry-specific method was used for PRS weighting. Performance, measured as the Pearson's correlation between the true and GWAS-estimated risk scores, is shown across 50 simulations. Simulations assume 1,000 causal variants and a heritability of 0.5 to compute the true genetic risk. A p value of 0.01 and an LD r² cutoff of 0.2 were used to select variants for the estimated risk scores.
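As a rough illustration of combining per-variant estimates from two GWAS, the sketch below uses standard inverse-variance weighting, the usual form of a fixed-effects meta-analysis. The caption does not state which software or exact procedure was used, so the function name and the example numbers are assumptions.

```python
import math

def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se

# Hypothetical estimates for one variant: [European GWAS, African GWAS].
beta, se = fixed_effects_meta(betas=[0.10, 0.20], standard_errors=[0.05, 0.10])
print(beta, se)
```

The more precise estimate (smaller standard error) dominates the pooled effect, so a larger cohort contributes more weight than a smaller one.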
Impact of African sample size on PRS accuracy and generalization
PRS accuracy in diverse populations can be improved by including data from an African ancestry GWAS with smaller sample sizes than in a European GWAS. The number of African samples used in the GWAS and subsequent PRS construction was decreased to reflect the availability of diverse samples in real data. Analysis was conducted assuming 1%, 5%, 10%, 50%, and 100% (matched size of European dataset) of the total African ancestry cases. Average accuracy and the 95% CI were reported across the 50 simulations for different variant selection and weighting approaches. Simulations assume 1,000 causal variants and a heritability of 0.5 to compute the true genetic risk. A p value of 0.01 and an LD r² cutoff of 0.2 were used to select variants for the estimated risk score. A linear mixture of single-population PRSs (α₁EUR + α₂AFR), with variants and weights selected from that population, was also tested in the admixed population. The mixture coefficients (α₁ and α₂) were estimated in an independent African ancestry testing population.
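The linear mixture of a European-derived and an African-derived PRS can be sketched as a two-predictor least-squares fit. The caption does not say how the mixture coefficients were estimated, so ordinary least squares on mean-centered scores is an assumption here, as are the function names and toy values.

```python
def fit_mixture(prs_eur, prs_afr, outcome):
    """Least-squares alpha1, alpha2 for outcome ~ alpha1*EUR + alpha2*AFR.

    All inputs are mean-centered first, so no intercept term is needed.
    """
    def center(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    e, a, y = center(prs_eur), center(prs_afr), center(outcome)
    see = sum(v * v for v in e)
    saa = sum(v * v for v in a)
    sea = sum(p * q for p, q in zip(e, a))
    sey = sum(p * q for p, q in zip(e, y))
    say = sum(p * q for p, q in zip(a, y))
    det = see * saa - sea * sea
    if det == 0:
        raise ValueError("the two scores are collinear; coefficients undefined")
    # Solve the 2x2 normal equations by Cramer's rule.
    alpha1 = (sey * saa - say * sea) / det
    alpha2 = (say * see - sey * sea) / det
    return alpha1, alpha2

def mixture_score(prs_eur, prs_afr, alpha1, alpha2):
    """Combined score alpha1*EUR + alpha2*AFR for each individual."""
    return [alpha1 * e + alpha2 * a for e, a in zip(prs_eur, prs_afr)]

# Toy tuning set (illustrative): outcome built as 0.3*EUR + 0.7*AFR.
tune_eur = [0.1, 0.4, 0.3, 0.9, 0.5]
tune_afr = [0.2, 0.1, 0.7, 0.6, 0.3]
tune_outcome = [0.3 * e + 0.7 * a for e, a in zip(tune_eur, tune_afr)]

alpha1, alpha2 = fit_mixture(tune_eur, tune_afr, tune_outcome)
combined = mixture_score([0.2, 0.8], [0.5, 0.1], alpha1, alpha2)
```

Following the caption, the coefficients would be fitted in one population (here the toy tuning set) and the combined score then evaluated in a separate admixed test set.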
Allele frequency distribution of GWAS selected variants and LD tagging of causal variants
GWAS significant variants are more common in the study population from which they were discovered; however, African ancestry GWAS variants may result in better LD tagging across populations. Variants were selected from a European or African ancestry GWAS or a fixed-effects meta-analysis of both populations.
(A) GWAS variants were binned by their MAF estimated from the European, African, and admixed populations. The error bar represents the 95% CI across simulations.
(B) LD scores were calculated for every causal variant by adding up the LD r² for each GWAS tag variant within ±1,000 kb of the causal variant. LD scores calculated in Europeans and Africans were compared by Pearson's correlation. The results were summarized across simulations as the average and 95% CI.
(C) Raw LD scores for each causal variant (m = 1,000) calculated in a European or African population for one simulation. Each panel shows the approach used for variant selection. Causal variants directly discovered through the GWAS are colored in gray.
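The per-variant LD score described in panel (B) can be sketched as follows: for a causal variant, sum the r² with every GWAS tag variant within 1,000 kb on either side. This sketch assumes r² is the squared Pearson correlation of genotype dosages and treats a monomorphic variant as contributing zero; the data layout and names are hypothetical.

```python
def r_squared(geno_a, geno_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(geno_a)
    mean_a, mean_b = sum(geno_a) / n, sum(geno_b) / n
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    saa = sum((a - mean_a) ** 2 for a in geno_a)
    sbb = sum((b - mean_b) ** 2 for b in geno_b)
    if saa == 0 or sbb == 0:
        return 0.0  # monomorphic variant: no LD information (assumption)
    return (sab * sab) / (saa * sbb)

def ld_score(causal_pos, causal_geno, tag_variants, window_bp=1_000_000):
    """Sum of r^2 between a causal variant and tag variants within the window.

    tag_variants is a list of (position_bp, genotype_vector) pairs.
    """
    return sum(
        r_squared(causal_geno, geno)
        for pos, geno in tag_variants
        if abs(pos - causal_pos) <= window_bp
    )

# Toy example (illustrative): one causal variant, three tag variants.
causal_geno = [0, 1, 2, 1, 0]
tags = [
    (5_200_000, [0, 1, 2, 1, 0]),  # in window, perfectly correlated
    (5_900_000, [0, 0, 1, 1, 0]),  # in window, partially correlated
    (9_000_000, [0, 1, 2, 1, 0]),  # outside the window, ignored
]
score = ld_score(5_000_000, causal_geno, tags)
print(round(score, 3))
```

Computing this score once with European genotypes and once with African genotypes, then correlating the two sets of scores across causal variants, corresponds to the comparison the caption describes.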