Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals.

Authors: Marnetto, Davide; Pärna, Katri; Läll, Kristi; Molinaro, Ludovica; Montinaro, Francesco; Haller, Toomas; Metspalu, Mait; Mägi, Reedik; Fischer, Krista; Pagani, Luca
Year: 2020
Journal: Nature communications
PMID: 32242022
DOI: 10.1038/s41467-020-15464-w
PMCID: PMC7118071

Fig. 1

Schematic workflow. A graphical representation of the workflow we adopted to obtain normalized PS and ancestry-specific pPS. White boxes represent input data; the two key steps, ancestry deconvolution and partial PS computation, have an orange background.

Fig. 2

Population-wide Polygenic Scores (PS) and ancestry-specific partial PS (aspPS). PS distributions for seven reference populations (pastel colors), three admixed populations (yellow) and their respective ancestry-specific partial PS (red and blue). Reference population medians are represented with dashed lines. The width of the boxplots is proportional to the median size of the ancestry fraction used to compute each aspPS. Four different PS for different phenotypes are shown: (a) T2D [28], (b) breast cancer [30], (c) height [29], (d) BMI [29]. Significant differences with randomly assigned ancestral components are encoded as: *: p ≤ 0.05, **: p ≤ 0.005, ***: p ≤ 10^−5 (one-sided Wilcoxon signed-rank test). Sample sizes and exact P-values are reported in Supplementary Data 1. For each distribution, the box represents the interquartile range (IQR = Q3 − Q1), the line across the box indicates the median, and the whiskers extend to the most extreme data points within Q1 − 1.5 IQR and Q3 + 1.5 IQR; outliers are omitted. CEU: North-West Europeans from Utah; IBS: Iberians from Spain; TSI: Tuscans from Italy; CHB: Han from Beijing; YRI: Yoruba from Nigeria; LWK: Luhya from Kenya; GUMUZ: Gumuz from Ethiopia; EGYPT: Egyptians; ETHIOPIA: Amhara, Oromo, Wolayta and Ethiopian Somali from Ethiopia; ASW: African-Americans from South-West USA.
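The boxplot convention described in this caption (box spanning Q1 to Q3, whiskers reaching the most extreme points within 1.5 IQR of the box, outliers omitted) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `boxplot_summary`, the toy values, and the "inclusive" quartile method are all assumptions, and different plotting tools compute quartiles slightly differently.

```python
import statistics


def boxplot_summary(values):
    """Return the box, whisker ends and omitted outliers for one distribution."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is an assumption here.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed points inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # made-up values, not study data
summary = boxplot_summary(toy_scores)
print(summary)
```

In this toy input the value 100 falls beyond Q3 + 1.5 IQR, so it would be dropped from the plot while the upper whisker stops at 8.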

Fig. 3

pPS predictivity. We plugged into four trait prediction models the pPS obtained with genomic subsets of variable sizes (the same subsets resulting from local ancestry analysis in our admixed individuals), in a non-admixed sample set derived from EstBB. Each point represents the performance of a different subset of the genome, applied to all individuals in the population; the horizontal axis reports the fraction of genomic SNPs included in each subset, while the vertical axis reports its predictivity, expressed as R² (Nagelkerke R² for binary traits). Red dots represent pPS that do not significantly improve the base model without PS (p > 0.05, likelihood ratio test). The dashed line represents the total PS predictivity. (a) Type 2 diabetes, (b) breast cancer, (c) height, (d) BMI.
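The two statistics named in this caption can be computed from model log-likelihoods, as in the sketch below. It uses the standard Nagelkerke definition (Cox-Snell R² rescaled by its maximum, relative to an intercept-only model) and a likelihood ratio test with one degree of freedom for adding a single PS term to the base model. The function names and the log-likelihood values are hypothetical, and how the authors fitted their models is not stated here.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2 from the log-likelihoods of an intercept-only and a fitted model."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def lrt_pvalue_1df(ll_base, ll_full):
    """Likelihood ratio test p-value when the full model adds one parameter (the PS)."""
    stat = max(0.0, 2.0 * (ll_full - ll_base))
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


# Made-up log-likelihoods for a balanced binary trait with n = 1000 samples.
n = 1000
ll_null = n * math.log(0.5)   # intercept-only model
ll_base = -650.0              # hypothetical non-genetic base model
ll_full = -640.0              # hypothetical base model plus a pPS term

print("Nagelkerke R2, base:", nagelkerke_r2(ll_null, ll_base, n))
print("Nagelkerke R2, base + pPS:", nagelkerke_r2(ll_null, ll_full, n))
print("LRT p-value for adding the pPS:", lrt_pvalue_1df(ll_base, ll_full))
```

Under this reading, a point would be drawn in red when the p-value returned by `lrt_pvalue_1df` exceeds 0.05.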

Fig. 4

Population-wide PS and aspPS in UKBB admixed individuals. PS distributions for four reference populations (pastel colors), three admixed populations (yellow) and their respective ancestry-specific partial PS (red, blue, green). Reference population medians are represented with dashed lines. The width of the boxplots is proportional to the median size of the ancestry fraction used to compute each aspPS. Two different PS are shown: (a) height [29] and (b) BMI [29]. Significant differences with randomly assigned ancestral components are encoded as: *: p ≤ 0.05, **: p ≤ 0.005, ***: p ≤ 10^−5 (one-sided Wilcoxon signed-rank test). Sample sizes and exact P-values are reported in Supplementary Data 1. For each distribution, the box represents the interquartile range (IQR = Q3 − Q1), the line across the box indicates the median, and the whiskers extend to the most extreme data points within Q1 − 1.5 IQR and Q3 + 1.5 IQR; outliers are omitted. (c) PS bias, defined as the mean PS difference not explained by trait difference, is compared with F_ST against the reference population. All populations extracted from UKBB are represented; for UK EURAFR the fraction of European ancestry is shown. UK EUR is the reference population for UKBB-based PSs, while UK EAS is the reference for BBJ-based PSs. EUR indicates European descent, EAS East Asian descent, AFR African descent; combinations indicate admixed samples. FAREUR indicates Europeans far from the UKBB core.

Fig. 5

Predictivity in admixed genomes. Each plot shows the improvement in R² when adding a PS to the base, non-genetic model. The line color depicts which PS configuration has been used: traditional total PS (PS_UKBB or PS_BBJ, according to the biobank of origin), partial ancestry-specific PSs or combined ancestry-specific PS (casPS). Dots represent the realized R² improvement in each set without resampling, while bars represent the standard deviation derived from n = 5000 bootstrap replicates. (a) Added R² for height in UKBB samples with admixed African and European ancestry; no casPS was available. (b) Added R² for height in UKBB samples with admixed East Asian and European ancestry. (c) Added R² for BMI in UKBB samples with admixed African and European ancestry; no casPS was available. (d) Added R² for BMI in UKBB samples with admixed East Asian and European ancestry. EUR indicates European descent, EAS East Asian descent, AFR African descent; combinations indicate admixed samples.
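The quantities plotted in this figure, a realized R² improvement with a bootstrap standard deviation, can be sketched as below. This is a simplified illustration under stated assumptions: the base model is reduced to a single covariate so that R² can be obtained from pairwise correlations, the data are simulated, and the names `added_r2` and `bootstrap_sd` are hypothetical. The authors' base model and fitting procedure may differ.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def added_r2(y, cov, ps):
    """R2 of y ~ cov + ps minus R2 of y ~ cov (one base covariate only)."""
    r_yc, r_yp, r_cp = pearson(y, cov), pearson(y, ps), pearson(cov, ps)
    r2_base = r_yc ** 2
    # Closed form for the R2 of a linear model with two predictors.
    r2_full = (r_yc ** 2 + r_yp ** 2 - 2 * r_yc * r_yp * r_cp) / (1 - r_cp ** 2)
    return r2_full - r2_base


def bootstrap_sd(y, cov, ps, n_boot=5000, seed=0):
    """Standard deviation of the added R2 over resamples of individuals."""
    rng = random.Random(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        reps.append(added_r2([y[i] for i in idx],
                             [cov[i] for i in idx],
                             [ps[i] for i in idx]))
    return statistics.stdev(reps)


# Simulated toy data, not study data: trait = covariate effect + PS effect + noise.
rng = random.Random(1)
cov = [rng.gauss(0, 1) for _ in range(100)]
ps = [rng.gauss(0, 1) for _ in range(100)]
y = [0.5 * c + 0.3 * p + rng.gauss(0, 1) for c, p in zip(cov, ps)]

realized = added_r2(y, cov, ps)                # the "dot"
spread = bootstrap_sd(y, cov, ps, n_boot=500)  # the "bar" (fewer replicates for speed)
print(realized, spread)
```

With several base covariates the closed-form expression no longer applies and both models would need to be fitted by least squares; the resampling loop stays the same.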

#	Section	Preview
0	Introduction	Polygenic Scores (PSs) are computed by summing the contribution of many associated alleles across…
1	Introduction	While any human genome can be seen as the mixture of its ancestors, here we focus on individuals…
2	Introduction	PS transferability has been proved exceptionally difficult across deeply divergent…
3	Introduction	We focus on PS of four thoroughly studied traits (Type 2 Diabetes (T2D) [28], height [29], Body Mass Index…
4	Results — Proposed model and workflow	As introduced above, current PSs are often poorly transferable across populations. Considering PS as…
5	Results — Proposed model and workflow	Let us consider an admixed genome which descends with proportion p from population A and with…
6	Results — Proposed model and workflow	We can also define ancestry specific partial PS (aspPSA) as a proxy for the total standardized PS…
7	Results — Proposed model and workflow	We thus proceeded defining a way to compute partial PS (pPS), a statistic that estimates the total…
8	Results — Proposed model and workflow	when applied to the full genome. Furthermore, we call the pPS calculated on ancestry specific…
9	Results — Local ancestry deconvolution	We started by assessing the accuracy of ELAI [24], a local ancestry inference (LAI) or deconvolution…
10	Results — Population-level aspPS distributions	The PS population distributions of the four traits considered here, calculated on the whole genomes…
11	Results — Population-level aspPS distributions	values of non-admixed European and African populations. This cannot be attributed to a casual…
12	Results — Population-level aspPS distributions	These preliminary results on admixed genomes showed promising evidences for the usage of aspPS in…
13	Results — Partial PS predictivity in uniform genomes	Before introducing differential ancestry effects in our system we tested whether a PS computed on a…
14	Results — Partial PS predictivity in uniform genomes	For all traits, we show that pPS calculated with even a small portion of available genome are…
15	Results — AspPS in admixed genomes from UK Biobank	We then moved on testing the conclusions drawn from EstBB on the UK Biobank [27] (UKBB), particularly…
16	Results — AspPS in admixed genomes from UK Biobank	The ancestry deconvoluted UKBB admixed samples were then grouped according to their inferred…
17	Results — AspPS in admixed genomes from UK Biobank	Additionally, the same effect was observed in PSs derived from Biobank Japan [33,34] (BBJ) and…
18	Results — AspPS predictivity in admixed genomes	We then tested the phenotype predictivity of aspPS on admixed genomes extracted from UKBB. We fitted…
19	Results — AspPS predictivity in admixed genomes	the trait-SNP associations discovered in Europeans is greater than zero and (B) the directional bias…

Citation	PMID	DOI	Status
Akiyama, M, Nat. Commun., 2019, Characterizing rare and low-frequency height-associated variants in the Japanese population	31562340	10.1038/s41467-019-12276-5	Cited
Akiyama, M, Nat. Genet., 2017, Genome-wide association study identifies 112 new loci for body mass index in the Japanese population	28892062	10.1038/ng.3951	Cited
Alexander, DH et al., Genome Res., 2009, Fast model-based estimation of ancestry in unrelated individuals	19648217	10.1101/gr.094052.109	Cited
Buniello, A, Nucleic Acids Res., 2019, The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019	30445434	10.1093/nar/gky1120	Cited
Bycroft, C, Nature, 2018, The UK Biobank resource with deep phenotyping and genomic data	30305743	10.1038/s41586-018-0579-z	Cited
Chang, CC, GigaScience, 2015, Second-generation PLINK: rising to the challenge of larger and richer datasets	25722852	10.1186/s13742-015-0047-8	Cited
Choi, SW et al., GigaScience, 2019, PRSice-2: Polygenic Risk Score software for biobank-scale data	31307061	10.1093/gigascience/giz082	Cited
Churchhouse, C & Neale, BM, Neale Lab, 2019, UK Biobank (http://www.nealelab.is/uk-biobank/)	—	—	—
Danecek, P, Bioinformatics, 2011, The variant call format and VCFtools	21653522	10.1093/bioinformatics/btr330	Cited
De La Vega, FM et al., Genome Med., 2018, Polygenic risk scores: a biased prediction?	30591078	10.1186/s13073-018-0610-x	Cited
Guan, Y, Genetics, 2014, Detecting structure of haplotypes and local ancestry	24388880	10.1534/genetics.113.160697	Cited
Hall, M et al., Popul. Dev. Rev., 2016, Trajectories of ethnoracial diversity in American communities, 1980-2010	29398737	10.1111/j.1728-4457.2016.00125.x	Cited
Haworth, S, Nat. Commun., 2019, Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis	30659178	10.1038/s41467-018-08219-1	Cited
Kerminen, S, Am. J. Hum. Genet., 2019, Geographic variation and bias in the polygenic scores of complex diseases and traits in Finland	31155286	10.1016/j.ajhg.2019.05.001	Cited
Kettunen, J, Nat. Commun., 2016, Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA	27005778	10.1038/ncomms11122	Cited
Kim, MS et al., Genome Biol., 2018, Genetic disease risks can be misestimated across global populations	30424772	10.1186/s13059-018-1561-7	Cited
Lawson, DJ et al., PLoS Genet., 2012, Inference of population structure using dense haplotype data	22291602	10.1371/journal.pgen.1002453	Cited
Leitsalu, L, Int. J. Epidemiol., 2015, Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu	24518929	10.1093/ije/dyt268	Cited
Läll, K et al., Genet. Med., 2017, Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores	27513194	10.1038/gim.2016.103	Cited
Mahajan, A, Nat. Genet., 2014, Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility	24509480	10.1038/ng.2897	Cited
Manrai, AK, New Engl. J. Med., 2016, Genetic misdiagnoses and the potential for health disparities	27532831	10.1056/NEJMsa1507092	Cited
Martin, AR, Am. J. Hum. Genet., 2017, Human demographic history impacts genetic risk prediction across diverse populations	28366442	10.1016/j.ajhg.2017.03.004	Cited
Martin, AR, Nat. Genet., 2019, Clinical use of current polygenic risk scores may exacerbate health disparities	30926966	10.1038/s41588-019-0379-x	Cited
Michailidou, K, Nature, 2017, Association analysis identifies 65 new breast cancer risk loci	29059683	10.1038/nature24284	Cited
Moorjani, P, PLoS Genet., 2011, The history of African gene flow into Southern Europeans, Levantines, and Jews	21533020	10.1371/journal.pgen.1001373	Cited
Márquez-Luna, C et al. (with the South Asian Type 2 Diabetes (SAT2D) Consortium and the SIGMA Type 2 Diabetes Consortium), Genet. Epidemiol., 2017, Multiethnic polygenic risk scores improve risk prediction in diverse populations	29110330	10.1002/gepi.22083	Cited
Pagani, L, Am. J. Hum. Genet., 2015, Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians	26027499	10.1016/j.ajhg.2015.04.019	Cited
Popejoy, AB et al., Nature, 2016, Genomics is failing on diversity	27734877	10.1038/538161a	Cited
Reisberg, S et al., PloS ONE, 2017, Comparing distributions of polygenic risk scores of type 2 diabetes and coronary heart disease within different populations	28678847	10.1371/journal.pone.0179238	Cited
Scutari, M et al., PLoS Genet., 2016, Using genetic distance to infer the accuracy of genomic prediction	27589268	10.1371/journal.pgen.1006288	Cited
Skotte, L et al., Genet. Epidemiol., 2019, Ancestry-specific association mapping in admixed populations	30883944	10.1002/gepi.22200	Cited
Sohail, M, eLife, 2019, Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies	30895926	10.7554/eLife.39702	Cited
The 1000 Genomes Project Consortium., Nature, 2015, A global reference for human genetic variation	26432245	10.1038/nature15393	Cited
Visscher, PM, Am. J. Hum. Genet., 2017, 10 years of GWAS discovery: biology, function, and translation	28686856	10.1016/j.ajhg.2017.06.005	Cited
Weiss, KM et al., Genome Res., 2009, Non-Darwinian estimation: my ancestors, my genes’ ancestors	19411595	10.1101/gr.076539.108	Cited
Wray, NR et al., Genome Res., 2007, Prediction of individual genetic risk to disease from genome-wide association studies	17785532	10.1101/gr.6665407	Cited
Wray, NR, J. Child Psychol. Psyc., 2014, Research review: polygenic methods and their application to psychiatric traits	25132410	10.1111/jcpp.12295	Cited

In this knowledge base

Title	Year	PMID
A saturated map of common genetic variants associated with human height.	2022	36224396
Gene-based polygenic risk scores analysis of alcohol use disorder in African Americans.	2022	35790736
Inclusion of variants discovered from diverse populations improves polygenic risk score transferability.	2021	33564748

External

Title	Authors	Journal	Year	Link
Clinical translation of polygenic scores for prostate cancer screening.	Ratner D et al.	—	2026	→
Clinical use of polygenic risk scores: current status, barriers and future directions.	Kullo IJ	—	2026	→
Admixed and single-continental genome segments of the same ancestry have distinct linkage disequilibrium patterns.	Lee H et al.	—	2025	→
CADET: Enhanced transcriptome-wide association analyses in admixed samples using eQTL summary data.	Head ST et al.	—	2025	→
Characterizing features affecting local ancestry inference performance in admixed populations.	Honorato-Mauer J et al.	—	2025	→
Data simulation to optimize frameworks for genome-wide association studies in diverse populations.	Mugo JW et al.	—	2025	→
Decreased Clearance of Low-Density Lipoprotein Cholesterol is Causally Associated With Increased Mortality of Septic Shock.	Takahashi N et al.	—	2025	→
Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits.	Hu S et al.	—	2025	→
Hidden structure in polygenic scores and the challenge of disentangling ancestry interactions in admixed populations.	Aw AJ et al.	—	2025	→
Improved allele frequencies in gnomAD through local ancestry inference.	Kore P et al.	—	2025	→
Incorporating multiracial and multiethnic experiences into genetic counseling practice and research: A necessary opportunity.	Lowe C et al.	—	2025	→
Leveraging global genetics resources to enhance polygenic prediction across ancestrally diverse populations.	Pain O	—	2025	→
Leveraging local ancestry and cross-ancestry genetic architecture to improve genetic prediction of complex traits in admixed populations.	Zhou G et al.	—	2025	→
Multiomics in atherosclerotic cardiovascular disease.	Nordestgaard LT et al.	—	2025	→
Opportunities and challenges of local ancestry in genetic association analyses.	Sun Q et al.	—	2025	→
Psychiatric genetics in the diverse landscape of Latin American populations.	Bruxel EM et al.	—	2025	→
STREAM-PRS: a multi-tool pipeline for streamlining polygenic risk score computation.	Becelaere S et al.	—	2025	→
The accuracy of polygenic score models for BMI and Type II diabetes in the Native Hawaiian population.	Lo YC et al.	—	2025	→
The Estonian Biobank's journey from biobanking to personalized medicine.	Milani L et al.	—	2025	→
Tracing human genetic histories and natural selection with precise local ancestry inference.	Lerga-Jaso J et al.	—	2025	→
Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations.	Hou K et al.	—	2024	→
Assessing the Risk Stratification of Breast Cancer Polygenic Risk Scores in a Brazilian Cohort.	Barreiro RAS et al.	—	2024	→
Complex trait susceptibilities and population diversity in a sample of 4,145 Russians.	Usoltsev D et al.	—	2024	→
Genetic and molecular architecture of complex traits.	Lappalainen T et al.	—	2024	→
Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI.	Sun Q et al.	—	2024	→
Methodologies underpinning polygenic risk scores estimation: a comprehensive overview.	Ndong Sima CAA et al.	—	2024	→
Polygenic risk for suicide attempt is associated with lifetime suicide attempt in US soldiers independent of parental risk.	Stein MB et al.	—	2024	→
Principles and methods for transferring polygenic risk scores across global populations.	Kachuri L et al.	—	2024	→
Promoting equity in polygenic risk assessment through global collaboration.	Kullo IJ	—	2024	→
Recent advances in polygenic scores: translation, equitability, methods and FAIR tools.	Xiang R et al.	—	2024	→
shaPRS: Leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores.	Kelemen M et al.	—	2024	→
The PRIMED Consortium: Reducing disparities in polygenic risk assessment.	Kullo IJ et al.	—	2024	→
Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals.	Hou K et al.	—	2023	→
FairPRS: adjusting for admixed populations in polygenic risk scores using invariant risk minimization.	Machado Reyes D et al.	—	2023	→
Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts.	Wang Y et al.	—	2023	→
Implementing Reporting Standards for Polygenic Risk Scores for Atherosclerotic Cardiovascular Disease.	Smith JL et al.	—	2023	→
Improving genetic risk prediction across diverse population by disentangling ancestry representations.	Gyawali PK et al.	—	2023	→
Local Ancestry Inference for Complex Population Histories	Pearson A et al.	—	2023	—
Power of inclusion: Enhancing polygenic prediction with admixed individuals.	Tanigawa Y et al.	—	2023	→
Strategies for the Genomic Analysis of Admixed Populations.	Tan T et al.	—	2023	→
A Principal Component Informed Approach to Address Polygenic Risk Score Transferability Across European Cohorts.	Pärna K et al.	—	2022	→
A saturated map of common genetic variants associated with human height.	Yengo L et al.	—	2022	→
Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores.	Wang Y et al.	—	2022	→
Clinical utility of polygenic risk scores for coronary artery disease.	Klarin D et al.	—	2022	→
Development of a clinical polygenic risk score assay and reporting workflow.	Hao L et al.	—	2022	→
Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries.	Smith SP et al.	—	2022	→
Gene-based polygenic risk scores analysis of alcohol use disorder in African Americans.	Lai D et al.	—	2022	→
Genetics and epigenetics of self-injurious thoughts and behaviors: Systematic review of the suicide literature and methodological considerations.	Mirza S et al.	—	2022	→
Genome-wide risk prediction of common diseases across ancestries in one million people.	Mars N et al.	—	2022	→
Including diverse and admixed populations in genetic epidemiology research.	Caliebe A et al.	—	2022	→
Incorporating family history of disease improves polygenic risk scores in diverse populations.	Hujoel MLA et al.	—	2022	→
Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores.	Weissbrod O et al.	—	2022	→
Long-Lived Individuals Show a Lower Burden of Variants Predisposing to Age-Related Diseases and a Higher Polygenic Longevity Score.	Torres GG et al.	—	2022	→
Polygenic Risk Score in African populations: progress and challenges.	Adam Y et al.	—	2022	→
Polygenic risk scores for CARDINAL study.	Adebamowo CA et al.	—	2022	→
Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association.	O'Sullivan JW et al.	—	2022	→
SALAI-Net: species-agnostic local ancestry inference network.	Oriol Sabat B et al.	—	2022	→
Use of Polygenic Risk Scores for Coronary Heart Disease in Ancestrally Diverse Populations.	Dikilitas O et al.	—	2022	→
Admixture Has Shaped Romani Genetic Diversity in Clinically Relevant Variants.	Font-Porterias N et al.	—	2021	→
Allele frequency differentiation at height-associated SNPs among continental human populations.	Chen M et al.	—	2021	→
Changes in the fine-scale genetic structure of Finland through the 20th century.	Kerminen S et al.	—	2021	→
Detecting Genetic Ancestry and Adaptation in the Taiwanese Han People.	Lo YH et al.	—	2021	→
Genetic propensity for risky behavior and depression and risk of lifetime suicide attempt among urban African Americans in adolescence and young adulthood.	Rabinowitz JA et al.	—	2021	→
Inclusion of variants discovered from diverse populations improves polygenic risk score transferability.	Cavazos TB et al.	—	2021	→
Multi-Omic Approaches to Identify Genetic Factors in Metabolic Syndrome.	Clark KC et al.	—	2021	→
Neuropsychiatric Genetics of Psychosis in the Mexican Population: A Genome-Wide Association Study Protocol for Schizophrenia, Schizoaffective, and Bipolar Disorder Patients and Controls.	Camarena B et al.	—	2021	→
New Polygenic Risk Score to Predict High Myopia in Singapore Chinese Children.	Lanca C et al.	—	2021	→
Populations, Traits, and Their Spatial Structure in Humans.	Sohail M et al.	—	2021	→
Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps.	Polygenic Risk Score Task Force of the International Common Disease Alliance	—	2021	→
Statistical genetics and polygenic risk score for precision medicine.	Konuma T et al.	—	2021	→
Polygenic Scores for Height in Admixed Populations.	Bitarello BD et al.	—	2020	→
Validation of a Genome-Wide Polygenic Score for Coronary Artery Disease in South Asians.	Wang M et al.	—	2020	→