Discovering genetic ancestry using spectral graph theory.
- Authors: Lee, Ann B; Luca, Diana; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn
- Year: 2010
- Journal: Genetic epidemiology
- PMID: 19455578
- DOI: 10.1002/gepi.20434
- PMCID: PMC4610359
As one approach to uncovering the genetic underpinnings of complex disease, individuals are measured at a large number of genetic variants (usually SNPs) across the genome, and these SNP genotypes are assessed for association with disease status. We propose a new statistical method called Spectral-GEM for the analysis of genome-wide association studies; the goal of Spectral-GEM is to quantify the ancestry of the sample from such genotypic data. Ignoring structure due to differential ancestry can lead to an excess of spurious findings and reduce power. Ancestry is commonly estimated using the eigenvectors derived from principal component analysis (PCA). To develop an alternative to PCA, we draw on connections between multidimensional scaling and spectral graph theory. Our approach, based on a spectral embedding derived from the normalized Laplacian of a graph, can produce a more meaningful delineation of ancestry than PCA. The results from Spectral-GEM are often straightforward to interpret and therefore useful in association analysis. We illustrate the new algorithm with an analysis of the POPRES data [Nelson et al., 2008].
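The idea of a spectral embedding derived from the normalized Laplacian of a graph can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' Spectral-GEM implementation: the function name `spectral_embedding`, the simulated two-population genotypes, and the edge-weight choice (square root of the non-negative part of the centered inner products) are all chosen for this example only.

```python
import numpy as np


def spectral_embedding(genotypes, n_components=2):
    """Embed subjects with eigenvectors of a normalized graph Laplacian.

    genotypes: (subjects x SNPs) array of allele counts 0/1/2.
    Returns the leading non-trivial eigenvalues and eigenvectors of
    D^{-1/2} W D^{-1/2}; these correspond to the smallest non-zero
    eigenvalues of the normalized Laplacian I - D^{-1/2} W D^{-1/2}.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                  # center each SNP column
    H = X @ X.T                             # inner-product similarity between subjects
    # Assumed weight choice for this sketch: keep only non-negative
    # similarities so that W is a valid weighted adjacency matrix.
    W = np.sqrt(np.clip(H, 0.0, None))
    d = W.sum(axis=1)                       # weighted degree of each subject
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(M)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # The first eigenvector (eigenvalue 1) is trivial: it is proportional
    # to sqrt(d) and carries no ancestry information, so it is skipped.
    return eigvals[1:n_components + 1], eigvecs[:, 1:n_components + 1]


# Hypothetical toy data: two groups of 30 subjects whose allele
# frequencies differ modestly across 500 SNPs.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(30, n_snps)),
    rng.binomial(2, freq_b, size=(30, n_snps)),
])

vals, vecs = spectral_embedding(geno, n_components=2)
print(vecs.shape)
```

In this toy setting the first returned coordinate is expected to take opposite signs in the two simulated groups. Because the matrix is normalized by the weighted degrees, every eigenvalue is at most 1.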
Principal components from PCA for Scenario 1. Subjects are self-identified as UK (black), Italian (red), and non-European (blue).
Principal components from the Spectral-GEM analysis of data from Scenario 1. Subjects are self-identified as UK (black), Italian (red), and non-European (blue).
Principal components 3–6 for data from Scenario 2. PC 1 and PC 2 are quite similar to the eigenvectors shown in Fig. 4. Subjects are self-identified as UK (black), Italian (red), Iberian Peninsula (green), African American (blue), and Indian (orange).
Principal components from the spectral graph approach for Scenario 2. Subjects are self-identified as UK (black), Italian (red), Iberian Peninsula (green), African American (blue), and Indian (orange).
| Name | Type |
|---|---|
| African | cohort |
| African American | cohort |
| African American cluster local | cohort |
| ancestry | phenotype |
| Ashkenazi Jews local | cohort |
| Asian Indian local | phenotype |
| Asian-Indian local | cohort |
| Asian-Indians local | cohort |
| Asian sample local | cohort |
| Australia | cohort |
| Belgium | cohort |
| Bosnia local | cohort |
| British local | cohort |
| British cluster local | cohort |
| British Isles | cohort |
| Canada | cohort |
| case-control sample | cohort |
| Central local | cohort |
| Central cluster local | cohort |
| Cluster A local | cohort |
| Cluster B local | cohort |
| Cluster C local | cohort |
| Cluster D local | cohort |
| Cluster E local | cohort |
| Cluster F local | cohort |
| Cluster G local | cohort |
| Cluster H local | cohort |
| Cluster I local | cohort |
| Cluster J local | cohort |
| Cluster K local | cohort |
| Cluster L local | cohort |
| Cochran-Mantel-Haenszel test local | drug |
| Continental clusters local | cohort |
| Country of origin local | phenotype |
| Cyprus local | cohort |
| dimensions of ancestry local | phenotype |
| disease status | phenotype |
| East Asian | cohort |
| Euclidean distance m(i,j) local | drug |
| European ancestry | cohort |
| European cluster local | cohort |
| European population | cohort |
| Europeans | cohort |
| France | cohort |
| genetic ancestry | phenotype |
| Germany | cohort |
| Greece local | cohort |
| HapMap | cohort |
| HapMap core samples local | cohort |
| Hispanic | phenotype |
| human population structure local | cohort |
| Iberian peninsula | cohort |
| Indian | cohort |
| Indian cluster local | cohort |
| individuals | cohort |
| Italian A local | cohort |
| Italian A cluster local | cohort |
| Italian B local | cohort |
| Italian B cluster local | cohort |
| Italian clusters local | cohort |
| Italy | cohort |
| kernel H local | drug |
| major continental samples local | cohort |
| Mexican local | phenotype |
| Mexican American | cohort |
| Mexicans local | cohort |
| non-European ancestry | cohort |
| North East local | cohort |
| outliers local | cohort |
| outliers local | phenotype |
| PCA | drug |
| Poland | cohort |
| POPRES local | cohort |
| POPRES database local | cohort |
| population-based genetic association study local | cohort |
| Portuguese local | cohort |
| Romania local | cohort |
| Russia | cohort |
| Scenario 1 local | cohort |
| self-identified British and Italian samples local | cohort |
| six unusual subjects local | cohort |
| Small cluster local | cohort |
| smartpca local | drug |
| SNP | cohort |
| South Asian local | phenotype |
| South East local | cohort |
| Spain | cohort |
| Spectral-GEM local | drug |
| subjects | cohort |
| subpopulations local | cohort |
| Swiss local | cohort |
| Switzerland | cohort |
| Turkey local | cohort |
| United Kingdom | cohort |
| United States | cohort |
External
| Title | Authors | Year |
|---|---|---|
| Determining population structure from k-mer frequencies. | Hrytsenko Y et al. | 2025 |
| Revealing the range of equally likely estimates in the admixture model. | Heinzel CS et al. | 2025 |
| Revealing the range of maximum likelihood estimates in the admixture model | Heinzel CS et al. | 2024 |
| Depression pathophysiology, risk prediction of recurrence and comorbid psychiatric disorders using genome-wide analyses. | Als TD et al. | 2023 |
| Subject clustering by IF-PCA and several recent methods. | Chen D et al. | 2023 |
| Hereditary variants of unknown significance in African American women with breast cancer. | McDonald JT et al. | 2022 |
| Inherent Nonlinear Distribution of High-Dimensional Genotypic Data Identified as a Possible Source of Confounding Factors in Population Structure Analysis. | Wang M | 2022 |
| Genome-wide association identifies the first risk loci for psychosis in Alzheimer disease. | DeMichele-Sweet MAA et al. | 2021 |
| How rare and common risk variation jointly affect liability for autism spectrum disorder. | Klei L et al. | 2021 |
| Recommendations for Statistical Reporting in Cardiovascular Medicine: A Special Report From the American Heart Association. | Althouse AD et al. | 2021 |
| Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. | Sieberts SK et al. | 2020 |
| Pediatric to Adult Shift in Vitiligo Onset Suggests Altered Environmental Triggering. | Jin Y et al. | 2020 |
| Early-onset autoimmune vitiligo associated with an enhancer variant haplotype that upregulates class II HLA expression. | Jin Y et al. | 2019 |
| Family Clustering of Autoimmune Vitiligo Results Principally from Polygenic Inheritance of Common Risk Alleles. | Roberts GHL et al. | 2019 |
| GRAF-pop: A Fast Distance-Based Method To Infer Subject Ancestry from Multiple Genotype Datasets Without Principal Components Analysis. | Jin Y et al. | 2019 |
| Statistical Association Mapping of Population-Structured Genetic Data. | Najafi A et al. | 2019 |
| A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS). | Zhao H et al. | 2018 |
| Precision Medicine for Acute Kidney Injury (AKI): Redefining AKI by Agnostic Kidney Tissue Interrogation and Genetics. | Kiryluk K et al. | 2018 |
| Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure. | Byun J et al. | 2017 |
| Genetic Analysis of Mitochondrial Ribosomal Proteins and Cognitive Aging in Postmenopausal Women. | Mozhui K et al. | 2017 |
| GWAS for serum galactose-deficient IgA1 implicates critical genes of the O-glycosylation pathway. | Kiryluk K et al. | 2017 |
| Prediction of biogeographical ancestry from genotype: a comparison of classifiers. | Cheung EYY et al. | 2017 |
| Rare Copy Number Variants in NRXN1 and CNTN6 Increase Risk for Tourette Syndrome. | Huang AY et al. | 2017 |
| A Method to Exploit the Structure of Genetic Ancestry Space to Enhance Case-Control Studies. | Bodea CA et al. | 2016 |
| Gene expression elucidates functional impact of polygenic risk for schizophrenia. | Fromer M et al. | 2016 |
| Genome-wide association studies of autoimmune vitiligo identify 23 new risk loci and highlight key pathways and regulatory variants. | Jin Y et al. | 2016 |
| Inference and Analysis of Population Structure Using Genetic Data and Network Theory. | Greenbaum G et al. | 2016 |
| Retrospective Binary-Trait Association Test Elucidates Genetic Architecture of Crohn Disease. | Jiang D et al. | 2016 |
| Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project. | Prokopenko D et al. | 2016 |
| Detecting individual ancestry in the human genome. | Wollstein A et al. | 2015 |
| Genomic regions influencing coat color saturation and facial markings in Fleckvieh cattle. | Mészáros G et al. | 2015 |
| Novel genetic matching methods for handling population stratification in genome-wide association studies. | Lacour A et al. | 2015 |
| Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? | Zhang Y et al. | 2015 |
| Recent genomic heritage in Scotland. | Amador C et al. | 2015 |
| GAGA: a new algorithm for genomic inference of geographic ancestry reveals fine level population substructure in Europeans. | Lao O et al. | 2014 |
| Adjusting for population stratification in a fine scale with principal components and sequencing data. | Zhang Y et al. | 2013 |
| Adjustment for population stratification via principal components in association analysis of rare variants. | Zhang Y et al. | 2013 |
| Analysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls. | Liu L et al. | 2013 |
| Enhanced localization of genetic samples through linkage-disequilibrium correction. | Baran Y et al. | 2013 |
| Softwares and methods for estimating genetic ancestry in human populations. | Liu Y et al. | 2013 |
| Amino acid position 11 of HLA-DRβ1 is a major determinant of chromosome 6p association with ulcerative colitis. | Achkar JP et al. | 2012 |
| Analyzing genetic association studies with an extended propensity score approach. | Zhao H et al. | 2012 |
| A permutation procedure to correct for confounders in case-control studies, including tests of rare variation. | Epstein MP et al. | 2012 |
| Association of CLU and PICALM variants with Alzheimer's disease. | Kamboh MI et al. | 2012 |
| Common genetic variants, acting additively, are a major source of risk for autism. | Klei L et al. | 2012 |
| Genome-wide association analysis of circulating vitamin D levels in children with asthma. | Lasky-Su J et al. | 2012 |
| Genome-wide association study heterogeneous cohort homogenization via subject weight knock-down. | Valente AX et al. | 2012 |
| Genome-wide association study of Alzheimer's disease with psychotic symptoms. | Hollingworth P et al. | 2012 |
| Individual common variants exert weak effects on the risk for autism spectrum disorders. | Anney R et al. | 2012 |
| Manifold learning for human population structure studies. | Siu H et al. | 2012 |
| Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies. | Lee S et al. | 2012 |
| Statistical distributions of test statistics used for quantitative trait association mapping in structured populations. | Teyssèdre S et al. | 2012 |
| Stratification-score matching improves correction for confounding by population stratification in case-control association studies. | Epstein MP et al. | 2012 |
| A comparison of association methods correcting for population stratification in case-control studies. | Wu C et al. | 2011 |
| Control for confounding in case-control studies using the stratification score, a retrospective balancing score. | Allen AS et al. | 2011 |
| Correcting for Population Stratification in Genomewide Association Studies. | Lin DY et al. | 2011 |
| Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders. | Anney RJ et al. | 2011 |
| Identification of common variants influencing risk of the tauopathy progressive supranuclear palsy. | Höglinger GU et al. | 2011 |
| No association of psychosis in Alzheimer disease with neurodegenerative pathway genes. | DeMichele-Sweet MA et al. | 2011 |
| Testing for an unusual distribution of rare variants. | Neale BM et al. | 2011 |
| A genome-wide scan for common alleles affecting risk for autism. | Anney R et al. | 2010 |
| Ancestral informative marker selection and population structure visualization using sparse Laplacian eigenfunctions. | Zhang J | 2010 |
| A SPECTRAL GRAPH APPROACH TO DISCOVERING GENETIC ANCESTRY. | Lee AB et al. | 2010 |
| Clustering by genetic ancestry using genome-wide SNP data. | Solovieff N et al. | 2010 |
| Correction for hidden confounders in the genetic analysis of gene expression. | Listgarten J et al. | 2010 |
| Functional impact of global rare copy number variation in autism spectrum disorders. | Pinto D et al. | 2010 |
| Genetics in psychiatry: common variant association studies. | Buxbaum JD et al. | 2010 |
| Genome-wide association study of intracranial aneurysm identifies three new risk loci. | Yasuno K et al. | 2010 |
| Pharmacogenomics of suicidal events. | Brent D et al. | 2010 |
| Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression. | Han F et al. | 2010 |
| ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure. | Thornton T et al. | 2010 |
| Screen and clean: a tool for identifying interactions in genome-wide association studies. | Wu J et al. | 2010 |
| SNPs in CAST are associated with Parkinson disease: a confirmation study. | Allen AS et al. | 2010 |
| The potential for enhancing the power of genetic association studies in African Americans through the reuse of existing genotype data. | Chen GK et al. | 2010 |
| Using ancestry matching to combine family-based and unrelated samples for genome-wide association studies. | Crossett A et al. | 2010 |
| Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies. | Bhattacharjee S et al. | 2010 |
| Laplacian eigenfunctions learn population structure. | Zhang J et al. | 2009 |