Population substructure and control selection in genome-wide association studies.
- Authors: Yu, Kai; Wang, Zhaoming; Li, Qizhai; Wacholder, Sholom; Hunter, David J; Hoover, Robert N; Chanock, Stephen; Thomas, Gilles
- Year: 2008
- Journal: PLoS ONE
- PMID: 18596976
- DOI: 10.1371/journal.pone.0002551
- PMCID: PMC2432498
Determination of the relevance of both demanding classical epidemiologic criteria for control selection and robust handling of population stratification (PS) represents a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies nested in corresponding prospective cohorts, a minor confounding effect due to PS (inflation factor lambda of 1.025 and 1.005) was observed. In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (lambda of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pair-wise r² < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS that identified a smaller set of principal components and achieved a better control of type I error (to lambda of 1.032 and 1.006, respectively) than currently used methods. The overlap between sets of SNPs in the bottom 5% of p-values based on the new test and the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.
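The inflation factor lambda reported in the abstract is conventionally computed as the median of the observed 1 d.f. chi-square statistics divided by the median of the null chi-square distribution (about 0.455). The abstract does not spell out the exact computation the authors used, so the following is only a minimal sketch under that standard genomic-control definition; the function names and the p-values are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (standard constant).
CHI2_1DF_MEDIAN = 0.4549364


def p_to_chi2_1df(p):
    """Convert a two-sided p-value (0 < p <= 1) to a 1 d.f. chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(p_values):
    """Genomic-control lambda: observed median statistic over the null median."""
    stats = [p_to_chi2_1df(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


# Hypothetical p-values spread evenly over (0, 1), as expected with no inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = inflation_factor(null_p)
print(f"lambda = {lam:.3f}")
```

A value close to 1 indicates little over-dispersion of the test statistics; values such as 1.090 or 1.062 indicate that the statistics are systematically larger than expected under the null.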
A diagram for the three main sets of SNPs used in the text. The first set of PCA SNPs is used to identify hidden population substructure. The set of genomic control SNPs is used to evaluate the over-dispersion factor in a given study, as well as in the proposed permutation procedure to select relevant PCs for the correction of PS. The second set of PCA SNPs is used to validate findings from the first set of PCA SNPs. In applications, only the first set of PCA SNPs is recommended.
Samples represented by their first two principal components. Principal components (PC, the 1st along the horizontal direction, the 2nd along the vertical direction) were obtained by applying the PCA on the joint sample of PLCO prostate cancer and NHS breast cancer studies. A) First two PCs for subjects from the PLCO prostate cancer study. B) First two PCs for subjects from the NHS breast cancer study.
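To illustrate how a leading principal component can separate subjects with different genetic backgrounds, here is a small pure-Python sketch. It is not the software used in the study: it standardizes each SNP by its empirical mean and standard deviation (an assumption) and finds the first PC of the subject-by-subject Gram matrix by power iteration. The toy genotype matrix and all names are hypothetical.

```python
import math


def standardize_columns(genotypes):
    """Center and scale each SNP column; monomorphic SNPs are dropped."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n)
        if sd > 0:
            columns.append([(g - mean) / sd for g in col])
    return columns  # one list per SNP, each of length n


def first_pc(genotypes, n_iter=200):
    """Leading eigenvector of the subject-by-subject Gram matrix (power iteration)."""
    cols = standardize_columns(genotypes)
    n = len(genotypes)
    gram = [[sum(c[i] * c[k] for c in cols) for k in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary starting vector
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical genotypes (0/1/2 minor-allele counts): rows are subjects, columns are SNPs.
# The first three subjects and the last three differ in allele frequencies.
toy_genotypes = [
    [2, 2, 0, 1],
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 1, 2],
    [0, 1, 2, 2],
]
pc1 = first_pc(toy_genotypes)
print([round(x, 3) for x in pc1])
```

In this toy example the first three subjects receive PC1 scores of one sign and the last three the opposite sign; the overall sign of a principal component is arbitrary.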
Q-Q plot based on the test without PC adjustment. For each of the four analyses, the Q-Q plot is based on P-values (in log10 scale) that correspond to the 1 d.f. Wald test on 475,116 testing autosomal SNPs by assuming an additive risk model (in logit scale) and without PC adjustment. A) Results for the original prostate cancer study (prostate cancer cases and controls from PLCO). B) Results for the reconstructed prostate cancer study using external controls (prostate cancer cases from PLCO, and external controls from NHS). C) Results for the original breast cancer study (breast cancer cases and controls from NHS). D) Results for the reconstructed breast cancer study using external controls (breast cancer cases from NHS, and external controls from PLCO).
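The 1 d.f. Wald test under an additive model on the logit scale can be sketched for a single SNP without PC adjustment as follows. This is an illustrative pure-Python Newton-Raphson fit of logit P(case) = a + b * genotype, not the authors' implementation; the genotype counts and function names are hypothetical.

```python
import math


def _score_and_info(a, b, genotypes, status):
    """Score vector and information matrix entries of the logistic model at (a, b)."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for g, y in zip(genotypes, status):
        p = 1.0 / (1.0 + math.exp(-(a + b * g)))
        w = p * (1.0 - p)
        s0 += y - p
        s1 += (y - p) * g
        i00 += w
        i01 += w * g
        i11 += w * g * g
    return s0, s1, i00, i01, i11


def wald_test_additive(genotypes, status, n_iter=25):
    """Fit logit P(case) = a + b * genotype; return (b, se, Wald chi-square)."""
    a = b = 0.0
    for _ in range(n_iter):
        s0, s1, i00, i01, i11 = _score_and_info(a, b, genotypes, status)
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * delta = score.
        a += (i11 * s0 - i01 * s1) / det
        b += (i00 * s1 - i01 * s0) / det
    # Standard error of b from the inverse information at the fitted values.
    _, _, i00, i01, i11 = _score_and_info(a, b, genotypes, status)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b, se, (b / se) ** 2


# Hypothetical counts: controls 50/40/10 and cases 30/50/20 for genotypes 0/1/2.
genotypes = [0] * 50 + [1] * 40 + [2] * 10 + [0] * 30 + [1] * 50 + [2] * 20
status = [0] * 100 + [1] * 100
beta, se, chi2 = wald_test_additive(genotypes, status)
print(f"beta = {beta:.3f}, se = {se:.3f}, Wald chi2 = {chi2:.2f}")
```

Adjusting for PCs would add one covariate per selected PC to the same model, which requires a general matrix solver rather than the closed-form 2x2 inverse used here.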
Q-Q plot based on the test with PC adjustment. For each of the four analyses, the Q-Q plot is based on P-values (in log10 scale) that correspond to the 1 d.f. Wald test on 475,116 testing autosomal SNPs by assuming an additive risk model (in logit scale) and with PC adjustment. The PCs used in adjustment are selected by the proposed permutation procedure. A) Results for the original prostate cancer study (prostate cancer cases and controls from PLCO). B) Results for the reconstructed prostate cancer study using external controls (prostate cancer cases from PLCO, and external controls from NHS). C) Results for the original breast cancer study (breast cancer cases and controls from NHS). D) Results for the reconstructed breast cancer study using external controls (breast cancer cases from NHS, and external controls from PLCO).
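A Q-Q plot of this kind pairs the sorted observed p-values with the quantiles expected for uniformly distributed p-values under the null. The sketch below computes the plotting coordinates only, assuming the conventional -log10 transform and the (i - 0.5)/n plotting positions; neither choice is confirmed by the captions, and the function name and data are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, least significant first."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantile for the i-th smallest of n uniform p-values: (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical p-values that follow the null exactly, so points lie on the diagonal.
example_p = [(i - 0.5) / 500 for i in range(1, 501)]
points = qq_points(example_p)
print(points[-1])  # most significant point
```

Points rising above the diagonal across most of the range suggest systematic inflation, while departure only in the extreme tail is more consistent with true association signals.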
SNP ranking correlation in prostate cancer studies. In each plot, SNPs' rankings based on the 1 d.f. Wald test on 475,116 testing autosomal SNPs without PC adjustment are compared with their rankings based on the 1 d.f. Wald test with adjustment for PCs chosen by the permutation procedure. The SNPs in blue are ranked among the top 5% by tests both with and without PC adjustment. The SNPs in green and orange are ranked among the top 5% by only one of the tests. A) Results based on the original prostate cancer study (prostate cancer cases and controls from PLCO). The 1st PC was chosen for PS correction. B) Results based on the reconstructed prostate cancer study using external controls (prostate cancer cases from PLCO, and external controls from NHS). The 1st, 2nd and 4th PCs were chosen for PS correction.
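The three colour groups in this comparison correspond to a simple set computation on the two ranked lists. The following sketch, with hypothetical function names and made-up p-values, splits SNPs into those in the top fraction by both tests and those in the top fraction by only one.

```python
def top_fraction(p_values, fraction=0.05):
    """Indices of SNPs whose p-values are among the smallest `fraction` of the list."""
    k = max(1, round(len(p_values) * fraction))
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return set(order[:k])


def top_overlap(p_unadjusted, p_adjusted, fraction=0.05):
    """Return (top by both tests, top unadjusted only, top adjusted only)."""
    top_u = top_fraction(p_unadjusted, fraction)
    top_a = top_fraction(p_adjusted, fraction)
    return top_u & top_a, top_u - top_a, top_a - top_u


# Hypothetical data: 40 SNPs, so the top 5% holds 2 SNPs per test.
p_unadj = [(i + 1) / 41 for i in range(40)]
p_adj = list(p_unadj)
p_adj[1], p_adj[2] = p_adj[2], p_adj[1]  # adjustment swaps the ranks of SNPs 1 and 2

both, only_unadj, only_adj = top_overlap(p_unadj, p_adj)
print(both, only_unadj, only_adj)
```

The share of concordant SNPs is `len(both)` divided by the size of the top set, which is the quantity the abstract reports as about 80%.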
The conditional ranking distribution for the original PLCO prostate cancer study. Each plot shows the histogram of ranks according to the test without PC adjustment for SNPs ranked within a given range by the test with the adjustment for the 1st PC (chosen by the proposed permutation procedure). The ranking ranges (%) are shown on the horizontal axis. The frequencies (%) are shown on the vertical axis. A) The histogram of ranks for SNPs ranked in the top 0–1% by the test with PC adjustment. B) The histogram of ranks for SNPs ranked in the top 1–2% by the test with PC adjustment. C) The histogram of ranks for SNPs ranked in the top 2–3% by the test with PC adjustment. D) The histogram of ranks for SNPs ranked in the top 3–4% by the test with PC adjustment. E) The histogram of ranks for SNPs ranked in the top 4–5% by the test with PC adjustment.
The conditional ranking distribution for the reconstructed prostate cancer study using external controls. Each plot shows the histogram of ranks according to the test without PC adjustment for SNPs ranked within a given range by the test with the adjustment for the 1st, 2nd, and 4th PCs (chosen by the proposed permutation procedure). The ranking ranges (%) are shown on the horizontal axis. The frequencies (%) are shown on the vertical axis. A) The histogram of ranks for SNPs ranked in the top 0–1% by the test with PC adjustment. B) The histogram of ranks for SNPs ranked in the top 1–2% by the test with PC adjustment. C) The histogram of ranks for SNPs ranked in the top 2–3% by the test with PC adjustment. D) The histogram of ranks for SNPs ranked in the top 3–4% by the test with PC adjustment. E) The histogram of ranks for SNPs ranked in the top 4–5% by the test with PC adjustment.
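A conditional ranking histogram of this type can be computed by selecting the SNPs whose adjusted-test rank falls in a given percentile band and then binning their ranks under the unadjusted test. The sketch below is one plausible reading of the captions, not the authors' code; the 1% bin width, the function names and the p-values are assumptions.

```python
def percentile_ranks(p_values):
    """Rank of each SNP as a percentage of the list (smallest p-value ranks first)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    ranks = [0.0] * n
    for position, idx in enumerate(order):
        ranks[idx] = 100.0 * position / n
    return ranks


def conditional_rank_histogram(p_unadjusted, p_adjusted, low, high, bin_width=1.0):
    """Histogram of unadjusted percentile ranks for SNPs whose adjusted
    percentile rank lies in [low, high). Returns {bin_start: frequency_percent}."""
    r_u = percentile_ranks(p_unadjusted)
    r_a = percentile_ranks(p_adjusted)
    selected = [r_u[i] for i in range(len(r_a)) if low <= r_a[i] < high]
    if not selected:
        return {}
    counts = {}
    for r in selected:
        start = int(r // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return {b: 100.0 * c / len(selected) for b, c in sorted(counts.items())}


# Hypothetical data: 200 SNPs; adjustment swaps the p-values of SNPs 0 and 2.
p_unadj = [(i + 1) / 201 for i in range(200)]
p_adj = list(p_unadj)
p_adj[0], p_adj[2] = p_adj[2], p_adj[0]

hist = conditional_rank_histogram(p_unadj, p_adj, low=0.0, high=1.0)
print(hist)
```

A histogram concentrated in the same band as the conditioning range indicates that PC adjustment changed the ranking very little for those SNPs.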
External
| Title | Authors | Journal | Year | Link |
|---|---|---|---|---|
| Adjusting for principal components can induce collider bias in genome-wide association studies. | Grinde KE et al. | - | 2024 | - |
| Sparse Multitask group Lasso for Genome-Wide Association Studies | Nouira A et al. | - | 2024 | - |
| St. Jude Survivorship Portal: Sharing and Analyzing Large Clinical and Genomic Datasets from Pediatric Cancer Survivors. | Matt GY et al. | - | 2024 | - |
| Genome-Wide Association Study in Acute Tubulointerstitial Nephritis. | Zhou XJ et al. | - | 2023 | - |
| Population stratification correction using Bayesian shrinkage priors for genetic association studies. | Liu Z et al. | - | 2023 | - |
| Breeding and genetics of disease resistance in temperate fruit trees: challenges and new opportunities. | Khan A et al. | - | 2022 | - |
| PCAmatchR: a flexible R package for optimal case-control matching using weighted principal components. | Brown DW et al. | - | 2021 | - |
| Genetically predicted telomere length is associated with clonal somatic copy number alterations in peripheral leukocytes. | Brown DW et al. | - | 2020 | - |
| Low-frequency variation near common germline susceptibility loci are associated with risk of Ewing sarcoma. | Lin SH et al. | - | 2020 | - |
| Polygenic risk score for the prediction of breast cancer is related to lesser terminal duct lobular unit involution of the breast. | Bodelon C et al. | - | 2020 | - |
| A Powerful Method To Test Associations Between Ordinal Traits and Genotypes. | Wang J et al. | - | 2019 | - |
| Inherited genetic susceptibility to acute lymphoblastic leukemia in Down syndrome. | Brown AL et al. | - | 2019 | - |
| Principals about principal components in statistical genetics. | Abegaz F et al. | - | 2019 | - |
| Childhood asthma is associated with COPD and known asthma variants in COPDGene: a genome-wide association study. | Hayden LP et al. | - | 2018 | - |
| Efficiency of different strategies to mitigate ascertainment bias when using SNP panels in diversity studies. | Malomane DK et al. | - | 2018 | - |
| Genome-wide association study identifies multiple new loci associated with Ewing sarcoma susceptibility. | Machiela MJ et al. | - | 2018 | - |
| Polygenic Determinants for Subsequent Breast Cancer Risk in Survivors of Childhood Cancer: The St Jude Lifetime Cohort Study (SJLIFE). | Wang Z et al. | - | 2018 | - |
| Two high-risk susceptibility loci at 6p25.3 and 14q32.13 for Waldenström macroglobulinemia. | McMaster ML et al. | - | 2018 | - |
| A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts. | Lindström S et al. | - | 2017 | - |
| A genome-wide association study of LCH identifies a variant in *SMAD6* associated with susceptibility. | Peckham-Gregory EC et al. | - | 2017 | - |
| Susceptibility to Childhood Pneumonia: A Genome-Wide Analysis. | Hayden LP et al. | - | 2017 | - |
| Genome-wide association study confirms lung cancer susceptibility loci on chromosomes 5p15 and 15q25 in an African-American population. | Zanetti KA et al. | - | 2016 | - |
| Genome-wide association study for rotator cuff tears identifies two significant single-nucleotide polymorphisms. | Tashjian RZ et al. | - | 2016 | - |
| Genome-wide association study identifies 14 novel risk alleles associated with basal cell carcinoma. | Chahal HS et al. | - | 2016 | - |
| Genome-wide association study identifies novel susceptibility loci for cutaneous squamous cell carcinoma. | Chahal HS et al. | - | 2016 | - |
| Group-combined P-values with applications to genetic association studies. | Hu X et al. | - | 2016 | - |
| A Novel Risk Locus at 6p21.3 for Epstein-Barr Virus-Positive Hodgkin Lymphoma. | Delahaye-Sourdeix M et al. | - | 2015 | - |
| A rare truncating BRCA2 variant and genetic susceptibility to upper aerodigestive tract cancer. | Delahaye-Sourdeix M et al. | - | 2015 | - |
| A systematic investigation of the contribution of genetic variation within the MHC region to HPV seropositivity. | Chen D et al. | - | 2015 | - |
| Different Evolutionary History for Basque Diaspora Populations in USA and Argentina Unveiled by Mitochondrial DNA Analysis. | Baeta M et al. | - | 2015 | - |
| Physiological phenotyping of plants for crop improvement. | Ghanem ME et al. | - | 2015 | - |
| Significant association of full-thickness rotator cuff tears and estrogen-related receptor-β (ESRRB). | Teerlink CC et al. | - | 2015 | - |
| The 12p13.33/RAD52 locus and genetic susceptibility to squamous cell cancers of upper aerodigestive tract. | Delahaye-Sourdeix M et al. | - | 2015 | - |
| Two susceptibility loci identified for prostate cancer aggressiveness. | Berndt SI et al. | - | 2015 | - |
| A genome-wide association study of renal cell carcinoma among African Americans. | Purdue MP et al. | - | 2014 | - |
| Dubowitz syndrome is a complex comprised of multiple, genetically distinct and phenotypically overlapping disorders. | Stewart DR et al. | - | 2014 | - |
| Genome-wide association study identifies multiple loci associated with bladder cancer risk. | Figueroa JD et al. | - | 2014 | - |
| Genome-wide association study of endometrial cancer in E2C2. | De Vivo I et al. | - | 2014 | - |
| A genome-wide association study reveals ARL15, a novel non-HLA susceptibility gene for rheumatoid arthritis in North Indians. | Negi S et al. | - | 2013 | - |
| Genetic association with multiple traits in the presence of population stratification. | Yan T et al. | - | 2013 | - |
| One thousand genomes imputation in the National Cancer Institute Breast and Prostate Cancer Cohort Consortium aggressive prostate cancer genome-wide association study. | Machiela MJ et al. | - | 2013 | - |
| Selective estrogen receptor modulators and pharmacogenomic variation in ZNF423 regulation of BRCA1 expression: individualized breast cancer prevention. | Ingle JN et al. | - | 2013 | - |
| Use of systems biology approaches to analysis of genome-wide association studies of myocardial infarction and blood cholesterol in the nurses' health study and health professionals' follow-up study. | Reilly D et al. | - | 2013 | - |
| A meta-analysis of genome-wide association studies of breast cancer identifies two novel susceptibility loci at 6q14 and 20q11. | Siddiq A et al. | - | 2012 | - |
| Assessing disease risk in genome-wide association studies using family history. | Ghosh A et al. | - | 2012 | - |
| A unique genome-wide association analysis in extended Utah high-risk pedigrees identifies a novel melanoma risk variant on chromosome arm 10q. | Teerlink C et al. | - | 2012 | - |
| Common variants in FTO, MC4R, TMEM18, PRL, AIF1, and PCSK1 show evidence of association with adult obesity in the Greek population. | Rouskas K et al. | - | 2012 | - |
| Genome-wide association study of classical Hodgkin lymphoma and Epstein-Barr virus status-defined subgroups. | Urayama KY et al. | - | 2012 | - |
| Genome-wide association study of glioma and meta-analysis. | Rajaraman P et al. | - | 2012 | - |
| Principal components analysis of population admixture. | Ma J et al. | - | 2012 | - |
| The association between inflammation-related genes and serum androgen levels in men: the prostate, lung, colorectal, and ovarian study. | Meyer TE et al. | - | 2012 | - |
| The utility of mitochondrial and y chromosome phylogenetic data to improve correction for population stratification. | Makowsky R et al. | - | 2012 | - |
| Tracking the emergence of a new breed using 49,034 SNP in sheep. | Kijas JW et al. | - | 2012 | - |
| Unidentified genetic variants influence pancreatic cancer risk: an analysis of polygenic susceptibility in the PanScan study. | Pierce BL et al. | - | 2012 | - |
| Using prior information from the medical literature in GWAS of oral cancer identifies novel susceptibility variant on chromosome 4--the AdAPT method. | Johansson M et al. | - | 2012 | - |
| A genome-wide association study of upper aerodigestive tract cancers conducted within the INHANCE consortium. | McKay JD et al. | - | 2011 | - |
| Choice of population structure informative principal components for adjustment in a case-control study. | Peloso GM et al. | - | 2011 | - |
| Current status of genome-wide association studies in cancer. | Chung CC et al. | - | 2011 | - |
| Fine mapping of a region of chromosome 11q13 reveals multiple independent loci associated with risk of prostate cancer. | Chung CC et al. | - | 2011 | - |
| Genes involved in vasoconstriction and vasodilation system affect salt-sensitive hypertension. | Citterio L et al. | - | 2011 | - |
| Genome-wide association study of HPV seropositivity. | Chen D et al. | - | 2011 | - |
| Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3. | Purdue MP et al. | - | 2011 | - |
| Large-scale fine mapping of the HNF1B locus and prostate cancer risk. | Berndt SI et al. | - | 2011 | - |
| Matching on Race and Ethnicity in Case-Control Studies as a Means of Control for Population Stratification. | Chokkalingam AP et al. | - | 2011 | - |
| Single-nucleotide polymorphisms (5p15.33, 15q25.1, 6p22.1, 6q27 and 7p15.3) and lung cancer survival in the European Prospective Investigation into Cancer and Nutrition (EPIC). | Xun WW et al. | - | 2011 | - |
| A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci. | Rothman N et al. | - | 2010 | - |
| A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. | Abnet CC et al. | - | 2010 | - |
| Genetic admixture and population substructure in Guanacaste Costa Rica. | Wang Z et al. | - | 2010 | - |
| Genetic susceptibility to type 2 diabetes is associated with reduced prostate cancer risk. | Pierce BL et al. | - | 2010 | - |
| Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes. | Qi L et al. | - | 2010 | - |
| Genome-wide associations and functional genomic studies of musculoskeletal adverse events in women receiving aromatase inhibitors. | Ingle JN et al. | - | 2010 | - |
| Genome-wide association studies in cancer--current and future directions. | Chung CC et al. | - | 2010 | - |
| Germline genetic variation, cancer outcome, and pharmacogenetics. | Coate L et al. | - | 2010 | - |
| Identification of genetic and epigenetic marks involved in population structure. | Liu J et al. | - | 2010 | - |
| Inflammatory genetic markers of prostate cancer risk. | Tindall EA et al. | - | 2010 | - |
| Magnitude of stratification in human populations and impacts on genome wide association studies. | Hao K et al. | - | 2010 | - |
| Multiple common variants for celiac disease influencing immune gene expression. | Dubois PC et al. | - | 2010 | - |
| NordicDB: a Nordic pool and portal for genome-wide control data. | Leu M et al. | - | 2010 | - |
| Pesticide use modifies the association between genetic variants on chromosome 8q24 and prostate cancer. | Koutros S et al. | - | 2010 | - |
| Quality control and quality assurance in genotypic data for genome-wide association studies. | Laurie CC et al. | - | 2010 | - |
| Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study. | Wang H et al. | - | 2010 | - |
| Theoretical formulation of principal components analysis to detect and correct for population stratification. | Ma J et al. | - | 2010 | - |
| Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies. | Bhattacharjee S et al. | - | 2010 | - |
| Using public control genotype data to increase power and decrease cost of case-control genetic association studies. | Ho LA et al. | - | 2010 | - |
| A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. | Landi MT et al. | - | 2009 | - |
| A genome-wide association study primer for clinicians. | Wang TH et al. | - | 2009 | - |
| A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). | Thomas G et al. | - | 2009 | - |
| A neurologist's guide to genome-wide association studies. | Mullen SA et al. | - | 2009 | - |
| Common variation in genes related to innate immunity and risk of adult glioma. | Rajaraman P et al. | - | 2009 | - |
| Copy-number variants in neurodevelopmental disorders: promises and challenges. | Merikangas AK et al. | - | 2009 | - |
| Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment. | Li Q et al. | - | 2009 | - |
| Genetic variants in pigmentation genes, pigmentary phenotypes, and risk of skin cancer in Caucasians. | Nan H et al. | - | 2009 | - |
| Genetic variants in the vitamin d receptor are associated with advanced prostate cancer at diagnosis: findings from the prostate testing for cancer and treatment study and a systematic review. | Chen L et al. | - | 2009 | - |
| Genetic variations in esophageal cancer risk and prognosis. | Cheung WY et al. | - | 2009 | - |
| Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. | Amundadottir L et al. | - | 2009 | - |
| Hippocampal atrophy as a quantitative trait in a genome-wide association study identifying novel susceptibility genes for Alzheimer's disease. | Potkin SG et al. | - | 2009 | - |
| Principal-component-based population structure adjustment in the North American Rheumatoid Arthritis Consortium data: impact of single-nucleotide polymorphism set and analysis method. | Peloso GM et al. | - | 2009 | - |
| Association between BBS6/MKKS gene polymorphisms, obesity and metabolic syndrome in the Greek population. | Rouskas K et al. | - | 2008 | - |
| Cancer genetic association studies in the genome-wide age. | Savage SA | - | 2008 | - |
| Intermediacy and gene-environment interaction: the example of CHRNA5-A3 region, smoking, nicotine dependence, and lung cancer. | Wacholder S et al. | - | 2008 | - |