A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design.
- Authors: Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
- Year: 2017
- Journal: European journal of human genetics : EJHG
- PMID: 28594416
- DOI: 10.1038/ejhg.2017.78
- PMCID: PMC5520083
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, that work neither fully covers the GWAS setting nor extends to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Difference in power between the Cox and logistic regression models for an SNP with a risk allele frequency of 10% for the cohort study design. The red, blue and green lines represent the sample sizes 5000, 10 000 and 25 000, respectively. Complete (a), Survey (b) and Random (c) are the types of follow-up and 5, 10 and 15% are the cumulative disease incidences.
Difference in power between the Cox and logistic regression models for an SNP with a risk allele frequency of 10% for the case-cohort study design. The red, blue and green lines represent the sampling fractions of 5, 10 and 15%, respectively. Complete (a), Survey (b) and Random (c) are the types of follow-up and 5, 10 and 15% are the cumulative disease incidences.
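The two-step approach proposed in the abstract can be illustrated with a minimal sketch for the cohort design only. This is not the authors' code: the function names (`simulate_cohort`, `logistic_wald`, `cox_wald`, `two_step`), the exponential baseline hazard, the simulation parameters and the screening threshold of 1e-4 are all assumptions made for illustration. The Cox fit here is unweighted and assumes no tied event times, so the Prentice weighting needed for case-cohort data is not shown.

```python
import math
import random

def simulate_cohort(n, maf, log_hr, base_rate, follow_up, rng):
    """Hypothetical simulator: genotype, follow-up time and event flag for n people."""
    data = []
    for _ in range(n):
        g = sum(rng.random() < maf for _ in range(2))          # additive genotype 0/1/2
        t = rng.expovariate(base_rate * math.exp(log_hr * g))  # exponential event time
        data.append((g, min(t, follow_up), 1 if t <= follow_up else 0))
    return data

def logistic_wald(data, iters=25):
    """Fit logit P(event) = a + b*g by Newton-Raphson; return (b, two-sided Wald P)."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for g, _, d in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * g)))
            w = p * (1.0 - p)
            ga += d - p
            gb += (d - p) * g
            haa += w
            hab += w * g
            hbb += w * g * g
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det   # Newton step using the inverse information matrix
        b += (haa * gb - hab * ga) / det
    se = math.sqrt(haa / det)
    return b, math.erfc(abs(b / se) / math.sqrt(2))

def cox_wald(data, iters=25):
    """Fit h(t|g) = h0(t)*exp(b*g) on the partial likelihood (assumes no tied event times)."""
    rows = sorted(data, key=lambda r: -r[1])  # latest times first, so the risk set grows
    b = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = score = info = 0.0
        for g, _, d in rows:
            e = math.exp(b * g)
            s0 += e
            s1 += g * e
            s2 += g * g * e
            if d:  # each event contributes to the score and information
                mean = s1 / s0
                score += g - mean
                info += s2 / s0 - mean * mean
        b += score / info
    return b, math.erfc(abs(b) * math.sqrt(info) / math.sqrt(2))

def two_step(snps, threshold):
    """Step 1: screen every SNP with logistic regression. Step 2: Cox only for the hits."""
    hits = {}
    for name, data in snps.items():
        log_or, p_logistic = logistic_wald(data)
        if p_logistic < threshold:
            log_hr, p_cox = cox_wald(data)
            hits[name] = {"log_or": log_or, "p_logistic": p_logistic,
                          "log_hr": log_hr, "p_cox": p_cox}
    return hits

# Illustrative data: one null SNP and one SNP with a true hazard ratio of 2.
rng = random.Random(1)
snps = {
    "null_snp": simulate_cohort(10000, 0.10, 0.0, 0.01, 10.0, rng),
    "causal_snp": simulate_cohort(10000, 0.10, math.log(2.0), 0.01, 10.0, rng),
}
hits = two_step(snps, threshold=1e-4)
for name, result in hits.items():
    print(name, result)
```

In this sketch the expensive Cox fit runs only for SNPs that pass the logistic screen, which is where the computational saving described in the abstract would come from. The reported effect for each hit is then the Cox log hazard ratio, not the logistic log odds ratio.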
| Citation | PMID | DOI | Status |
|---|---|---|---|
| Annesi I, Moreau T, Lellouch J: Efficiency of the logistic regression and Cox proportional hazards models in longitudinal studies. Stat Med 1989; 8: 1515–1521. | 2616941 | 10.1002/sim.4780081211 | - |
| Aulchenko YS, Ripke S, Isaacs A, Van Duijn CM: GenABEL: an R library for genome-wide association analysis. Bioinformatics 2007; 23: 1294–1296. | 17384015 | 10.1093/bioinformatics/btm108 | - |
| Barlow W, Ichikawa L, Rosner D, Izumi S: Analysis of case-cohort designs. J Clin Epidemiol 1999; 52: 1165–1172. | 10580779 | 10.1016/s0895-4356(99)00102-x | - |
| Bender R, Augustin T, Blettner M: Generating survival times to simulate Cox proportional hazards models. Stat Med 2005; 24: 1713–1723. | 15724232 | 10.1002/sim.2059 | - |
| Callas PW, Pastides H, Hosmer DW: Empirical comparisons of proportional hazards, Poisson, and logistic regression modeling of occupational cohort data. Am J Ind Med 1998; 33: 33–47. | 9408527 | 10.1002/(sici)1097-0274(199801)33:1<33::aid-ajim5>3.0.co;2-x | - |
| Cuzick J: The efficiency of the proportions test and the logrank test for censored survival data. Biometrics 1982; 38: 1033–1039. | - | - | - |
| Danesh J, Saracci R, Berglund G, Feskens E, Overvad K, Panico S et al: EPIC-Heart: the cardiovascular component of a prospective study of nutritional, lifestyle and biological factors in 520,000 middle-aged participants from 10 European countries. Eur J Epidemiol 2007; 22: 129–1241. | 17295097 | 10.1007/s10654-006-9096-8 | - |
| Deloukas P, Kanoni S, Willenborg C et al: Large-scale association analysis identifies new risk loci for coronary artery disease. Nat Genet 2013; 45: 25–33. PMCID: PMC3679547 | 23202125 | 10.1038/ng.2480 | - |
| Green MS, Symons MJ: A comparison of the logistic risk function and the proportional hazards model in prospective epidemiologic studies. J Chronic Dis 1983; 36: 715–723. | 6630407 | 10.1016/0021-9681(83)90165-0 | - |
| Ingram DD, Kleinman JC: Empirical comparisons of proportional hazards and logistic regression models. Stat Med 1989; 8: 525–538. | 2727473 | 10.1002/sim.4780080502 | - |
| Langenberg C, Sharp S, Forouhi N et al: Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study. Diabetologia 2011; 54: 2272–2282. PMCID: PMC4222062 | 21717116 | 10.1007/s00125-011-2182-9 | - |
| Peduzzi P, Holford T, Detre K, Chan Y: Comparison of the logistic and Cox regression models when outcome is determined in all patients after a fixed period of time. J Chronic Dis 1987; 40: 761–767. | 3597677 | 10.1016/0021-9681(87)90127-5 | - |
| Prentice R: A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 1986; 73: 1–11. | - | - | - |
| Schunkert H, König IR, Kathiresan S et al: Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet 2011; 43: 333–338. PMCID: PMC3119261 | 21378990 | 10.1038/ng.784 | - |
| van der Net JB, Janssens ACJ, Eijkemans MJ, Kastelein JJ, Sijbrands EJ, Steyerberg EW: Cox proportional hazards models have more statistical power than logistic regression models in cross-sectional genetic association studies. Eur J Hum Genet 2008; 16: 1111–1116. | 18382476 | 10.1038/ejhg.2008.59 | - |
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. | 2020 | 30617275 |
External
| Title | Authors | Journal | Year | Link |
|---|---|---|---|---|
| Analysis of follow-up data in large biobank cohorts: a review of methodology. | Kolde A et al. | - | 2025 | - |
| Applying weighted Cox regression to genome-wide association studies of time-to-event phenotypes. | Li Y et al. | - | 2025 | - |
| Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks. | Ma Y et al. | - | 2025 | - |
| Genetic association studies using disease liabilities from deep neural networks. | Yang L et al. | - | 2025 | - |
| Prediction of Lymph Node Metastasis in Non-Small Cell Lung Carcinoma Using Primary Tumor Somatic Mutation Data. | Lee V et al. | - | 2025 | - |
| Refining bias correction in genome-wide association analyses of case-control studies | Darbani B et al. | - | 2025 | - |
| Thrombotic risk determined by *ABO*, *F8*, and *VWF* variants in a population-based cohort study. | Manderstedt E et al. | - | 2025 | - |
| An Investigation into the Relationship of Circulating Gut Microbiome Molecules and Inflammatory Markers with the Risk of Incident Dementia in Later Life. | Oluwagbemigun K et al. | - | 2024 | - |
| Genetic variants for Alzheimer's disease and comorbid conditions. | Pan M et al. | - | 2024 | - |
| Letter to the Editor on "Dislocation Following Anterior and Posterior Total Hip Arthroplasty in the Setting of Spinal Deformity and Stiffness: Evolving Trends Using a High-Risk Protocol at a Single Tertiary Center". | Shaker F et al. | - | 2024 | - |
| ADuLT: An efficient and robust time-to-event GWAS. | Pedersen EM et al. | - | 2023 | - |
| Age at Menopause and the Risk of Stroke: Observational and Mendelian Randomization Analysis in 204 244 Postmenopausal Women. | Tschiderer L et al. | - | 2023 | - |
| Artificial intelligence for dementia genetics and omics. | Bettencourt C et al. | - | 2023 | - |
| Association of Circulating Caprylic Acid with Risk of Mild Cognitive Impairment and Alzheimer's Disease in the Alzheimer's Disease Neuroimaging Initiative (ADNI) Cohort. | Fan L et al. | - | 2023 | - |
| Improving efficiency of fitting Cox proportional hazards models for time-to-event outcomes in genome-wide association studies (GWAS). | Gebski V et al. | - | 2023 | - |
| Preterm Prelabor Rupture of Membranes Linked to Vaginal Bacteriome of Pregnant Females in the Early Second Trimester: a Case-Cohort Design. | Mu Y et al. | - | 2023 | - |
| Cox model and decision trees: an application to breast cancer data. | Pereira LC et al. | - | 2022 | - |
| Cox regression is robust to inaccurate EHR-extracted event time: an application to EHR-based GWAS. | Irlmeier R et al. | - | 2022 | - |
| Efficient and accurate frailty model approach for genome-wide survival association analysis in large-scale biobanks. | Dey R et al. | - | 2022 | - |
| Risk Factors for Inpatient Hypoglycemia in a Tertiary Care Hospital in Indonesia. | Pratiwi C et al. | - | 2022 | - |
| The trends and efficacy of operation in the treatment of hepatocellular carcinoma. | Wu L et al. | - | 2022 | - |
| An exploration of genetic association tests for disease risk and age at onset. | Martin ER et al. | - | 2021 | - |
| A novel age-informed approach for genetic association analysis in Alzheimer's disease. | Le Guen Y et al. | - | 2021 | - |
| Factors associated with long-term graft survival in pediatric kidney transplant recipients. | Anand A et al. | - | 2021 | - |
| Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis. | Ojavee SE et al. | - | 2021 | - |
| Mammographic features are associated with cardiometabolic disease risk and mortality. | Grassmann F et al. | - | 2021 | - |
| Set-based genetic association and interaction tests for survival outcomes based on weighted V statistics. | Li C et al. | - | 2021 | - |
| A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank. | Bi W et al. | - | 2020 | - |
| Fast Algorithms for Conducting Large-Scale GWAS of Age-at-Onset Traits Using Cox Mixed-Effects Models. | He L et al. | - | 2020 | - |
| Genome-wide association analysis of type 2 diabetes in the EPIC-InterAct study. | Cai L et al. | - | 2020 | - |
| Impact of sitagliptin combination therapy and hypoglycemia in Japanese patients with type 2 diabetes: a multi-center retrospective observational cohort study. | Saito T et al. | - | 2020 | - |
| Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. | Erzurumluoglu AM et al. | - | 2020 | - |
| The association between Single Nucleotide Polymorphisms of Klotho Gene and Mortality in Elderly Men: The MrOS Sweden Study. | Wu PH et al. | - | 2020 | - |
| Cox regression increases power to detect genotype-phenotype associations in genomic studies using the electronic health record. | Hughey JJ et al. | - | 2019 | - |
| Polygenic risk and hazard scores for Alzheimer's disease prediction. | Leonenko G et al. | - | 2019 | - |
| Relating the gut metagenome and metatranscriptome to immunotherapy responses in melanoma patients. | Peters BA et al. | - | 2019 | - |