Disease model distortion in association studies.
- Authors
- Vukcevic, Damjan; Hechter, Eliana; Spencer, Chris; Donnelly, Peter
- Year
- 2011
- Journal
- Genetic epidemiology
- PMID
- 21416505
- DOI
- 10.1002/gepi.20576
- PMCID
- PMC3110308
Most findings from genome-wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r⁴, where r² is the usual correlation between the causal and marker loci, in contrast to the well-known result that power to detect a multiplicative effect decays as a function of r². We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely.
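To make the contrast between the two decay rates concrete, the following minimal sketch converts an LD value into the factor by which the sample size at the marker would need to grow to match the power available at the causal locus. It assumes, as a simplification not stated in the abstract, that the noncentrality parameter of each test scales linearly with sample size and with r² (multiplicative effect) or r⁴ (departure from the multiplicative model). The function name and dictionary keys are hypothetical.

```python
def sample_size_inflation(r2: float) -> dict:
    """Approximate sample-size multipliers needed at a marker SNP.

    Assumption (not from the source): the test's noncentrality parameter is
    proportional to N * r^2 for a multiplicative effect and to N * r^4 for a
    departure from the multiplicative model, so N must grow by the inverse.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return {
        "multiplicative_effect": 1.0 / r2,
        "model_departure": 1.0 / r2 ** 2,
    }


if __name__ == "__main__":
    for r2 in (1.0, 0.8, 0.5, 0.2):
        factors = sample_size_inflation(r2)
        print(
            f"r2={r2:.1f}: effect x{factors['multiplicative_effect']:.2f}, "
            f"departure x{factors['model_departure']:.2f}"
        )
```

Under these assumptions, a marker with r² = 0.5 would need about twice the sample to detect the effect but about four times the sample to detect a departure from the multiplicative model.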
The effective additive parameter for three disease models, plotted against the RAF. A homozygous RR of 1.4² and an equal number of cases and controls were assumed for all disease models. The right-hand y-axis shows the per-allele RR corresponding to each value of β′ (i.e. RR′ = exp(β′)). Note that for the multiplicative model, β′ = β = log(1.4) for all RAFs. RAF, risk allele frequency; RR, relative risk.
LLM interpretation
This line graph plots the effective additive parameter ($\beta'$) and corresponding per-allele relative risk ($\text{RR}'$) against the risk allele frequency (RAF) for three disease models. The dominant model (black line) shows a steady decrease in $\beta'$ as RAF increases, while the recessive model (blue line) shows a steady increase. The multiplicative model (magenta line) remains constant across all RAF values.
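The qualitative pattern described for this figure can be reproduced with a small numerical sketch. It treats β′ as the slope obtained when an additive (per-allele) logistic model is fitted to the expected genotype distributions of cases and controls. That reading of "effective additive parameter" is an assumption, as are Hardy-Weinberg genotype frequencies, the rare-disease approximation for controls, and the function name; the paper's own derivation may differ.

```python
import numpy as np


def effective_additive_beta(raf, rr_het, rr_hom, n_iter=50):
    """Slope of an additive logistic model fitted to expected case/control data.

    Assumptions (not from the source): Hardy-Weinberg genotype frequencies,
    controls distributed like the general population (rare disease), and
    equal numbers of cases and controls.
    """
    freqs = np.array([(1 - raf) ** 2, 2 * raf * (1 - raf), raf ** 2])
    rr = np.array([1.0, rr_het, rr_hom])
    w_case = 0.5 * freqs * rr / np.sum(freqs * rr)  # expected case proportions
    w_ctrl = 0.5 * freqs                            # expected control proportions

    # Design matrix: intercept and risk-allele count (0, 1, 2).
    X = np.column_stack([np.ones(3), np.arange(3.0)])
    theta = np.zeros(2)
    for _ in range(n_iter):  # Newton-Raphson on the expected log-likelihood
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (w_case - (w_case + w_ctrl) * p)
        hess = (X * ((w_case + w_ctrl) * p * (1 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess, grad)
    return float(theta[1])


if __name__ == "__main__":
    hom_rr = 1.4 ** 2
    models = {
        "multiplicative": (1.4, hom_rr),
        "dominant": (hom_rr, hom_rr),
        "recessive": (1.0, hom_rr),
    }
    for name, (het, hom) in models.items():
        betas = [effective_additive_beta(f, het, hom) for f in (0.1, 0.5, 0.9)]
        print(name, [round(b, 3) for b in betas])
```

Under these assumptions the multiplicative model returns log(1.4) at every RAF, while the dominant and recessive slopes move in opposite directions as the RAF increases.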
Impact of LD on disease model parameters for a dominant model. Parameter values as functions of r, for a selection of RAFs. A dominant model with a homozygous RR of 1.4² at the causal SNP is assumed, corresponding to general model parameter values of βA = γA = log(1.4) = 0.34. The solid black line shows the dominance parameter (γB), the dashed black line the additive parameter (βB), and the magenta line the effective additive parameter (β′B) at the marker SNP. The respective parameter values at the causal SNP are shown by points at r = 1, following the same color scheme as the lines (in this case, the points for βA and γA overlap since they have the same value). Plots in each row correspond to a given marker SNP RAF and columns to a given causal SNP RAF, as labeled. The range of possible values of r depends on the allele frequencies, as shown by Equation (3). Note that a negative value for β is equivalent to a positive value for it when considered with respect to the other allele at the SNP. RAF, risk allele frequency; LD, linkage disequilibrium; RR, relative risk; SNP, single nucleotide polymorphism.
LLM interpretation
This figure consists of a 2x2 grid of line plots showing how disease model parameters vary as a function of linkage disequilibrium ($r$) for different risk allele frequencies (RAF) of causal and marker SNPs. The x-axis represents $r$ (LD) and the y-axis represents the parameter value, with lines depicting the dominance parameter ($\gamma_B$, solid black), additive parameter ($\beta_B$, dashed black), and effective additive parameter (magenta). Points at $r=1$ indicate the parameter values at the causal SNP, and the plots demonstrate that the relationship between these parameters and LD depends on whether the SNPs are rare (RAF = 0.1) or common (RAF = 0.5).
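The distortion of the additive and dominance parameters at a marker can be sketched numerically. The code below averages the causal-SNP relative risks over the causal genotypes expected for each marker genotype, then re-expresses the result as (β, γ) using the parameterisation implied by the caption values (log RR = β × allele count + γ for heterozygotes). Random pairing of haplotypes, the use of relative risks rather than odds ratios, the standard bounds on the LD coefficient D (used here in place of the paper's Equation (3), which is not reproduced on this page), and all function names are assumptions.

```python
import numpy as np


def marker_parameters(p_causal, p_marker, r, rr_het, rr_hom):
    """Additive (beta) and dominance (gamma) parameters induced at a marker SNP.

    Assumptions (not from the source): random pairing of haplotypes and
    relative risks averaged over causal genotypes given the marker genotype.
    """
    q_causal, q_marker = 1 - p_causal, 1 - p_marker
    d = r * np.sqrt(p_causal * q_causal * p_marker * q_marker)
    d_min = max(-p_causal * p_marker, -q_causal * q_marker)
    d_max = min(p_causal * q_marker, q_causal * p_marker)
    if not d_min - 1e-12 <= d <= d_max + 1e-12:
        raise ValueError("r is outside the range allowed by the allele frequencies")

    # P(causal risk allele | marker allele), for marker alleles B and b.
    a_given_B = np.clip((p_causal * p_marker + d) / p_marker, 0.0, 1.0)
    a_given_b = np.clip((p_causal * q_marker - d) / q_marker, 0.0, 1.0)

    def mean_rr(x, y):
        # x, y: chance that each of the two haplotypes carries the causal allele
        p0, p2 = (1 - x) * (1 - y), x * y
        return p0 * 1.0 + (1 - p0 - p2) * rr_het + p2 * rr_hom

    rr_b = [
        mean_rr(a_given_b, a_given_b),  # marker genotype bb
        mean_rr(a_given_B, a_given_b),  # marker genotype Bb
        mean_rr(a_given_B, a_given_B),  # marker genotype BB
    ]
    beta = 0.5 * np.log(rr_b[2] / rr_b[0])
    gamma = np.log(rr_b[1] / rr_b[0]) - beta
    return float(beta), float(gamma)


if __name__ == "__main__":
    hom_rr = 1.4 ** 2  # dominant model: heterozygote RR equals homozygote RR
    for r in (1.0, 0.75, 0.5, 0.25, 0.0):
        beta_b, gamma_b = marker_parameters(0.5, 0.5, r, hom_rr, hom_rr)
        print(f"r={r:.2f}: beta_B={beta_b:.3f}, gamma_B={gamma_b:.3f}")
```

With equal allele frequencies and r = 1 the marker is a perfect proxy, so the sketch returns the causal-SNP values; at r = 0 both parameters are zero.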
Impact of LD on disease model parameters for a recessive model. Same as Figure 2, but now for a recessive model with a homozygous RR of 1.4², corresponding to general model parameter values of βA = −γA = log(1.4) = 0.34. LD, linkage disequilibrium; RR, relative risk.
LLM interpretation
This figure consists of a four-panel grid of line plots showing parameter values (y-axis) against linkage disequilibrium ($r$, x-axis) for a recessive disease model. The panels compare rare (RAF = 0.1) and common (RAF = 0.5) risk alleles at the causal SNP and at the marker SNP. Each panel displays three curves (solid black, dashed black, and solid magenta) illustrating how the parameter values change as $r$ varies from -1.0 to 1.0.
Model space plot showing distortion toward a multiplicative model. The two disease parameters (dominance vs. additive; γ vs. β) plotted against each other showing the full space of models up to the value of the baseline parameter (µ). The horizontal gray line shows the subspace of multiplicative models. The gray lines above the horizontal show the subspace of dominant models, and those below show the subspace of recessive models. Curves and points trace out the models for the scenarios shown in Figures 2 and 3, lying above and below the horizontal line, respectively. Curves are drawn in different styles to show the causal and marker SNP RAFs they correspond to, as shown by the two legends. The two points represent the true disease models at the causal SNP. SNP, single nucleotide polymorphism; RAF, risk allele frequency.
LLM interpretation
This is a model space plot with the disease parameter $\beta$ on the x-axis and $\gamma$ on the y-axis. The plot features a horizontal gray line representing multiplicative models, with regions above and below indicating dominant and recessive subspaces, respectively. Several solid and dashed curves, color-coded by Causal SNP RAF (magenta for 0.1, black for 0.5) and styled by Marker SNP RAF, trace model trajectories across these subspaces. Two black points mark the true disease models at the causal SNP.
Parameter estimates and LD from simulations for a dominant model. Estimates of the additive and dominance parameters respectively (by column) at the hit SNP, plotted against the r² between the causal and hit SNPs. The estimates are from the simulated replication sample from simulations with a dominant causal SNP with homozygous RRs of 1.2², 1.4², and 2², respectively (by row; corresponding to true parameter values of β = γ = 0.5 log(Hom. RR) = 0.18, 0.34, 0.69). Only simulations where the hit SNP passed the scan and replication criteria are displayed. The dashed red lines denote the true parameter values. The dashed black lines indicate a zero effect. The blue lines show linear regression fits to the points on each plot, to aid visual comparisons. LD, linkage disequilibrium; SNP, single nucleotide polymorphism; RR, relative risk.
LLM interpretation
This figure consists of a 3x2 grid of scatter plots showing parameter estimates (additive in the left column, dominance in the right column) plotted against $r^2$ (linkage disequilibrium) for three different relative risk (RR) conditions ($1.2^2$, $1.4^2$, and $2^2$). Each plot includes a blue linear regression fit, a red dashed line representing the true parameter value, and a black dashed line at zero. As $r^2$ increases toward 1.0, the estimated parameters generally converge toward the true parameter values (red lines).
A breakdown of the simulation results by outcome and LD, for simulations with a dominant model with a homozygous RR of 1.4². The LD is shown as the r² between the causal and hit SNPs, split into bins of width 0.1 (labeled on the x-axis with the highest possible r² value for each bin). The three possible outcomes are: the hit SNP does not pass the scan and replication criteria (“Undetected”); that it passes these criteria but a subsequent deviation test is not significant (“Detected without deviation”); or that this test is significant (“Detected with deviation”). For each LD bin: panel A shows the absolute counts of each outcome, panel B shows their relative proportions, while panel C shows the relative proportions of the last two outcomes only. Note that the two leftmost columns in panel C are based on very small counts and so the exact values plotted are not precise estimates of the relative proportions. LD, linkage disequilibrium; SNP, single nucleotide polymorphism; RR, relative risk.
LLM interpretation
This figure consists of three stacked bar charts (Panels A, B, and C) showing simulation results categorized by outcome ("Undetected," "Detected without deviation," and "Detected with deviation") across bins of linkage disequilibrium ($r^2$ between causal and hit SNPs). Panel A displays absolute frequencies, showing a high count of "Undetected" results at low $r^2$ and a high count of "Detected with deviation" results at $r^2 = 1$. Panels B and C show relative proportions, demonstrating that as $r^2$ increases, the proportion of "Undetected" outcomes decreases while the proportion of "Detected with deviation" outcomes increases.
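The captions on this page do not say which deviation test was used in the simulations. One common choice, sketched below purely as an illustration, is a 1-degree-of-freedom likelihood-ratio test comparing an additive logistic model with the general (saturated) genotype model. The function name, the example counts, and the choice of test are all assumptions.

```python
import math

import numpy as np


def deviation_lrt(case_counts, control_counts, n_iter=50):
    """Likelihood-ratio test of additive vs general genotype model (1 df).

    Assumption (not from the source): this is one conventional test for
    departure from the multiplicative model. All six counts must be positive.
    Returns (test statistic, p-value).
    """
    cases = np.asarray(case_counts, dtype=float)
    controls = np.asarray(control_counts, dtype=float)
    total = cases + controls

    # General model: one free case probability per genotype (saturated fit).
    p_gen = cases / total
    ll_general = np.sum(cases * np.log(p_gen) + controls * np.log(1 - p_gen))

    # Additive model: logit(p) = mu + beta * allele count, fitted by Newton-Raphson.
    X = np.column_stack([np.ones(3), np.arange(3.0)])
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (cases - total * p)
        hess = (X * (total * p * (1 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess, grad)
    p_add = 1.0 / (1.0 + np.exp(-X @ theta))
    ll_additive = np.sum(cases * np.log(p_add) + controls * np.log(1 - p_add))

    stat = max(2.0 * (ll_general - ll_additive), 0.0)
    # Chi-square survival function with 1 df: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return float(stat), p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (0, 1, 2 risk alleles), for illustration only.
    stat, p_value = deviation_lrt([145, 570, 285], [250, 500, 250])
    print(f"LRT statistic = {stat:.2f}, p = {p_value:.2g}")
```

The saturated model is fitted in closed form and only the additive model needs iteration, which keeps the sketch short. Sparse genotype classes with zero counts would need separate handling.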