Disease model distortion in association studies.

Authors: Vukcevic, Damjan; Hechter, Eliana; Spencer, Chris; Donnelly, Peter
Year: 2011
Journal: Genetic Epidemiology
PMID: 21416505
DOI: 10.1002/gepi.20576
PMCID: PMC3110308

Fig. 1

The effective additive parameter for three disease models, plotted against the RAF. A homozygous RR of 1.4² and an equal number of cases and controls were assumed for all disease models. The right-hand y-axis shows the per-allele RR corresponding to each value of β′ (i.e. RR′ = exp(β′)). Note that for the multiplicative model, β′ = β = log(1.4) for all RAFs. RAF, risk allele frequency; RR, relative risk.

LLM interpretation

This line graph plots the effective additive parameter ($\beta'$) and corresponding per-allele relative risk ($\text{RR}'$) against the risk allele frequency (RAF) for three disease models. The dominant model (black line) shows a steady decrease in $\beta'$ as RAF increases, while the recessive model (blue line) shows a steady increase. The multiplicative model (magenta line) remains constant across all RAF values.

Fig. 2

Impact of LD on disease model parameters for a dominant model. Parameter values as functions of r, for a selection of RAFs. A dominant model with a homozygous RR of 1.4² at the causal SNP is assumed, corresponding to general model parameter values of βA = γA = log(1.4) = 0.34. The solid black line shows the dominance parameter (γB), the dashed black line the additive parameter (βB), and the magenta line the effective additive parameter (β′B) at the marker SNP. The respective parameter values at the causal SNP are shown by points at r = 1, following the same color scheme as the lines (in this case, the points for βA and γA overlap since they have the same value). Plots in each row correspond to a given marker SNP RAF and columns to a given causal SNP RAF, as labeled. The range of possible values of r depends on the allele frequencies, as shown by Equation (3). Note that a negative value for β is equivalent to a positive value for it when considered with respect to the other allele at the SNP. RAF, risk allele frequency; LD, linkage disequilibrium; RR, relative risk; SNP, single nucleotide polymorphism.
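
The dependence of the range of r on allele frequencies can be illustrated with a short sketch. Equation (3) itself is not reproduced in this record, so the code below does not claim to match its exact form; it uses the standard bounds on the haplotype frequency P(11) (an assumption on our part), and the function name `r_range` is hypothetical. The RAF values 0.1 and 0.5 are the ones used for the figure panels.

```python
import math


def r_range(f_a: float, f_b: float) -> tuple[float, float]:
    """Return (r_min, r_max) for two biallelic SNPs.

    f_a and f_b are the frequencies of allele 1 at SNPs A and B.
    Assumes the usual definition D = P(haplotype 11) - f_a * f_b and
    r = D / sqrt(f_a (1 - f_a) f_b (1 - f_b)).
    """
    # P(11) can be at most min(f_a, f_b) and at least max(0, f_a + f_b - 1).
    d_max = min(f_a, f_b) - f_a * f_b
    d_min = max(0.0, f_a + f_b - 1.0) - f_a * f_b
    scale = math.sqrt(f_a * (1 - f_a) * f_b * (1 - f_b))
    return d_min / scale, d_max / scale


for f_causal in (0.1, 0.5):
    for f_marker in (0.1, 0.5):
        lo, hi = r_range(f_causal, f_marker)
        print(f"causal RAF={f_causal}, marker RAF={f_marker}: "
              f"r in [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions, r can reach 1 only when the two allele frequencies are equal, which is consistent with the caption's remark that the admissible range of r differs between panels.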

LLM interpretation

This figure consists of a 2x2 grid of line plots showing how disease model parameters vary as a function of linkage disequilibrium ($r$) for different risk allele frequencies (RAF) of causal and marker SNPs. The x-axis represents $r$ (LD) and the y-axis represents the parameter value, with lines depicting the dominance parameter ($\gamma_B$, solid black), additive parameter ($\beta_B$, dashed black), and effective additive parameter (magenta). Points at $r=1$ indicate the parameter values at the causal SNP, and the plots demonstrate that the relationship between these parameters and LD depends on whether the SNPs are rare (RAF = 0.1) or common (RAF = 0.5).

Fig. 3

Impact of LD on disease model parameters for a recessive model. Same as Figure 2, but now for a recessive model with a homozygous RR of 1.4², corresponding to general model parameter values of βA = −γA = log(1.4) = 0.34. LD, linkage disequilibrium; RR, relative risk.
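
The parameter values quoted in the captions for Figures 2 and 3 can be checked against the stated homozygous RR with a small sketch. It assumes a general model in which the log relative risk is β per risk allele plus γ for heterozygotes only; this parameterization is inferred from the captions rather than quoted from the paper, the choice γ = 0 for the multiplicative model is likewise an assumption, and `genotype_rr` is a hypothetical helper name.

```python
import math


def genotype_rr(beta: float, gamma: float) -> tuple[float, float, float]:
    """Relative risks for 0, 1 and 2 copies of the risk allele.

    Assumed model: log RR = beta * (allele count) + gamma * [heterozygote].
    """
    return (1.0, math.exp(beta + gamma), math.exp(2.0 * beta))


b = math.log(1.4)  # approximately 0.34, as in the captions
models = {
    "dominant": (b, b),         # beta = gamma
    "recessive": (b, -b),       # beta = -gamma
    "multiplicative": (b, 0.0)  # gamma = 0 (assumed)
}

for name, (beta, gamma) in models.items():
    rr0, rr1, rr2 = genotype_rr(beta, gamma)
    print(f"{name:>14}: het RR = {rr1:.2f}, hom RR = {rr2:.2f}")
```

Under this assumed parameterization all three models share a homozygous RR of 1.4² = 1.96 and differ only in the heterozygote risk.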

LLM interpretation

This figure consists of a 2x2 grid of line plots showing parameter values (y-axis) as a function of linkage disequilibrium ($r$, x-axis) for a recessive disease model. The panels compare different risk allele frequencies (RAF) for the causal SNP (0.1 rare vs. 0.5 common) and the marker SNP (0.1 rare vs. 0.5 common). Each panel displays three curves (solid black, dashed black, and solid magenta) illustrating how the parameter values change as $r$ varies from -1.0 to 1.0.

Fig. 4

Model space plot showing distortion toward a multiplicative model. The two disease parameters (dominance vs. additive; γ vs. β) plotted against each other showing the full space of models up to the value of the baseline parameter (µ). The horizontal gray line shows the subspace of multiplicative models. The gray lines above the horizontal show the subspace of dominant models, and those below show the subspace of recessive models. Curves and points trace out the models for the scenarios shown in Figures 2 and 3, lying above and below the horizontal line, respectively. Curves are drawn in different styles to show the causal and marker SNP RAFs they correspond to, as shown by the two legends. The two points represent the true disease models at the causal SNP. SNP, single nucleotide polymorphism; RAF, risk allele frequency.

LLM interpretation

This is a model space plot with the disease parameter $\beta$ on the x-axis and $\gamma$ on the y-axis. The plot features a horizontal gray line representing multiplicative models, with regions above and below indicating dominant and recessive subspaces, respectively. Several solid and dashed curves, color-coded by Causal SNP RAF (magenta for 0.1, black for 0.5) and styled by Marker SNP RAF, trace model trajectories across these subspaces. Two black points mark the true disease models at the causal SNP.

Fig. 5

Parameter estimates and LD from simulations for a dominant model. Estimates of the additive and dominance parameters respectively (by column) at the hit SNP, plotted against the r² between the causal and hit SNPs. The estimates are from the simulated replication sample from simulations with a dominant causal SNP with homozygous RRs of 1.2², 1.4², and 2², respectively (by row; corresponding to true parameter values of β = γ = 0.5 log(Hom. RR) = 0.18, 0.34, 0.69). Only simulations where the hit SNP passed the scan and replication criteria are displayed. The dashed red lines denote the true parameter values. The dashed black lines indicate a zero effect. The blue lines show linear regression fits to the points on each plot, to aid visual comparisons. LD, linkage disequilibrium; SNP, single nucleotide polymorphism; RR, relative risk.
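
The true parameter values quoted in this caption follow directly from β = γ = 0.5 log(Hom. RR). A minimal arithmetic check, where `true_beta` is a hypothetical helper name and the natural logarithm is assumed:

```python
import math


def true_beta(hom_rr: float) -> float:
    """True additive (and dominance) parameter for the dominant model."""
    return 0.5 * math.log(hom_rr)


for per_allele in (1.2, 1.4, 2.0):
    hom_rr = per_allele ** 2
    print(f"Hom. RR = {per_allele}^2 = {hom_rr:.2f} -> "
          f"beta = gamma = {true_beta(hom_rr):.2f}")
```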

LLM interpretation

This figure consists of a 3x2 grid of scatter plots showing parameter estimates (additive in the left column, dominance in the right column) plotted against $r^2$ (linkage disequilibrium) for three different relative risk (RR) conditions ($1.2^2$, $1.4^2$, and $2^2$). Each plot includes a blue linear regression fit, a red dashed line representing the true parameter value, and a black dashed line at zero. As $r^2$ increases toward 1.0, the estimated parameters generally converge toward the true parameter values (red lines).

Fig. 6

A breakdown of the simulation results by outcome and LD, for simulations with a dominant model with a homozygous RR of 1.4². The LD is shown as the r² between the causal and hit SNPs, split into bins of width 0.1 (labeled on the x-axis with the highest possible r² value for each bin). The three possible outcomes are: the hit SNP does not pass the scan and replication criteria (“Undetected”); it passes these criteria but a subsequent deviation test is not significant (“Detected without deviation”); or it passes and this test is significant (“Detected with deviation”). For each LD bin: panel A shows the absolute counts of each outcome, panel B shows their relative proportions, and panel C shows the relative proportions of the last two outcomes only. Note that the two leftmost columns in panel C are based on very small counts, so the plotted values are imprecise estimates of the relative proportions. LD, linkage disequilibrium; SNP, single nucleotide polymorphism; RR, relative risk.

LLM interpretation

This figure consists of three stacked bar charts (Panels A, B, and C) showing simulation results categorized by outcome ("Undetected," "Detected without deviation," and "Detected with deviation") across bins of linkage disequilibrium ($r^2$ between causal and hit SNPs). Panel A displays absolute frequencies, showing a high count of "Undetected" results at low $r^2$ and a high count of "Detected with deviation" results at $r^2 = 1$. Panels B and C show relative proportions, demonstrating that as $r^2$ increases, the proportion of "Undetected" outcomes decreases while the proportion of "Detected with deviation" outcomes increases.

#	Section	Preview
0	INTRODUCTION	Genome-wide association studies (GWAS) exploit the correlation structure in the genome, due to…
1	INTRODUCTION	In this paper, we study a particular aspect of this relationship both analytically and empirically.…
2	INTRODUCTION	The impact of imperfect LD has been well characterized for multiplicative models, both in terms of…
3	INTRODUCTION	We derive an analogous result for a scenario involving two interacting SNPs under a simple…
4	INTRODUCTION	The above results apply for any given, fixed, marker loci. To study the impact of distortion on…
5	INTRODUCTION	Previous studies have explored the impact of LD on GWAS. Most have done so empirically, and only for…
6	INTRODUCTION	While we focus on case-control studies, we note that some related work has been published for…
7	THEORETICAL DERIVATIONS — LD MODEL	Let A and B be a pair of biallelic SNPs and code the alleles at each as 0 and 1. In the situations…
8	THEORETICAL DERIVATIONS — LD MODEL	For brevity, we will refer to the haplotype with A = i and B = j as ij. Consider the population…
9	THEORETICAL DERIVATIONS — LD MODEL	Define the following conditional probabilities,
10	THEORETICAL DERIVATIONS — LD MODEL	(1)
11	THEORETICAL DERIVATIONS — LD MODEL	(2)
12	THEORETICAL DERIVATIONS — LD MODEL	These allow the following representation of the haplotype distribution,
13	THEORETICAL DERIVATIONS — LD MODEL	and give the identity,
14	THEORETICAL DERIVATIONS — LD MODEL	The correlation coefficient can be expressed in terms of these quantities and can be shown to be,
15	THEORETICAL DERIVATIONS — LD MODEL	By solving these last two equations for q0 and q1, we can see that the haplotype distribution is…
16	THEORETICAL DERIVATIONS — LD MODEL	As is well known, the range of r depends on the allele frequencies. Suppose, without loss of…
17	THEORETICAL DERIVATIONS — LD MODEL	(3)
18	THEORETICAL DERIVATIONS — LD MODEL	The roles of fA and fB swap if fA≥fB. From this we can see that in order for a high positive…
19	THEORETICAL DERIVATIONS — LD MODEL	We use the term diplotype to mean a pair of two-SNP haplotypes belonging to an individual. Let…

Citation	PMID	DOI	Status
Ahn, K et al., Ann Hum Genet, 2007, The effects of SNP genotyping errors on the power of the Cochran-Armitage linear trend test for case/control association studies	17096677	10.1111/j.1469-1809.2006.00318.x	Cited
Armitage, P, Biometrics, 1955, Tests for linear trends in proportions and frequencies	—	—	—
Balding, DJ, Nat Rev Genet, 2006, A tutorial on statistical methods for population association studies	16983374	10.1038/nrg1916	Cited
Bhangale, TR et al., Nat Genet, 2008, Estimating coverage and power for genetic association studies using near-complete variation data	18568023	10.1038/ng.180	Cited
Cantor, RM et al., Am J Hum Genet, 2010, Prioritizing GWAS results: A review of statistical methods and recommendations for their application	20074509	10.1016/j.ajhg.2009.11.017	Cited
Chapman, JM et al., Hum Hered, 2003, Detecting disease associations due to linkage disequilibrium using haplotype tags: a class of tests and the determinants of statistical power	14614235	10.1159/000073729	Cited
Cox, DR et al., Theoretical Statistics, 1974	—	—	—
Hill, WG et al., PLoS Genet, 2008, Data and theory point to mainly additive genetic variance for complex traits	18454194	10.1371/journal.pgen.1000008	Cited
Hindorff, L et al., 2010	—	—	—
Iles, MM, PLoS Genet, 2008, What can genome-wide association studies tell us about the genetics of common disease?	18454206	10.1371/journal.pgen.0040033	Cited
Kass, RE et al., J R Stat Soc Ser B, 1992, Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions	—	—	—
Marchini, J et al., Nat Genet, 2005, Genome-wide strategies for detecting multiple loci that influence complex diseases	15793588	10.1038/ng1537	Cited
Nature, 2005, A haplotype map of the human genome	16255080	10.1038/nature04226	Cited
Nature, 2007, A second generation human haplotype map of over 3.1 million SNPs	17943122	10.1038/nature06258	Cited
Nature, 2007, Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls	17554300	10.1038/nature05911	Cited
Pritchard, JK et al., Am J Hum Genet, 2001, Linkage disequilibrium in humans: models and data	11410837	10.1086/321275	Cited
R: A Language and Environment for Statistical Computing, 2009	—	—	—
Sasieni, PD, Biometrics, 1997, From genotypes to genes: doubling the sample size	9423247	—	Cited
Schouten, EG et al., Stat Med, 1993, Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality	8248665	10.1002/sim.4780121808	Cited
Science, 2004, The ENCODE (ENCyclopedia Of DNA Elements) Project	15499007	10.1126/science.1105136	Cited
Sham, PC et al., Am J Hum Genet, 2000, Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data	10762547	10.1086/302891	Cited
Spencer, CC et al., PLoS Genet, 2009, Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip	19492015	10.1371/journal.pgen.1000477	Cited
Stephens, M et al., Nat Rev Genet, 2009, Bayesian statistical methods for genetic association studies	19763151	10.1038/nrg2615	Cited
Zheng, G et al., Stat Sci, 2009, Robust tests in genome-wide scans under incomplete linkage disequilibrium	—	—	—
Zondervan, KT et al., Nat Rev Genet, 2004, The complex interplay among factors that influence allelic association	14735120	10.1038/nrg1270	Cited

In this knowledge base

Title	Year	PMID
The aggregate effect of dopamine genes on dependence symptoms among cocaine users: cross-validation of a candidate system scoring approach.	2012	22358648

External

Title	Authors	Journal	Year	Link
Genome- and transcriptome-wide association meta-analysis reveals new insights into genes affecting coronary and peripheral artery disease.	Rode M et al.	—	2025	→
Hidden structure in polygenic scores and the challenge of disentangling ancestry interactions in admixed populations.	Aw AJ et al.	—	2025	→
Plasmalogen remodeling modulates macrophage response to cytotoxic oxysterols and atherosclerotic plaque vulnerability.	Jalil A et al.	—	2025	→
Systems genetics of metabolic health in the BXD mouse genetic reference population.	Li X et al.	—	2024	→
Evaluating the Potential of Younger Cases and Older Controls Cohorts to Improve Discovery Power in Genome-Wide Association Studies of Late-Onset Diseases.	Oliynyk RT	—	2019	→
Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania.	Malaria Genomic Epidemiology Network	—	2019	→
Power and Sample Size Calculations for Genetic Association Studies in the Presence of Genetic Model Misspecification.	Moore CM et al.	—	2019	→
Genome wide association study to identify predictors for severe skin toxicity in colorectal cancer patients treated with cetuximab.	Baas J et al.	—	2018	→
Pathway-induced allelic spectra of diseases in the presence of strong genetic effects.	Kanoungi G et al.	—	2018	→
Genetic model misspecification in genetic association studies.	Gaye A et al.	—	2017	→
What has GWAS done for HLA and disease associations?	Kennedy AE et al.	—	2017	→
Whole-genome view of the consequences of a population bottleneck using 2926 genome sequences from Finland and United Kingdom.	Chheda H et al.	—	2017	→
Analysis of Genetic Association Studies Incorporating Prior Information of Genetic Models	Zheng G et al.	—	2015	—
Imputation of KIR Types from SNP Variation Data.	Vukcevic D et al.	—	2015	→
Nonadditive Effects of Genes in Human Metabolomics.	Tsepilov YA et al.	—	2015	→
A note on the efficiencies of sampling strategies in two-stage Bayesian regional fine mapping of a quantitative trait.	Chen Z et al.	—	2014	→
A novel test for recessive contributions to complex diseases implicates Bardet-Biedl syndrome gene BBS10 in idiopathic type 2 diabetes and obesity.	Lim ET et al.	—	2014	→
Estimation of epistatic variance components and heritability in founder populations and crosses.	Young AI et al.	—	2014	→
Testing for non-linear causal effects using a binary genotype in a Mendelian randomization study: application to alcohol and cardiovascular traits.	Silverwood RJ et al.	—	2014	→
Incorporating parental information into family-based association tests.	Yu Z et al.	—	2013	→
The future of genomics for developmentalists.	Plomin R et al.	—	2013	→
Estimating causal effects of genetic risk variants for breast cancer using marker data from bilateral and familial cases.	Dudbridge F et al.	—	2012	→
Including known covariates can reduce power to detect genetic effects in case-control studies.	Pirinen M et al.	—	2012	→
The aggregate effect of dopamine genes on dependence symptoms among cocaine users: cross-validation of a candidate system scoring approach.	Derringer J et al.	—	2012	→