Optimizing the power of genome-wide association studies by using publicly available reference samples to expand the control group.


Authors: Zhuang, Joanna J; Zondervan, Krina; Nyberg, Fredrik; Harbron, Chris; Jawaid, Ansar; Cardon, Lon R; Barratt, Bryan J; Morris, Andrew P
Year: 2010
Journal: Genetic epidemiology
PMID: 20088020
DOI: 10.1002/gepi.20482
PMCID: PMC2962805

Fig. 1

Power of a GWA study of 500 cases to detect association of a causal variant with allele frequency 20% for a range of heterozygous genotype relative risks under a multiplicative model with disease prevalence of 0.1%. Results are presented for a trend test of association for a significance level of 5%, with the number of control samples ranging from 500 to 5,000 individuals.
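The shape of these power curves can be approximated with a short calculation. The sketch below is not the authors' procedure: it assumes Hardy-Weinberg equilibrium in the general population and uses a normal approximation to the allelic test, which behaves similarly to the trend test under a multiplicative model. The function names `allele_freqs` and `power` are illustrative, and the default `alpha` simply follows the caption.

```python
from statistics import NormalDist


def allele_freqs(p, grr, prevalence):
    """Risk-allele frequency in cases and in controls (multiplicative model, HWE assumed)."""
    q = 1.0 - p
    geno = [q * q, 2 * p * q, p * p]             # frequencies of 0, 1, 2 risk alleles
    base = prevalence / (q + p * grr) ** 2       # baseline penetrance implied by the prevalence
    pen = [base, base * grr, base * grr ** 2]    # penetrance per genotype
    case = [g * f / prevalence for g, f in zip(geno, pen)]
    ctrl = [g * (1 - f) / (1 - prevalence) for g, f in zip(geno, pen)]
    return 0.5 * case[1] + case[2], 0.5 * ctrl[1] + ctrl[2]


def power(grr, n_cases, n_controls, p=0.2, prevalence=0.001, alpha=0.05):
    """Approximate two-sided power of the allelic test (normal approximation)."""
    nd = NormalDist()
    p1, p0 = allele_freqs(p, grr, prevalence)
    a, b = 2 * n_cases, 2 * n_controls           # chromosomes in each group
    pooled = (a * p1 + b * p0) / (a + b)
    se0 = (pooled * (1 - pooled) * (1 / a + 1 / b)) ** 0.5    # standard error under the null
    se1 = (p1 * (1 - p1) / a + p0 * (1 - p0) / b) ** 0.5      # standard error under the alternative
    z = nd.inv_cdf(1 - alpha / 2)
    d = p1 - p0
    return nd.cdf((d - z * se0) / se1) + nd.cdf((-d - z * se0) / se1)


# 500 cases, heterozygous genotype relative risk 1.3, growing control group
for n_controls in (500, 1000, 2000, 5000):
    print(n_controls, round(power(1.3, 500, n_controls), 3))
```

With the number of cases held at 500, increasing `n_controls` raises the approximate power with diminishing returns, because the case sample eventually dominates the standard error.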

Fig. 2

Power of three trend tests of association at a 5% significance level for a high-risk allele frequency of 20% as a function of the allelic odds ratio in the absence of population structure (FST = 0): T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS. Power is estimated over 5,000 replicates of 100 cases, 100 controls, and 100 samples from each of three external cohorts.
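For readers unfamiliar with the uncorrected tests named in this caption, the following is a minimal sketch of a Cochran-Armitage trend test on genotype counts, applied once to cases against source-population controls (in the spirit of T_CC) and once to cases against an expanded control group (in the spirit of T_F). The genotype counts are made up for illustration, and the function name `trend_test` is hypothetical. The MDS-corrected test T_Fmds needs a regression with covariates and is not shown.

```python
import math


def trend_test(cases, controls):
    """Cochran-Armitage trend test with additive scores 0, 1, 2.

    cases, controls: genotype counts [n0, n1, n2] by number of minor alleles.
    Returns the 1-df chi-square statistic and its p-value.
    """
    scores = (0, 1, 2)
    totals = [r + s for r, s in zip(cases, controls)]
    n, n_cases, n_controls = sum(totals), sum(cases), sum(controls)
    sx_cases = sum(x * r for x, r in zip(scores, cases))
    sx = sum(x * t for x, t in zip(scores, totals))
    sxx = sum(x * x * t for x, t in zip(scores, totals))
    stat = n * (n * sx_cases - n_cases * sx) ** 2 / (
        n_cases * n_controls * (n * sxx - sx ** 2)
    )
    p_value = math.erfc(math.sqrt(stat / 2))     # upper tail of chi-square with 1 df
    return stat, p_value


# Illustrative counts only (not data from the paper)
cases = [50, 40, 10]
controls = [60, 35, 5]
external = [190, 95, 15]                         # pooled external samples, hypothetical
expanded = [c + e for c, e in zip(controls, external)]

print(trend_test(cases, controls))               # cases vs. source-population controls
print(trend_test(cases, expanded))               # cases vs. expanded control group
```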

Fig. 3

Power of three trend tests of association at a 5% significance level for a high-risk allele frequency of 20%, as a function of the allelic odds ratio in the presence of population structure (FST = 0.01): T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS. Power is estimated over 5,000 replicates of 100 cases, 100 controls, and 100 samples from each of three external cohorts.
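To make the difference between FST = 0 and FST = 0.01 concrete, the sketch below draws cohort-specific allele frequencies from a beta distribution around an ancestral frequency, the parameterization commonly called the Balding-Nichols model. This is an assumption made for illustration: the excerpt here does not state the generating model, and the names `subpop_freq` and `simulate_genotypes` are hypothetical.

```python
import random


def subpop_freq(p, fst, rng):
    """Draw one cohort's allele frequency around the ancestral frequency p."""
    if fst == 0:
        return p                                  # no structure: all cohorts share p
    scale = (1 - fst) / fst
    return rng.betavariate(p * scale, (1 - p) * scale)   # variance is fst * p * (1 - p)


def simulate_genotypes(freq, n, rng):
    """Genotypes (0, 1 or 2 copies of the allele) for n individuals under HWE."""
    return [(rng.random() < freq) + (rng.random() < freq) for _ in range(n)]


rng = random.Random(1)
# Source population plus three external cohorts, 100 samples each
cohort_freqs = [subpop_freq(0.2, 0.01, rng) for _ in range(4)]
genotypes = [simulate_genotypes(f, 100, rng) for f in cohort_freqs]
print([round(f, 3) for f in cohort_freqs])
```

With FST = 0 every cohort has the same allele frequency, so adding external controls cannot introduce frequency differences beyond sampling noise. With FST = 0.01 the cohort frequencies scatter around 20%, which is the situation the correction for axes of genetic variation is meant to handle.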

#	Section	Preview
0	INTRODUCTION	Identifying genetic variants that influence common complex diseases can provide valuable insights…
1	INTRODUCTION	common variants with allelic odds ratios of the order of 1.5–1.7 [The Wellcome Trust Case Control…
2	INTRODUCTION	Statistically, this problem can be regarded as “weak power,” for which the primary…
3	INTRODUCTION	prevalence 0.1%, with the number of control samples ranging from 500 to 5,000 individuals at a…
4	INTRODUCTION	Our thoughts on control sample augmentation were initially motivated by the emergence of multisample…
5	INTRODUCTION	T1D and RA, making use of cases of one disease as controls for the other can reduce power to detect…
6	INTRODUCTION	The WTCCC present “expanded reference group analyses,” combining controls with all additional…
7	INTRODUCTION	The challenge we address here is that of expanding the control group to include genotyped…
8	INTRODUCTION	et al., 1978]. The EIGENSTRAT method makes use of axes of genetic variation, estimated from…
9	INTRODUCTION	Here, we make use of a related statistical technique to adjust for population structure with an…
10	INTRODUCTION	effects of underlying population structure. We present a simulation study to investigate the…
11	METHODS	Consider a population-based sample of cases and controls, which we expect to be ascertained from a…
12	METHODS	Assuming a multiplicative model of disease risk, we denote the genotype of the ith individual at the…
13	METHODS	for the ith and jth samples, where Nij is the total number of SNPs with genotype data available for…
14	METHODS	We begin by testing for association between disease and the first axis of genetic variation. In a…
15	METHODS	where γ1 denotes the effect of the first axis. We perform a likelihood ratio test of association,…
16	METHODS	Next, we test for association of each SNP, k, in turn with disease, adjusting for the effects of…
17	METHODS	where βk is the allelic log-odds ratio of the minor allele at the kth SNP, and zt is defined above.…
18	SIMULATION STUDY	We carry out simulations to assess the false-positive error rate and power of the test procedure…
19	SIMULATION STUDY	For each individual, we simulate genotype data at 10,000 uncorrelated SNPs not associated with…

Citation	PMID	DOI	Status
Balding, DJ et al., Genetica, 1995, A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity	7607457	10.1007/BF01441146	Cited
Barrett, JC et al., Nat Genet, 2006, Evaluating coverage of genome-wide association studies	16715099	10.1038/ng1801	Cited
Cavalli-Sforza, LL et al., Science, 1993, Demic expansions and human evolution	8430313	10.1126/science.8430313	Cited
Cox, TF et al., Multidimensional Scaling, 1994	—	—	—
Devlin, B et al., Biometrics, 1999, Genomic control for association studies	11315092	10.1111/j.0006-341x.1999.00997.x	Cited
Devlin, B et al., Genet Epidemiol, 2001, Unbiased methods for population-based association studies	11754464	10.1002/gepi.1034	Cited
Frayling, TM et al., Science, 2007, A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity	17434869	10.1126/science.1141634	Cited
Marchini, J et al., Nat Genet, 2007, A new multipoint method for genome-wide association studies via imputation of genotypes	17572673	10.1038/ng2088	Cited
Menozzi, P et al., Science, 1978, Synthetic maps of human gene frequencies in Europeans	356262	10.1126/science.356262	Cited
Miclaus, K et al., Genet Epidemiol, 2009, SNP selection and multidimensional scaling to quantify population structure	19194989	10.1002/gepi.20401	Cited
International HapMap Consortium, Nature, 2005, A haplotype map of the human genome	16255080	10.1038/nature04226	Cited
Wellcome Trust Case Control Consortium, Nature, 2007, Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls	17554300	10.1038/nature05911	Cited
Parkes, M et al., Nat Genet, 2007, Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility	17554261	10.1038/ng2061	Cited
Patterson, N et al., PLOS Genet, 2006, Population structure and eigenanalysis	17194218	10.1371/journal.pgen.0020190	Cited
Price, AL et al., Nat Genet, 2006, Principal components analysis corrects for stratification in genome-wide association studies	16862161	10.1038/ng1847	Cited
Pritchard, JK et al., Genetics, 2000, Inference of population structure using multilocus genotype data	10835412	10.1093/genetics/155.2.945	Cited
Samani, NJ et al., N Engl J Med, 2007, Genomewide association analysis of coronary artery disease	17634449	10.1056/NEJMoa072366	Cited
Todd, JA et al., Nat Genet, 2007, Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes	17554260	10.1038/ng2068	Cited
Weir, BS, Genetic Data Analysis II, 1996	—	—	—
Zeggini, E et al., Science, 2007, Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes	17463249	10.1126/science.1142364	Cited

Title	Year	PMID
GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing.	2022	35953715
KAT2B polymorphism identified for drug abuse in African Americans with regulatory links to drug abuse pathways in human prefrontal cortex.	2016	26202629
Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.	2013	23334152
Artifact due to differential error when cases and controls are imputed from different platforms.	2012	21735171
Using public control genotype data to increase power and decrease cost of case-control genetic association studies.	2010	20821337

Title	Authors	Journal	Year	Link
Best practices for analyzing imputed genotypes from low-pass sequencing in dogs.	Buckley RM et al.	—	2022	→
GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing.	Mathur R et al.	—	2022	→
Best practices for analyzing imputed genotypes from low-pass sequencing in dogs	Buckley RM et al.	—	2021	—
KAT2B polymorphism identified for drug abuse in African Americans with regulatory links to drug abuse pathways in human prefrontal cortex.	Johnson EO et al.	—	2016	→
Strategies to improve the performance of rare variant association studies by optimizing the selection of controls.	Zhu N et al.	—	2015	→
Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.	Johnson EO et al.	—	2013	→
Artifact due to differential error when cases and controls are imputed from different platforms.	Sinnott JA et al.	—	2012	→
Association mapping.	Painter JN et al.	—	2011	→
Confounded by sequencing depth in association studies of rare alleles.	Garner C	—	2011	→
Genome-wide association study of Stevens-Johnson Syndrome and Toxic Epidermal Necrolysis in Europe.	Génin E et al.	—	2011	→
Including additional controls from public databases improves the power of a genome-wide association study.	Mukherjee S et al.	—	2011	→
Transethnic meta-analysis of genomewide association studies.	Morris AP	—	2011	→
Clustering by genetic ancestry using genome-wide SNP data.	Solovieff N et al.	—	2010	→
Using public control genotype data to increase power and decrease cost of case-control genetic association studies.	Ho LA et al.	—	2010	→
