Optimizing the power of genome-wide association studies by using publicly available reference samples to expand the control group.
- Authors: Zhuang, Joanna J; Zondervan, Krina; Nyberg, Fredrik; Harbron, Chris; Jawaid, Ansar; Cardon, Lon R; Barratt, Bryan J; Morris, Andrew P
- Year: 2010
- Journal: Genetic Epidemiology
- PMID: 20088020
- DOI: 10.1002/gepi.20482
- PMCID: PMC2962805
Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousands of individuals. Large-scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of genetic and phenotypic information, aim to facilitate discovery of modest-effect genes by making genome-wide data publicly available, allowing information to be combined for the purpose of pooled analysis. In principle, disease or control samples from these studies could be used to increase the power of any GWA study via judicious use as "genetically matched controls" for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false-positive error rate in the presence of population structure. As a remedy, we make use of genome-wide data and model selection techniques to identify "axes" of genetic variation which are associated with disease. These axes are then included as covariates in association analysis to correct for population structure, which can result in increases in power over standard analysis of genetic information from the samples in the original GWA study.
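As a rough illustration of what an "axis" of genetic variation can look like, the sketch below derives a first multidimensional scaling (MDS) coordinate from a small genotype matrix. It is not the authors' pipeline: the allele-sharing distance, the power-iteration solver, the function names, and the toy genotypes are all assumptions made for illustration, and the model selection step that decides which axes are associated with disease is not shown.

```python
import math
import random


def ibs_distance(a, b):
    """Allele-sharing distance between two genotype vectors coded 0/1/2 (range 0..1)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def double_center(d2):
    """Classical MDS step: B = -1/2 * J * D^2 * J for a symmetric squared-distance matrix."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]  # equals the column means by symmetry
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


def matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def top_axis(b, n_iter=200, seed=0):
    """Leading MDS coordinate via power iteration (assumes a dominant positive eigenvalue)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in b]
    for _ in range(n_iter):
        w = matvec(b, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


def first_mds_axis(genotypes):
    """First MDS coordinate for each individual (rows of `genotypes`)."""
    n = len(genotypes)
    d2 = [
        [ibs_distance(genotypes[i], genotypes[j]) ** 2 for j in range(n)]
        for i in range(n)
    ]
    return top_axis(double_center(d2))


# Hypothetical toy data: three individuals from each of two diverged groups.
genotypes = [
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 2, 1],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 1, 0],
]
axis = first_mds_axis(genotypes)
print([round(x, 3) for x in axis])
```

In this toy example the first coordinate takes opposite signs in the two groups. A coordinate of this kind is what would be entered as a covariate in the association model to absorb ancestry differences between cases and the expanded control group.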
Power of a GWA study of 500 cases to detect association of a causal variant with allele frequency 20% for a range of heterozygous genotype relative risks under a multiplicative model with disease prevalence of 0.1%. Results are presented for a trend test of association for a significance level of 5%, with the number of control samples ranging from 500 to 5,000 individuals.
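The setting in this caption can be approximated analytically. The following sketch assumes Hardy-Weinberg proportions in the general population, a two-sided test, and a normal approximation to the trend statistic. None of these details are stated in the caption, so treat the function names and the approximation as illustrative rather than as the calculation used in the paper.

```python
from statistics import NormalDist


def genotype_freqs(p, grr, prevalence):
    """Case and control genotype frequencies (0, 1, 2 risk alleles), multiplicative model."""
    pop = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # Hardy-Weinberg in the population
    rel_risk = [1.0, grr, grr ** 2]                # multiplicative genotype relative risks
    baseline = prevalence / sum(g * r for g, r in zip(pop, rel_risk))
    penetrance = [baseline * r for r in rel_risk]
    cases = [g * f / prevalence for g, f in zip(pop, penetrance)]
    controls = [g * (1 - f) / (1 - prevalence) for g, f in zip(pop, penetrance)]
    return cases, controls


def mean_var(freqs):
    """Mean and variance of the risk-allele count under a genotype distribution."""
    mean = sum(i * f for i, f in enumerate(freqs))
    var = sum(i * i * f for i, f in enumerate(freqs)) - mean ** 2
    return mean, var


def trend_test_power(n_cases, n_controls, p=0.2, grr=1.2, prevalence=0.001, alpha=0.05):
    """Approximate power of a two-sided additive trend test (normal approximation)."""
    cases, controls = genotype_freqs(p, grr, prevalence)
    m1, v1 = mean_var(cases)
    m0, v0 = mean_var(controls)
    n = n_cases + n_controls
    pooled = [(n_cases * a + n_controls * b) / n for a, b in zip(cases, controls)]
    _, vp = mean_var(pooled)
    se_null = (vp * (1 / n_cases + 1 / n_controls)) ** 0.5  # scale used by the test
    se_alt = (v1 / n_cases + v0 / n_controls) ** 0.5        # true spread of the difference
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    delta = m1 - m0
    return z.cdf((delta - crit * se_null) / se_alt) + z.cdf((-delta - crit * se_null) / se_alt)


if __name__ == "__main__":
    for n_controls in (500, 1000, 2000, 5000):
        print(n_controls, round(trend_test_power(500, n_controls, grr=1.3), 3))
```

With the number of cases fixed at 500, the standard error of the case-control difference shrinks as controls are added but is bounded below by the case term, so the gain in power flattens out as the control group grows.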
Power of three trend tests of association at a 5% significance level for a high-risk allele frequency of 20% as a function of the allelic odds ratio in the absence of population structure (FST = 0): T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS. Power is estimated over 5,000 replicates of 100 cases, 100 controls, and 100 samples from each of three external cohorts.
Power of three trend tests of association at a 5% significance level for a high-risk allele frequency of 20%, as a function of the allelic odds ratio in the presence of population structure (FST = 0.01): T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS. Power is estimated over 5,000 replicates of 100 cases, 100 controls, and 100 samples from each of three external cohorts.
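The inflation that motivates the MDS correction can be reproduced in a small null simulation. The sketch below compares the uncorrected tests T_CC and T_F when there is no true association. It assumes that cohort allele frequencies are drawn from a Balding-Nichols beta distribution around an ancestral frequency of 20%, which is a common way to simulate a given FST but is not confirmed by the captions. The function names, the seed, and the replicate count are hypothetical, and the corrected test T_Fmds is not implemented.

```python
import random

CHI2_1DF_5PCT = 3.841  # upper 5% point of the chi-square distribution with 1 df


def subpop_freq(rng, p, fst):
    """Allele frequency of one cohort under the Balding-Nichols model (assumed here)."""
    if fst == 0:
        return p
    scale = (1 - fst) / fst
    return rng.betavariate(p * scale, (1 - p) * scale)


def sample_genotypes(rng, freq, n):
    """Draw n genotypes (0/1/2 risk alleles) as two independent allele draws each."""
    return [(rng.random() < freq) + (rng.random() < freq) for _ in range(n)]


def trend_chi2(cases, controls):
    """Additive trend statistic: squared mean difference over its null variance."""
    n1, n0 = len(cases), len(controls)
    pooled = cases + controls
    mean = sum(pooled) / len(pooled)
    var = sum((g - mean) ** 2 for g in pooled) / len(pooled)
    if var == 0:
        return 0.0  # monomorphic sample carries no information
    diff = sum(cases) / n1 - sum(controls) / n0
    return diff ** 2 / (var * (1 / n1 + 1 / n0))


def type1_error(fst, n_reps=1000, p=0.2, n=100, n_external=3, seed=1):
    """Null rejection rates of T_CC and T_F at the 5% level (no causal effect simulated)."""
    rng = random.Random(seed)
    reject_cc = reject_f = 0
    for _ in range(n_reps):
        source = subpop_freq(rng, p, fst)
        cases = sample_genotypes(rng, source, n)
        controls = sample_genotypes(rng, source, n)
        external = []
        for _ in range(n_external):
            external += sample_genotypes(rng, subpop_freq(rng, p, fst), n)
        reject_cc += trend_chi2(cases, controls) > CHI2_1DF_5PCT
        reject_f += trend_chi2(cases, controls + external) > CHI2_1DF_5PCT
    return reject_cc / n_reps, reject_f / n_reps


if __name__ == "__main__":
    for fst in (0.0, 0.01):
        cc, f = type1_error(fst)
        print(f"FST={fst}: T_CC={cc:.3f}  T_F={f:.3f}")
```

Under these assumptions T_CC stays near the nominal 5% level for both values of FST, because cases and controls share a source population. T_F rejects well above 5% once FST = 0.01, because frequency differences between cohorts are mistaken for association.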
| Title | Authors | Year |
|---|---|---|
| Best practices for analyzing imputed genotypes from low-pass sequencing in dogs. | Buckley RM et al. | 2022 |
| GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing. | Mathur R et al. | 2022 |
| Best practices for analyzing imputed genotypes from low-pass sequencing in dogs | Buckley RM et al. | 2021 |
| KAT2B polymorphism identified for drug abuse in African Americans with regulatory links to drug abuse pathways in human prefrontal cortex. | Johnson EO et al. | 2016 |
| Strategies to improve the performance of rare variant association studies by optimizing the selection of controls. | Zhu N et al. | 2015 |
| Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. | Johnson EO et al. | 2013 |
| Artifact due to differential error when cases and controls are imputed from different platforms. | Sinnott JA et al. | 2012 |
| Association mapping. | Painter JN et al. | 2011 |
| Confounded by sequencing depth in association studies of rare alleles. | Garner C | 2011 |
| Genome-wide association study of Stevens-Johnson Syndrome and Toxic Epidermal Necrolysis in Europe. | Génin E et al. | 2011 |
| Including additional controls from public databases improves the power of a genome-wide association study. | Mukherjee S et al. | 2011 |
| Transethnic meta-analysis of genomewide association studies. | Morris AP | 2011 |
| Clustering by genetic ancestry using genome-wide SNP data. | Solovieff N et al. | 2010 |
| Using public control genotype data to increase power and decrease cost of case-control genetic association studies. | Ho LA et al. | 2010 |