Effective sample size: Quick estimation of the effect of related samples in genetic case-control association analyses.
- Authors
- Yang, Yaning; Remmers, Elaine F; Ogunwole, Chukwuma B; Kastner, Daniel L; Gregersen, Peter K; Li, Wentian
- Year
- 2011
- Journal
- Computational biology and chemistry
- PMID
- 21333602
- DOI
- 10.1016/j.compbiolchem.2010.12.006
- PMCID
- PMC3119257
Affected relatives are essential for pedigree linkage analysis; however, they violate the independent-sample assumption in case-control association studies. To avoid correlation between samples, a common practice is to take only one affected sample per pedigree in association analysis. Although several methods exist for handling correlated samples, they are still not widely used, in part because they are not easily implemented or not widely known. We advocate the effective sample size method as a simple and accessible approach for case-control association analysis with correlated samples. This method modifies the chi-square test statistic, p-value, and 95% confidence interval of the odds ratio by replacing the apparent allele or genotype counts with the effective ones in the standard formulas, without the need for specialized computer programs. We present a simple formula for calculating the effective sample size for many types of relative pairs and relative sets. For allele frequency estimation, the effective sample size method captures the variance inflation exactly. For genotype frequency estimation, simulations show that the effective sample size provides a satisfactory approximation. The interferon-induced helicase gene (IFIH1), previously identified as a type 1 diabetes susceptibility locus, is shown to be significantly associated with rheumatoid arthritis when the effective sample size method is applied. This significant association is not established if only one affected sib per pedigree is used in the association analysis. The relationships between the effective sample size method and other methods (generalized estimating equations, the variance of eigenvalues of correlation matrices, and genomic control) are discussed.
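The count-replacement idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that only the cases are related, that a single reduction factor `alpha` (effective size divided by apparent size) applies to both case allele counts, and that the standard Pearson chi-square and Woolf-type confidence interval are the "standard formulas" meant. The function names and the example counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def log_or_se(a, b, c, d):
    """Standard error of the log odds ratio for a 2x2 table."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def ess_corrected_test(case_a, case_b, con_a, con_b, alpha):
    """Allelic test with case counts shrunk by alpha (assumption: controls independent)."""
    eff_a, eff_b = alpha * case_a, alpha * case_b   # effective case allele counts
    x2 = chi_square_2x2(case_a, case_b, con_a, con_b)
    x2_e = chi_square_2x2(eff_a, eff_b, con_a, con_b)
    p_e = math.erfc(math.sqrt(x2_e / 2))            # upper tail of chi-square, 1 df
    odds_ratio = (case_a * con_b) / (con_a * case_b)  # alpha cancels in the estimate
    se = log_or_se(case_a, case_b, con_a, con_b)
    se_e = log_or_se(eff_a, eff_b, con_a, con_b)
    ci_e = (odds_ratio * math.exp(-1.96 * se_e),
            odds_ratio * math.exp(1.96 * se_e))
    return {"x2": x2, "x2_e": x2_e, "p_e": p_e,
            "odds_ratio": odds_ratio, "se": se, "se_e": se_e, "ci_e": ci_e}

if __name__ == "__main__":
    # Hypothetical allele counts; alpha = 2/3 is the sibpair allele-count value.
    print(ess_corrected_test(300, 700, 250, 750, alpha=2 / 3))
```

Under these assumptions the odds-ratio point estimate is unchanged by the correction, while the test statistic shrinks and the confidence interval widens.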
Illustration of three situations concerning sample correlations: (A) samples are independent; (B) all samples are correlated with each other and form one cluster; (C) samples within a cluster are correlated, whereas there is no correlation between clusters. Situation (C) is called “cluster-correlated data” in Williams (2000).
(Upper row) Expected variance of genotypes AA, AB, BB and of allele A (multiplied by the sample size) as a function of the allele frequency p(A). The solid line shows the result for independent samples, and the dashed line for sibpairs. (Lower row) Effective genotype count reductions α_1, α_2, α_3 for sibpair data as a function of p(A) (Eq. (12)). For the allele count, the sample size reduction is a constant 2/3. The grey line is α_a(p), the weighted average of α_1, α_2, α_3. The α = 0.7096 line is the average of α_a(p) over p.
Empirical power curves for the genotype-based test under three different models (recessive, additive, dominant) at nominal significance levels of 0.01 (upper row) and 0.05 (lower row). The x-axis is the log-odds-ratio parameter b in the disease model, Eq. (14). Two power curves are shown: the test using the effective-sample-size-corrected X_e^2 (solid line) and the score test (dashed line).
| # | Section | Preview |
|---|---|---|
| 20 | Mathematical Details — Correcting X^2 test statistic and 95% confidence interval of odds-ratio by the effective sample size | The modified test statistic X_e^2 can then be used to determine the p-value. |
| 21 | Mathematical Details — Correcting X^2 test statistic and 95% confidence interval of odds-ratio by the effective sample size | For the OR θ̂ = N_{A,case} N_{B,con} / (N_{A,con} N_{B,case}), the uncorrected 95% confidence interval (CI) is… |
| 22 | Mathematical Details — Correcting X^2 test statistic and 95% confidence interval of odds-ratio by the effective sample size | It can be shown that α < X_e^2/X^2 < 1 and σ̂_e/σ̂ > 1 when α < 1. In other words, when the effective… |
| 23 | Results — Diminishing return in adding more relatives from the same pedigree in an association study | The kinship coefficients and sample size reduction with respect to allele frequency estimation of… |
| 24 | Results — Diminishing return in adding more relatives from the same pedigree in an association study | These results show that while one should include as many samples as possible, whether correlated or… |
| 25 | Results — Diminishing return in adding more relatives from the same pedigree in an association study | When a mixture of relatives from the same pedigree is included, one can use the averaged correlation… |
| 26 | Results — Improving p-value by using all samples | For the PTPN22 data in Table 4, if one affected sib per sibpair is selected for association as in… |
| 27 | Results — Improving p-value by using all samples | The ratio of two chi-squares, one for all samples with ESS correction and another without, is… |
| 28 | Results — Improving p-value by using all samples | For the IFIH1 gene in Table 5, we applied the effective sample size method both globally or… |
| 29 | Results — Improving p-value by using all samples | The SNP minor allele frequency (MAF) for the control population in the IFIH1 gene was reported to be… |
| 30 | Results — Improving p-value by using all samples | One can also apply the ESS method to each pedigree-type specifically. We count the T and C alleles… |
| 31 | Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation | With the correlation coefficient for genotype indicator variable in Eq.(8,9), we can derive the… |
| 32 | Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation | We illustrate these properties by the example of sibpairs. Using Eq.(8,9,3), the genotype-specific… |
| 33 | Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation | Figure 2 shows the α_{G,sibpair} values of the three genotypes as a function of p; also shown are the… |
| 34 | Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation | The genotype-specific sample size reductions in Eq.(12) can be applied in the following way: (1) the… |
| 35 | Results — Effective sample size method performs well in simulation and in comparing the score test | Using the simulated data described in the Methods and Material section, we have checked the validity… |
| 36 | Results — Effective sample size method performs well in simulation and in comparing the score test | The locally most powerful test among all tests with the correct type I errors is the score test (Cox… |
| 37 | Discussion — Cheverud’s formula for the number of independent variables | Based on the idea that the overall amount of correlation among several variables can be measured by… |
| 38 | Discussion — Cheverud’s formula for the number of independent variables | We consider the large sibship situation where the correlation matrix is characterized by Eq.(4). It… |
| 39 | Discussion — Cheverud’s formula for the number of independent variables | We believe that our effective sample size formula makes better sense: in the three-sib sibship case,… |
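The diminishing return described in the section previews above can be illustrated numerically. The paper's own formula is not reproduced in this record, so the sketch below assumes the standard design-effect form for n equally correlated samples, n_e = n / (1 + (n - 1)ρ), with the allele-count correlation between two relatives taken as ρ = 2φ (φ being the kinship coefficient). This assumption reproduces the 2/3 allele-count reduction quoted for sibpairs. The names `effective_size` and `SIB_KINSHIP` are hypothetical.

```python
def effective_size(n, kinship):
    """Effective number of independent samples among n equally related relatives.

    Assumes every pair shares the same kinship coefficient, so the pairwise
    allele-count correlation is rho = 2 * kinship.
    """
    rho = 2.0 * kinship
    return n / (1.0 + (n - 1) * rho)

SIB_KINSHIP = 0.25  # kinship coefficient of full sibs

if __name__ == "__main__":
    for n in range(1, 7):
        n_e = effective_size(n, SIB_KINSHIP)
        # columns: sibship size, effective size, reduction factor n_e / n
        print(n, round(n_e, 3), round(n_e / n, 3))
```

Under this form the effective size of a sibship never exceeds 1/ρ = 2, however many sibs are added, which is one way to read the "diminishing return" heading.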
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Genetic and neurophysiological correlates of the age of onset of alcohol use disorders in adolescents and young adults. | 2013 | 23963516 |
External
| Title | Authors | Year |
|---|---|---|
| Detection of *MC1R* Genetic Variants and Their Association with Coat Color in Asian Goats. | Kawaguchi F et al. | 2025 |
| GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. | He Z et al. | 2022 |
| Beyond standard pipeline and p < 0.05 in pathway enrichment analyses. | Li W et al. | 2021 |
| Analysis of Association Between Dietary Intake and Red Blood Cell Count Results in Remission Ulcerative Colitis Individuals. | Głąbska D et al. | 2019 |
| Association study in African-admixed populations across the Americas recapitulates asthma risk loci in non-African populations. | Daya M et al. | 2019 |
| Learning Bayesian Networks from Correlated Data. | Bae H et al. | 2016 |
| On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry. | Kolaczyk ED et al. | 2015 |
| Role of Established Type 2 Diabetes-Susceptibility Genetic Variants in a High Prevalence American Indian Population. | Hanson RL et al. | 2015 |
| Characteristics of canonical intrinsic connectivity networks across tasks and monozygotic twin pairs. | Moodie CA et al. | 2014 |
| Genetic and neurophysiological correlates of the age of onset of alcohol use disorders in adolescents and young adults. | Chorlian DB et al. | 2013 |
| Transferability and fine mapping of type 2 diabetes loci in African Americans: the Candidate Gene Association Resource Plus Study. | Ng MC et al. | 2013 |
| Analysis of family- and population-based samples in cohort genome-wide association studies. | Manichaikul A et al. | 2012 |
| The TNF-α -308 Promoter Gene Polymorphism and Chronic HBV Infection. | Tayebi S et al. | 2012 |
| A comparison of founder-only and all-pedigree-members genotype-expression association by regression analysis. | Suh YJ et al. | 2007 |