Effective sample size: Quick estimation of the effect of related samples in genetic case-control association analyses.

Authors: Yang, Yaning; Remmers, Elaine F; Ogunwole, Chukwuma B; Kastner, Daniel L; Gregersen, Peter K; Li, Wentian
Year: 2011
Journal: Computational biology and chemistry
PMID: 21333602
DOI: 10.1016/j.compbiolchem.2010.12.006
PMCID: PMC3119257

#	Section	Preview
20	Mathematical Details — Correcting X² test statistic and 95% confidence interval of odds-ratio by the effective sample size	The modified test statistic X_e² can then be used to determine the p-value.
21	Mathematical Details — Correcting X² test statistic and 95% confidence interval of odds-ratio by the effective sample size	For OR θ̂ = N_{A,case} N_{B,con} / (N_{A,con} N_{B,case}), the uncorrected 95% confidence interval (CI) is…
22	Mathematical Details — Correcting X² test statistic and 95% confidence interval of odds-ratio by the effective sample size	It can be shown that α < X_e²/X² < 1 and σ̂_e/σ̂ > 1 when α < 1. In other words, when the effective…
23	Results — Diminishing return in adding more relatives from the same pedigree in an association study	The kinship coefficients and sample size reduction with respect to allele frequency estimation of…
24	Results — Diminishing return in adding more relatives from the same pedigree in an association study	These results show that while one should include as many samples as possible, whether correlated or…
25	Results — Diminishing return in adding more relatives from the same pedigree in an association study	When a mixture of relatives from the same pedigree is included, one can use the averaged correlation…
26	Results — Improving p-value by using all samples	For the PTPN22 data in Table 4, if one affected sib per sibpair is selected for association as in…
27	Results — Improving p-value by using all samples	The ratio of two chi-squares, one for all samples with ESS correction and another without, is…
28	Results — Improving p-value by using all samples	For the IFIH1 gene in Table 5, we applied the effective sample size method both globally or…
29	Results — Improving p-value by using all samples	The SNP minor allele frequency (MAF) for the control population in the IFIH1 gene was reported to be…
30	Results — Improving p-value by using all samples	One can also apply the ESS method to each pedigree-type specifically. We count the T and C alleles…
31	Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation	With the correlation coefficient for genotype indicator variable in Eq.(8,9), we can derive the…
32	Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation	We illustrate these properties by the example of sibpairs. Using Eq.(8,9,3), the genotype-specific…
33	Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation	Figure 2 shows the α_{G,sibpair} values of the three genotypes as a function of p; also shown are the…
34	Results — A single effective sample size does not capture all variance inflations in genotype frequency estimations, but it provides a good approximation	The genotype-specific sample size reductions in Eq.(12) can be applied in the following way: (1) the…
35	Results — Effective sample size method performs well in simulation and in comparing the score test	Using the simulated data described in the Methods and Material section, we have checked the validity…
36	Results — Effective sample size method performs well in simulation and in comparing the score test	The locally most powerful test among all tests with the correct type I errors is the score test (Cox…
37	Discussion — Cheverud’s formula for the number of independent variables	Based on the idea that the overall amount of correlation among several variables can be measured by…
38	Discussion — Cheverud’s formula for the number of independent variables	We consider the large sibship situation where the correlation matrix is characterized by Eq.(4). It…
39	Discussion — Cheverud’s formula for the number of independent variables	We believe that our effective sample size formula makes better sense: in the three-sib sibship case,…

Citation	PMID	DOI	Status
Allen-Brady, K et al., BMC Bioinformatics, 2006, PedGenie: an analysis approach for genetic association testing in extended pedigrees and genealogies of arbitrary size	16620382	10.1186/1471-2105-7-209	Cited
Astle, W et al., Statistical Science, 2009, Population structure and cryptic relatedness in genetic association studies	—	—	—
Bacanu, SA et al., American Journal of Human Genetics, 2000, The power of genomic control	10801388	10.1086/302929	Cited
Balding, DJ, Nature Reviews Genetics, 2006, A tutorial on statistical methods for population association studies	16983374	10.1038/nrg1916	Cited
Begovich, AB et al., American Journal of Human Genetics, 2004, A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis	15208781	10.1086/422827	Cited
Biedermann, S et al., Scandinavian Journal of Statistics, 2006, Tests in a case-control design including relatives	—	—	—
Boehnke, M, American Journal of Human Genetics, 1991, Allele frequency estimation from data on relatives	1985459	—	Cited
Bourgain, C et al., American Journal of Human Genetics, 2003, Novel case-control test in founder population identifies P-selectin as an atopy-susceptibility locus	12929084	10.1086/378208	Cited
Bourgain, C, BMC Genetics, 2005, Comparing strategies for association mapping in samples with related individuals	16451714	10.1186/1471-2156-6-S1-S98	Cited
Broman, KW, Genetic Epidemiology, 2001, Estimation of allele frequencies with data on sibships	11255240	10.1002/gepi.2	Cited
Browning, SR et al., Genetic Epidemiology, 2005, Case-control single-marker and haplotype association analysis of pedigree data	15578751	10.1002/gepi.20051	Cited
Cavalli-Sforza, LL et al., The Genetics of Human Populations, 1971	—	—	—
Cheverud, JM, Heredity, 2001, A simple correction for multiple comparisons in interval mapping genome scans	11678987	10.1046/j.1365-2540.2001.00901.x	Cited
Choi, Y et al., Genetic Epidemiology, 2009, Case-control association testing in the presence of unknown relationships	19333967	10.1002/gepi.20418	Cited
Coram, M et al., Annals of Applied Statistics, 2007, Improving population-specific allele frequency estimates by adapting supplemental data: An empirical Bayes approach	21451739	10.1214/07-aoas121	Cited
Cox, DR et al., Theoretical Statistics, 1974	—	—	—
Dai, F et al., American Journal of Human Genetics, 2006, Ordered genotypes: an extended ITO method and a general formula for genetic covariance	16685653	10.1086/504045	Cited
Devlin, B et al., Biometrics, 1999, Genomic control for association studies	11315092	10.1111/j.0006-341x.1999.00997.x	Cited
Devlin, B et al., Nature Genetics, 2004, Genomic control to the extreme (correspondence)	15514657	10.1038/ng1104-1129	Cited
Devlin, B et al., Theoretical Population Biology, 2001, Genomic control, a new approach to genetic-based association studies	11855950	10.1006/tpbi.2001.1542	Cited
Epstein, MP et al., American Journal of Human Genetics, 2005, Genetic association analysis using data from triads and unrelated subjects	15712104	10.1086/429225	Cited
Gorroochurn, P et al., Genetic Epidemiology, 2006, Centralizing the non-central chi-square: a new method to correct for population stratification in genetic case-control association studies	16502404	10.1002/gepi.20143	Cited
Gray-McGuire, C et al., Human Genomics, 2009, Genetic association tests: A method for the joint analysis of family and case-control data	19951892	10.1186/1479-7364-4-1-2	Cited
Göring, HHH et al., American Journal of Human Genetics, 2000, Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified	10731466	10.1086/302845	Cited
Hanley, JA et al., American Journal of Epidemiology, 2003, Statistical analysis of correlated data using generalized estimating equations: an orientation	12578807	10.1093/aje/kwf215	Cited
Jawaheer, D et al., American Journal of Human Genetics, 2001, A genomewide screen in multiplex rheumatoid arthritis families suggests genetic overlap with other autoimmune diseases	11254450	10.1086/319518	Cited
Jawaheer, D et al., Arthritis & Rheumatism, 2003, Screening the genome for rheumatoid arthritis susceptibility genes: a replication study and combined analysis of 512 multicase families	12687532	10.1002/art.10989	Cited
Kish, L, Survey Sampling, 1965	—	—	—
Klei, L et al., Human Genetics, 2007, Testing for association based on excess allele sharing in a sample of related cases and controls	17342507	10.1007/s00439-007-0345-z	Cited
Knight, S et al., BMC Proceedings, 2009, Pedigree association: assigning individual weights to pedigree members for genetic association analysis	20017987	10.1186/1753-6561-3-s7-s121	Cited
Köhler, K et al., Annals of Human Genetics, 2006, Case-control association tests correcting for population stratification	16441260	10.1111/j.1529-8817.2005.00214.x	Cited
Köhler, K et al., BMC Proceedings, 2007, Case-control studies with affected sibships	18466526	10.1186/1753-6561-1-s1-s29	Cited
Lange, K, Mathematical and Statistical Methods for Genetic Analysis, 1997	—	—	—
Lee, AT et al., Genes and Immunity, 2005, The PTPN22 R620W polymorphism associates with RF positive rheumatoid arthritis in a dose-dependent manner but not with HLA-SE status	15674368	10.1038/sj.gene.6364159	Cited
Lewis, CM, Briefings in Bioinformatics, 2002, Genetic association studies: design, analysis and interpretation	12139434	10.1093/bib/3.2.146	Cited
Li, CC et al., Biometrics, 1954, The derivation of joint distribution and correlation between relatives by the use of stochastic matrices	—	—	—
Li, M et al., American Journal of Human Genetics, 2005, Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal	15877278	10.1086/430277	Cited
Li, W et al., Human Heredity, 2000, A complete enumeration and classification of two-locus disease models	10899752	10.1159/000022939	Cited
Li, W, American Journal of Human Genetics, 1998, A revised Li-Sacks formula for calculating the probability of identity-by-descent proportion	—	—	—
Li, W, Briefings in Bioinformatics, 2008, Three lectures on case-control genetic association analysis	18083722	10.1093/bib/bbm058	Cited
Li, Z et al., Human Heredity, 2000, Statistical properties of Teng and Risch’s sib-ship type tests for detecting an association between disease and a candidate allele	12145548	10.1159/000064974	Cited
Liang, KY et al., Biometrika, 1986, Longitudinal data analysis using generalized linear models	—	—	—
Madden, LV et al., Phytopathology, 1999, An effective sample size for predicting plant disease incidence in a spatial hierarchy	18944705	10.1094/PHYTO.1999.89.9.770	Cited
Malécot, G, Les Mathématiques de l’Hérédité, 1948	—	—	—
Marchini, J et al., Nature Genetics, 2004, The effects of human population structure on large genetic association studies	15052271	10.1038/ng1337	Cited
Maruyama, T et al., Biometrics, 1970, Use of graph theory in computation of inbreeding and kinship coefficients	5475433	—	Cited
Moore, RM et al., BMC Genetics, 2005, Selecting cases from nuclear families for case-control association analysis	16451561	10.1186/1471-2156-6-S1-S105	Cited
Nagelkerke, NJD et al., European Journal of Human Genetics, 2004, Combining the transmission disequilibrium test and case-control methodology using generalized logistic regression	15340361	10.1038/sj.ejhg.5201255	Cited
Nyholt, DR, American Journal of Human Genetics, 2004, A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other	14997420	10.1086/383251	Cited
Patterson, N et al., PLoS Genetics, 2006, Population structure and eigenanalysis	17194218	10.1371/journal.pgen.0020190	Cited
Price, AL et al., Nature Genetics, 2006, Principal components analysis corrects for stratification in genome-wide association studies	16862161	10.1038/ng1847	Cited
Rakovski, CS et al., PLoS ONE, 2009, A kinship-based modification of the Armitage trend test to address hidden population structure and small differential genotyping errors	19503792	10.1371/journal.pone.0005825	Cited
Rao, JNK et al., Biometrics, 1992, A simple method for the analysis of clustered binary data	1637980	—	Cited
Risch, N et al., Genome Research, 1998, The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling	9872982	10.1101/gr.8.12.1273	Cited
Risch, N et al., Science, 1996, The future of genetic studies of complex human diseases	8801636	10.1126/science.273.5281.1516	Cited
Rosner, B et al., Biometrics, 1988, Significance testing for correlated binary outcome data	3390508	—	Cited
Salyakina, D et al., Human Heredity, 2005, Evaluation of Nyholt’s procedure for multiple testing correction	16118503	10.1159/000087540	Cited
Sasieni, PD, Biometrics, 1997, From genotypes to genes: doubling the sample size	9423247	—	Cited
Sillanpää, MJ, Heredity, 2011, Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses	20628415	10.1038/hdy.2010.91	Cited
Silverberg, MS et al., Inflammatory Bowel Diseases, 2003, A population- and family-based study of Canadian families reveals association of HLA DRB1*0103 with colonic involvement in inflammatory bowel disease	12656131	10.1097/00054725-200301000-00001	Cited
Slager, SL et al., American Journal of Human Genetics, 2001, Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects	11353403	10.1086/320608	Cited
Smyth, DJ et al., Nature Genetics, 2006, A genome-wide association study of nonsynonymous SNPs identifies a type 1 diabetes locus in the interferon-induced helicase (IFIH1) region	16699517	10.1038/ng1800	Cited
Teng, J et al., Genome Research, 1999, The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases II. Individual genotyping	10077529	—	Cited
Thiébaux, HJ et al., Journal of Applied Meteorology, 1984, The interpretation and estimation of effective sample size	—	—	—
Thomas, A et al., Bioinformatics, 2006, Maximum likelihood estimates of allele frequencies and error rates from samples of related individuals by gene counting	16410318	10.1093/bioinformatics/btk049	Cited
Thornton, T et al., American Journal of Human Genetics, 2010, ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure	20137780	10.1016/j.ajhg.2010.01.001	Primary
Trégouët, DA et al., American Journal of Human Genetics, 1997, Testing association between candidate-gene markers and phenotype in related individuals, by use of estimating equations	9246000	10.1086/513895	Cited
Visscher, PM et al., European Journal of Human Genetics, 2008, Genome-wide association studies of quantitative traits with related individuals: little (power) lost but much to be gained	18183040	10.1038/sj.ejhg.5201990	Cited
Voight, BF et al., PLoS Genetics, 2005, Confounding from cryptic relatedness in case-control association studies	16151517	10.1371/journal.pgen.0010032	Cited
Weir, BS et al., Nature Reviews Genetics, 2006, Genetic relatedness analysis: modern data and new challenges	16983373	10.1038/nrg1960	Cited
Weir, BS, Genetic Analysis II, 1996	—	—	—
Williams, RL, Biometrics, 2000, A note on robust variance estimation for cluster-correlated data	10877330	10.1111/j.0006-341x.2000.00645.x	Cited
Woolf, B, Annals of Human Genetics, 1955, On estimating the relationship between blood group and disease	14388528	10.1111/j.1469-1809.1955.tb01348.x	Cited
Wright, S, Science, 1938, Size of population and breeding structure in relation to evolution	—	—	—
Yoo, YJ et al., BMC Proceedings, 2007, Case-control association analysis of rheumatoid arthritis with candidate genes using related cases	18466531	10.1186/1753-6561-1-s1-s33	Cited
Yu, J et al., Nature Genetics, 2006, A unified mixed-model method for association mapping that accounts for multiple levels of relatedness	16380716	10.1038/ng1702	Cited

In this knowledge base

Title	Year	PMID
Genetic and neurophysiological correlates of the age of onset of alcohol use disorders in adolescents and young adults.	2013	23963516

External

Title	Authors	Journal	Year
Detection of MC1R Genetic Variants and Their Association with Coat Color in Asian Goats.	Kawaguchi F et al.	—	2025
GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies.	He Z et al.	—	2022
Beyond standard pipeline and p < 0.05 in pathway enrichment analyses.	Li W et al.	—	2021
Analysis of Association Between Dietary Intake and Red Blood Cell Count Results in Remission Ulcerative Colitis Individuals.	Głąbska D et al.	—	2019
Association study in African-admixed populations across the Americas recapitulates asthma risk loci in non-African populations.	Daya M et al.	—	2019
Learning Bayesian Networks from Correlated Data.	Bae H et al.	—	2016
On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry.	Kolaczyk ED et al.	—	2015
Role of Established Type 2 Diabetes-Susceptibility Genetic Variants in a High Prevalence American Indian Population.	Hanson RL et al.	—	2015
Characteristics of canonical intrinsic connectivity networks across tasks and monozygotic twin pairs.	Moodie CA et al.	—	2014
Genetic and neurophysiological correlates of the age of onset of alcohol use disorders in adolescents and young adults.	Chorlian DB et al.	—	2013
Transferability and fine mapping of type 2 diabetes loci in African Americans: the Candidate Gene Association Resource Plus Study.	Ng MC et al.	—	2013
Analysis of family- and population-based samples in cohort genome-wide association studies.	Manichaikul A et al.	—	2012
The TNF-α -308 Promoter Gene Polymorphism and Chronic HBV Infection.	Tayebi S et al.	—	2012
A comparison of founder-only and all-pedigree-members genotype-expression association by regression analysis.	Suh YJ et al.	—	2007