Using ancestry matching to combine family-based and unrelated samples for genome-wide association studies.
- Authors: Crossett, Andrew; Kent, Brian P; Klei, Lambertus; Ringquist, Steven; Trucco, Massimo; Roeder, Kathryn; Devlin, Bernie
- Year: 2010
- Journal: Statistics in Medicine
- PMID: 20862653
- DOI: 10.1002/sim.4057
- PMCID: PMC4629477
We propose a method to analyze family-based samples together with unrelated cases and controls. The method builds on the idea of matched case-control analysis using conditional logistic regression (CLR). For each trio within a family, a case (the proband) and matched pseudo-controls are constructed from the transmitted and untransmitted alleles. Unrelated controls, matched by genetic ancestry, supplement the sample of pseudo-controls; likewise, unrelated cases are paired with genetically matched controls. Within each matched stratum, the case genotype is contrasted with the control and pseudo-control genotypes via CLR, using a method we call matched-CLR (mCLR). Eigenanalysis of numerous SNP genotypes provides a tool for mapping genetic ancestry. The result of such an analysis can be thought of as a multidimensional map, or eigenmap, which encodes the relative genetic similarities and differences among individuals. Once the map is constructed, new individuals can be projected onto it based on their genotypes. Successful differentiation of individuals of distinct ancestry depends on having a diverse, yet representative, sample from which to construct the ancestry map. When samples are well matched, mCLR yields power comparable to that of competing methods while ensuring excellent control over Type I error.
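The pseudo-control construction and the stratum-wise conditional likelihood described above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: a biallelic SNP with additive (minor-allele count) coding, the common convention of treating the three non-transmitted Mendelian combinations as pseudo-controls, and a single log-odds parameter `beta`. The function names `pseudo_controls` and `conditional_loglik` are hypothetical and are not taken from the paper or its software.

```python
from itertools import product
import math

def pseudo_controls(father, mother, child):
    """Return pseudo-control genotypes (minor-allele counts) for one trio.

    father, mother: tuples of two alleles, each coded 0 (major) or 1 (minor).
    child: the proband's minor-allele count (0, 1, or 2).
    Assumption: the three Mendelian combinations other than the one
    transmitted to the proband are used as pseudo-controls.
    """
    # Four equally likely transmissions: one allele from each parent.
    possible = [f + m for f, m in product(father, mother)]
    if child not in possible:
        raise ValueError("child genotype is inconsistent with the parents")
    # Drop one copy of the proband's genotype; the rest are pseudo-controls.
    possible.remove(child)
    return possible

def conditional_loglik(beta, strata):
    """Conditional logistic log-likelihood with one case per stratum.

    strata: iterable of (case_genotype, [control_genotypes]) pairs, where
    the control list may mix pseudo-controls and ancestry-matched controls.
    """
    total = 0.0
    for case, controls in strata:
        denom = math.exp(beta * case) + sum(math.exp(beta * g) for g in controls)
        total += beta * case - math.log(denom)
    return total

# One trio with two heterozygous parents and a proband carrying two minor alleles.
stratum = (2, pseudo_controls((0, 1), (0, 1), 2))
null_loglik = conditional_loglik(0.0, [stratum])
```

Under this setup, ancestry-matched unrelated controls would simply be appended to a stratum's control list before the likelihood is maximized over `beta`.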
HapMap trios matched by ancestry to POPRES controls. The 30 offspring from the HapMap CEU trios serve as cases, and the 2184 individuals of European ancestry from the POPRES data serve as controls. (a) The plot displays the top two principal components of ancestry for cases (red) and controls (black) obtained using SGA. Based on the distribution of points in the eigenmap, many available controls would not be good matches to the HapMap trios. Only those delineated in blue are considered further. Each case is matched to one or more controls that are genetically similar based on the eigenvectors. (b) Distance between controls and the closest case when matching in a random subset drawn from the full sample of controls versus (c) the distances when the controls consist of the restricted sample delineated in blue.
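The idea of measuring each control's distance to its closest case can be sketched as follows. This is a simple nearest-case assignment under stated assumptions, not the matching algorithm used in the paper: distances are Euclidean in the space of the retained eigenvectors, and `max_dist` is a hypothetical cutoff standing in for the restriction to the controls delineated in blue.

```python
import numpy as np

def nearest_case_distances(case_pcs, control_pcs):
    """For each control, return the index of and distance to its closest case.

    case_pcs: array of shape (n_cases, n_components)
    control_pcs: array of shape (n_controls, n_components)
    """
    diffs = control_pcs[:, None, :] - case_pcs[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))   # shape (n_controls, n_cases)
    nearest = dists.argmin(axis=1)
    return nearest, dists[np.arange(len(control_pcs)), nearest]

def build_strata(case_pcs, control_pcs, max_dist):
    """Group controls by closest case, dropping controls farther than max_dist."""
    nearest, dist = nearest_case_distances(case_pcs, control_pcs)
    strata = {i: [] for i in range(len(case_pcs))}
    for ctrl_idx, (case_idx, d) in enumerate(zip(nearest, dist)):
        if d <= max_dist:   # hypothetical cutoff; poorly matched controls are discarded
            strata[int(case_idx)].append(ctrl_idx)
    return strata

# Toy coordinates on the top two principal components.
cases = np.array([[0.0, 0.0], [10.0, 0.0]])
controls = np.array([[0.0, 1.0], [9.0, 0.0], [4.0, 50.0]])
strata = build_strata(cases, controls, max_dist=2.0)
```

In this toy example the third control lies far from both cases and is left unmatched, which mirrors the caption's point that many available controls are not good matches.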
(a) African, (b) East Asian, and (c) European clusters identified by SGA. The 51 population samples within HGDP were analyzed to identify homogeneous clusters using SGA applied to continental samples. Analysis was performed separately for each continent using SpectralGEM. Population labels were ignored in the analysis. The display is organized to emphasize when a population or group of populations falls into a common cluster. Groups of populations that fall into a common cluster are often from a common region; see Supplementary Figures 1 and 2.
HGDP and POPRES eigenmap representations plotted for various ancestry bases. In each panel, the eigenvectors (labeled PC) are calculated using a portion of the data, called the base. The remaining samples are projected using the Nyström approximation. For each eigenmap we show only the top two principal components, with POPRES in turquoise and HGDP in black. (a) Base = HGDP, projected = POPRES; (b) Base = POPRES, projected = HGDP; (c) Base = HGDP + half of POPRES, projected = half of POPRES; and (d) Base = half of the balanced subset of countries including HGDP, projected = remaining half of the balanced subset.
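A Nyström-style projection of new samples onto eigenvectors computed from a base sample can be sketched in a few lines of NumPy. This is an illustration under assumptions: it uses a plain linear kernel on standardized genotypes rather than the kernel used by SGA or SpectralGEM, and the names `fit_eigenmap` and `project` are hypothetical.

```python
import numpy as np

def fit_eigenmap(X_base, n_components=2):
    """Eigen-decompose a linear kernel built from base genotypes (samples x SNPs)."""
    mu = X_base.mean(axis=0)
    sd = X_base.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against monomorphic SNPs
    Z = (X_base - mu) / sd
    K = Z @ Z.T / Z.shape[1]                # base-by-base similarity kernel
    evals, evecs = np.linalg.eigh(K)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    return {"mu": mu, "sd": sd, "Z": Z,
            "evals": evals[order], "evecs": evecs[:, order]}

def project(model, X_new):
    """Nystrom extension: coordinates of new samples on the base eigenvectors."""
    Z_new = (X_new - model["mu"]) / model["sd"]
    K_new = Z_new @ model["Z"].T / model["Z"].shape[1]   # new-by-base kernel
    return K_new @ model["evecs"] / model["evals"]

# Simulated genotypes (0/1/2) stand in for real data.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50))
model = fit_eigenmap(X)
coords = project(model, X[:5])
```

A useful sanity check on this formulation is that projecting a base sample returns that sample's own eigenvector coordinates, since for base rows the new-by-base kernel is just the corresponding rows of the base kernel.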
Comparing ancestry of selected groups in HGDP versus POPRES for the top two principal components. SGA was performed using the balanced sample (Figure 3(d)). Individuals selected for comparison from POPRES and HGDP are highlighted using colors other than turquoise. (a) HGDP-French (black) versus POPRES-French (fuchsia); (b) HGDP-Orcadian (black) versus POPRES-British & Irish (fuchsia); (c) HGDP-Tuscan (black) and HGDP-N. Italian (blue) versus POPRES-Italian (fuchsia); and (d) HGDP-French Basque (black) versus POPRES-French (fuchsia), POPRES-Spanish & Portuguese (blue).
Type I error analysis at α = 0.05. The solid line represents Type I error for the mCLR method and the dashed line represents Type I error for combined association analysis, with (a) Fst = 0.05, (b) Fst = 0.01, and (c) Fst = 0.001. Results are based on 5000 replications of 500 unrelated controls and 500 trios.
Power analysis at α = 0.05. (a) mCLR method (solid line) versus combined association analysis (dashed line). Results are based on 5000 replications of 500 unrelated controls and 500 trios. (b) Power of the mCLR method plotted against the theoretical ratio of controls to cases (R). Results are based on 10 000 replications under the assumption that ψ = 1.3, 1.4, 1.5.
Association between HLA markers and Type 1 diabetes. –log10(p-values) are plotted versus individual SNPs in the HLA region of chromosome 6. (a) All controls matched; (b) 1:10 matching; (c) 1:5 matching; and (d) Trios only. The strongest association occurs for rs241427 (diamond) and next strongest for rs9273363 (triangle).