Chunk #9 — Materials and Methods — Estimation of Genetic Ancestry

Source: The National Longitudinal Study of Adolescent to Adult Health (Add Health) sibling pairs genome-wide data.
Embedded: yes

Text

Second, we explored genetic ancestry using the software package ADMIXTURE (Alexander et al., 2009), which applies an efficient model-based likelihood approach to estimate genetic ancestry from genome-wide data. We ran ADMIXTURE in supervised mode, treating a set of reference populations of known genetic ancestry as fixed groups, to estimate the proportion of ancestry that each individual in the Add Health sibling pairs subsample shares with each reference population. The reference populations were drawn from the Human Genome Diversity Project (HGDP; Li et al., 2008) and the International Haplotype Map Project (HapMap; International HapMap 3 Consortium, 2010). Specifically, we used 108 samples from the HGDP to represent the Americas (Surui, Maya, Karitiana, Pima, and Colombian) and 402 samples from HapMap to represent Europe (CEU), Africa (YRI), China (CHB), and Japan (JPT). In all, we identified 257,035 SNP markers that overlap across the Add Health sibling pairs subsample, the HGDP sample, and the HapMap sample. For computational efficiency in ADMIXTURE, we created a set of 123,198 autosomal SNP markers in approximate linkage equilibrium and used it to estimate ancestry.
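
As an illustration of how a supervised ADMIXTURE run is typically set up, the following is a minimal Python sketch rather than the procedure used in this study. It assumes the ADMIXTURE convention of a .pop file with one line per sample, in the same order as the PLINK .fam file, where reference samples carry a population label and samples to be estimated carry "-". The sample IDs, the continental labels, the file name merged_pruned.bed, and the helper build_pop_labels are all hypothetical, and the number of ancestral groups K is not stated in the text, so it is derived here from the toy labels only.

```python
from typing import Dict, List


def build_pop_labels(fam_ids: List[str], reference_pop: Dict[str, str]) -> List[str]:
    """Return one label per sample, in .fam order.

    Reference samples get their population label; all other samples get "-",
    which marks them as individuals whose ancestry is to be estimated.
    """
    return [reference_pop.get(sample_id, "-") for sample_id in fam_ids]


# Hypothetical sample IDs in the order they would appear in the .fam file.
fam_ids = ["AH001", "NA12878", "NA18501", "AH002", "HGDP00998"]

# Hypothetical mapping of reference samples to fixed ancestry groups.
reference_pop = {
    "NA12878": "Europe",
    "NA18501": "Africa",
    "HGDP00998": "Americas",
}

labels = build_pop_labels(fam_ids, reference_pop)
pop_file_text = "\n".join(labels) + "\n"  # contents of merged_pruned.pop

# In a supervised run, K equals the number of distinct reference groups.
k = len(set(reference_pop.values()))

# Command that would be run on the (hypothetical) pruned PLINK fileset.
# It is only assembled here, not executed.
command = ["admixture", "--supervised", "merged_pruned.bed", str(k)]

print(pop_file_text, end="")
print(" ".join(command))
```

In a real analysis, the .pop file would need to sit alongside the .bed file with the same base name, and the label list would be built from the full merged .fam file rather than a hand-written list.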