Chunk #28 — Methods — Genome-wide genotyping array data sets used for evaluation of imputation quality and/or phenotype association analysis — Hispanic Community Health Study/Study of Latinos (HCHS/SOL)

Source: Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations.

Text

The HCHS/SOL cohort began in 2006 as a prospective study of Hispanic/Latino populations in the U.S. [40–42]. From 2008 to 2011, 16,415 adults were recruited from a random sample of households in four communities (the Bronx, Chicago, Miami, and San Diego). Each Field Center recruited >4,000 participants from diverse socioeconomic groups. Most participants self-identified as having Cuban, Dominican, Puerto Rican, Mexican, Central American, or South American heritage. The cohort has been genotyped on two platforms: an Illumina Omni2.5M array (plus 150,000 custom SNPs, including ancestry-informative markers, Amerindian population-specific variants, previously identified GWAS hits, and other candidate polymorphisms, for a total of 2,293,715 SNPs) [43], and the Illumina Multi-Ethnic Genotyping Array (MEGA; 1,705,969 SNPs in total), the latter as part of efforts by the Population Architecture for Genetic Epidemiology consortium [44] to better assess variation in non-European populations. The MEGA array also includes additional exonic, functional, and clinically relevant variants. Illumina Omni2.5M array genotypes were available for 12,802 samples, of which 11,887 also had MEGA array genotypes. The Illumina Omni2.5M array was used for imputation to the TOPMed reference panel, with