Chunk #14 — Materials and Methods — Data and imputation

Source: A new statistic to evaluate imputation reliability.

The first dataset was collected as part of SAGE, one of the studies in the Gene Environment Association (GENEVA) project (http://genevastudy.org/). Samples were genotyped on the Illumina Human1M array, which contains 1,049,008 SNP probes, at the Center for Inherited Disease Research (CIDR) at Johns Hopkins University. SNPs with a genotype call rate below 98% or a Hardy-Weinberg exact test p value below 1×10⁻⁴ were removed. Additional data cleaning procedures were applied to ensure high data quality. These included the use of HapMap controls, detection of gender and chromosomal anomalies, detection of hidden relatedness, assessment of population structure and batch effects, and detection of Mendelian and duplication errors [16]. Of the 1,049,008 SNPs, 948,658 (90%) passed these procedures. The remaining samples comprised 2597 self-identified European Americans and 1264 self-identified African Americans, and these groupings were confirmed by principal component analysis.
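The two SNP-level filters above (call rate below 98%, Hardy-Weinberg exact p below 1×10⁻⁴) can be sketched in Python. This is a minimal illustration under our own assumptions and is not the GENEVA/CIDR cleaning pipeline. We assume the "Hardy-Weinberg exact p value" is the two-sided exact test described by Wigginton, Cutler and Abecasis (2005), computed here with their heterozygote-count recurrence. The names `hwe_exact_p`, `SnpCounts`, `call_rate` and `passes_qc` are hypothetical.

```python
from dataclasses import dataclass


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact HWE p-value for one biallelic SNP (assumed test, see lead-in)."""
    n_hom_rare = min(n_hom1, n_hom2)
    n_hom_common = max(n_hom1, n_hom2)
    n_genotypes = n_het + n_hom_rare + n_hom_common
    if n_genotypes == 0:
        return 1.0  # no called genotypes: nothing to test
    rare_copies = 2 * n_hom_rare + n_het
    probs = [0.0] * (rare_copies + 1)  # unnormalized P(het count = index)

    # Start near the expected het count; it must share the parity of rare_copies.
    mid = rare_copies * (2 * n_genotypes - rare_copies) // (2 * n_genotypes)
    if rare_copies % 2 != mid % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk downward: two fewer hets means one more homozygote of each kind.
    hets, hom_r = mid, (rare_copies - mid) // 2
    hom_c = n_genotypes - hets - hom_r
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hets, hom_r, hom_c = hets - 2, hom_r + 1, hom_c + 1

    # Walk upward: two more hets means one fewer homozygote of each kind.
    hets, hom_r = mid, (rare_copies - mid) // 2
    hom_c = n_genotypes - hets - hom_r
    while hets <= rare_copies - 2:
        probs[hets + 2] = probs[hets] * 4.0 * hom_r * hom_c / ((hets + 2) * (hets + 1))
        hets, hom_r, hom_c = hets + 2, hom_r - 1, hom_c - 1

    # p-value: total probability of het counts no more likely than the observed one.
    total = sum(probs)
    observed = probs[n_het]
    p = sum(q for q in probs if q <= observed) / total
    return min(1.0, p)


@dataclass
class SnpCounts:
    """Hypothetical per-SNP genotype tallies: AA, AB, BB and no-calls."""
    snp_id: str
    n_hom_a: int
    n_het: int
    n_hom_b: int
    n_missing: int


def call_rate(s: SnpCounts) -> float:
    """Fraction of samples with a genotype call at this SNP."""
    total = s.n_hom_a + s.n_het + s.n_hom_b + s.n_missing
    return (total - s.n_missing) / total if total else 0.0


def passes_qc(s: SnpCounts, min_call_rate: float = 0.98, min_hwe_p: float = 1e-4) -> bool:
    """Keep a SNP only if call rate >= 98% and HWE exact p >= 1e-4 (thresholds from the text)."""
    if call_rate(s) < min_call_rate:
        return False
    return hwe_exact_p(s.n_het, s.n_hom_a, s.n_hom_b) >= min_hwe_p


if __name__ == "__main__":
    snps = [
        SnpCounts("snp_balanced", 25, 50, 25, 0),   # genotype counts consistent with HWE
        SnpCounts("snp_no_hets", 50, 0, 50, 0),     # extreme heterozygote deficit
        SnpCounts("snp_low_call", 40, 40, 15, 5),   # 95% call rate
    ]
    for s in snps:
        print(s.snp_id, round(call_rate(s), 3), passes_qc(s))
```

The filter checks the call rate first because it is cheap and because the HWE test only uses the called genotypes. Under these assumptions the balanced SNP is kept. The SNP with no heterozygotes fails the HWE threshold, and the SNP with a 95% call rate fails the call-rate threshold.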