Chunk #3 — Background

Source: An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations.
Embedded: yes

Text

Here, we present a comprehensive comparison of the data generated by the multipoint imputation algorithm with the data obtained by direct genotyping in a type-II diabetes GWAS dataset. This imputation algorithm uses a Markov chain to infer the allelic frequencies of a marker from the information provided by a large set of flanking markers. The analyzed dataset was generated and organized by the Wellcome Trust Case Control Consortium (WTCCC) and is part of a large epidemiological study focused on identifying genetic markers that could predispose an individual to seven different diseases of interest [8]. In this effort, a group of approximately 3000 healthy individuals was compared to groups of 2000 individuals affected by diseases of interest such as type-II diabetes, hypertension, coronary heart disease, and bipolar disorder. The healthy individuals belong to two distinct cohorts selected to avoid population stratification, a very common source of bias in GWAS. Currently available imputation algorithms use very distinct statistical approaches and, overall, their accuracy is satisfactory [3]. Details on the most recent methods, as well