Chunk #7 — Materials and methods — Preprocessing and quality control

Source
Genomics of ADME gene expression: mapping expression quantitative trait loci relevant for absorption, distribution, metabolism and excretion of drugs in human liver.

Text

Raw data from the HumanHap300 Genotyping BeadChip were also preprocessed using BeadStudio version 3.0. Next, missing genotypes were estimated using the MACH imputation algorithm, which is based on a hidden Markov model [28]. Subsequently, 15 235 SNPs with a low call rate (<95%) [29], 3466 SNPs with low minor allele frequencies (<4%) [16] and 201 SNPs not in Hardy–Weinberg equilibrium (false discovery rate ⩽0.2) were excluded from further analyses [16, 30]. Genetic similarity between samples, referred to as population substructure, may lead to false-positive association results [31]. To identify possibly related individuals, we calculated pairwise identity-by-state distances. Consequently, one sample was excluded because of >95% genotype identity to another sample [32]. To detect further putative population substructure, the method of Price et al. [33], based on principal components analysis (PCA), was applied. This analysis revealed no evidence for population substructure within our cohort of liver samples. This comprehensive quality control analysis was performed using the R Bioconductor package ‘GenABEL’ [34]. The final processed data set was from 149 livers (71 males and 78 females, Supplementary Table S1) and consisted of 299 352 SNPs and 15 439
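
The SNP-level filters and the identity-by-state check described above can be illustrated with a minimal Python sketch. This is not the GenABEL implementation used in the study. It assumes genotypes coded as 0/1/2 copies of one allele with `None` for missing calls, a 1-df chi-square test for Hardy–Weinberg equilibrium, Benjamini–Hochberg control of the false discovery rate, and identity defined as the mean proportion of alleles shared. All function names and the order in which the filters are applied are hypothetical; only the thresholds (95% call rate, 4% minor allele frequency, FDR 0.2, 95% identity) come from the text.

```python
import math

def call_rate(genos):
    """Fraction of samples with a non-missing genotype for one SNP."""
    return sum(g is not None for g in genos) / len(genos)

def minor_allele_freq(genos):
    """Minor allele frequency from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def hwe_pvalue(genos):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium (assumed test)."""
    called = [g for g in genos if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def bh_reject(pvals, fdr):
    """Benjamini-Hochberg step-up: indices of p-values rejected at the given FDR."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= fdr * rank / m:
            k = rank
    return set(order[:k])

def filter_snps(snps, min_call_rate=0.95, min_maf=0.04, hwe_fdr=0.2):
    """Return names of SNPs passing call-rate, MAF and HWE filters.

    snps maps a SNP name to its list of genotypes (one entry per sample).
    """
    kept = [name for name, g in snps.items()
            if call_rate(g) >= min_call_rate and minor_allele_freq(g) >= min_maf]
    pvals = [hwe_pvalue(snps[name]) for name in kept]
    out_of_hwe = bh_reject(pvals, hwe_fdr)
    return [name for i, name in enumerate(kept) if i not in out_of_hwe]

def ibs_identity(a, b):
    """Mean proportion of alleles shared identical-by-state between two samples."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(2 - abs(x - y) for x, y in pairs) / (2 * len(pairs))
```

Under these assumptions, a sample pair would be flagged as possibly related or duplicated when `ibs_identity(a, b) > 0.95`, and one member of the pair would then be dropped before association testing.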