Chunk #2 — Online methods — Gene expression normalization

Source: Systematic identification of trans eQTLs as putative drivers of known disease associations.
Embedded: yes

Text

Gene expression data were quantile-normalized to the median distribution and subsequently log2-transformed. The probe and sample means were centered to zero. The expression data were then corrected for possible population structure by removing four multidimensional scaling components using linear regression. We reasoned earlier that normalized gene expression data still contain large amounts of non-genetic variation (ref. 5). After correction for population stratification, principal component analysis (PCA) was therefore performed on the sample correlation matrix. We performed a separate QTL analysis for each principal component (PC) to ascertain whether genetic variants affecting that PC could be detected. If we found such an effect, we did not correct the expression data for that component, to ensure that we would not unintentionally remove genetic effects from the expression data. Significance of these associations was established by controlling the false discovery rate (FDR), testing each association against a null distribution created by repeating the analysis 100 times, permuting the sample labels in each iteration (ref. 49). PCs that did not show significance at the FDR threshold of 0.0 were removed from the gene expression data by linear
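
The normalization steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the probes-by-samples matrix layout, the use of the per-rank median as the quantile reference, and the parameters `n_pcs` and `genetic_pcs` are all assumptions made for illustration. The per-PC QTL analysis and its permutation-based FDR are not reproduced here; the sketch simply takes the indices of PCs found to be under genetic control as an input.

```python
import numpy as np


def quantile_normalize(x):
    """Give every sample (column) of a probes x samples matrix the same
    distribution. Assumption: the reference is the median across samples
    at each rank. Ties are not treated specially."""
    order = np.argsort(x, axis=0)
    reference = np.median(np.sort(x, axis=0), axis=1)
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


def center_probes_and_samples(x):
    """Center probe (row) means, then sample (column) means, to zero."""
    x = x - x.mean(axis=1, keepdims=True)
    return x - x.mean(axis=0, keepdims=True)


def regress_out(x, covariates):
    """Return the residuals of each probe after least-squares regression on
    the covariates (samples x k), with an intercept included."""
    design = np.column_stack([np.ones(covariates.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, x.T, rcond=None)
    return (x.T - design @ beta).T


def sample_pcs(x):
    """Eigenvectors of the sample correlation matrix, sorted by decreasing
    eigenvalue. Each column is one PC with one value per sample."""
    corr = np.corrcoef(x, rowvar=False)  # samples x samples
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order], eigvals[order]


def normalize(raw, mds, n_pcs, genetic_pcs):
    """Run the steps in order. `raw` must be strictly positive (log2).
    `mds` holds the multidimensional scaling components (samples x 4 in
    the text). `genetic_pcs` lists hypothetical indices of PCs that showed
    a genetic association and are therefore left in the data."""
    x = np.log2(quantile_normalize(raw))
    x = center_probes_and_samples(x)
    x = regress_out(x, mds)
    pcs, _ = sample_pcs(x)
    protected = set(genetic_pcs)
    to_remove = [i for i in range(n_pcs) if i not in protected]
    return regress_out(x, pcs[:, to_remove])
```

A call such as `normalize(raw, mds, n_pcs=5, genetic_pcs=[1])` would regress out the first five PCs except the second. The number of PCs considered is not stated in this section, so `n_pcs` is left to the caller.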