Chunk #8 — Methods — Correction for known and unknown factors: “REDUCED” dataset generation

Source: Patterns of cis regulatory variation in diverse human populations.
Embedded: yes

Text

We applied a latent variable analysis separately to each population to correct the expression data for known and unknown factors that may influence gene expression in this dataset, with the aim of characterizing the results obtained after correction and comparing their properties with those obtained without it. This dimension reduction differs from the PCA for admixture (see Population stratification correction of expression data) in that it can also account for the effects of unknown covariates, such as complex batch effects or subtle environmental influences, which can then be factored out of the expression data. The reduced expression data sets were learned using the probabilistic estimation of expression residuals (PEER) framework [26], [27]. In this framework, the contributions of known and hidden global factors to gene expression levels are estimated and subtracted to produce a residual gene expression profile. Parameter estimation is performed using variational learning, an approximate inference algorithm that generalizes expectation maximization.
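
The idea of estimating known and hidden global factors and subtracting them can be illustrated with a minimal sketch. This is not the PEER implementation and does not use variational learning: as a simplified stand-in, it assumes a linear model in which expression is approximately X B + Z W + E, fits the known covariates X by ordinary least squares, and estimates the hidden factors Z from the top singular vectors of the remaining residuals. The function name `residualize`, the parameter `n_hidden`, and the simulated covariates (`sex`, `batch`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def residualize(expr, known, n_hidden):
    """Remove known covariates and estimated hidden factors from expression.

    expr     : (samples x genes) expression matrix
    known    : (samples x covariates) matrix of known factors
    n_hidden : number of hidden factors to estimate (an assumed tuning choice)

    Returns the residual expression matrix and the estimated hidden factors.
    """
    n_samples = expr.shape[0]

    # Design matrix: intercept plus the known covariates.
    design = np.column_stack([np.ones(n_samples), known])

    # Step 1: regress every gene on the known covariates and keep residuals.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef

    # Step 2: estimate hidden global factors as the leading singular
    # components of the residuals (a truncated SVD, standing in for the
    # probabilistic factor model that PEER fits).
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    hidden = u[:, :n_hidden] * s[:n_hidden]

    # Step 3: subtract the hidden-factor contribution from the residuals.
    reduced = resid - hidden @ vt[:n_hidden]
    return reduced, hidden


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
sex = rng.integers(0, 2, size=n_samples).astype(float)  # known covariate
batch = rng.normal(size=n_samples)                      # unobserved factor
expr = (
    np.outer(sex, rng.normal(size=n_genes))
    + np.outer(batch, 2.0 * rng.normal(size=n_genes))
    + 0.5 * rng.normal(size=(n_samples, n_genes))
)

reduced, hidden = residualize(expr, sex[:, None], n_hidden=1)

# How well the estimated hidden factor tracks the simulated batch effect.
print(abs(np.corrcoef(hidden[:, 0], batch)[0, 1]))
```

In this sketch the number of hidden factors is fixed by the caller. A probabilistic framework such as PEER instead infers the factor contributions within a Bayesian model, so the sketch should be read only as an illustration of the subtract-the-global-factors step, not of the inference procedure.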