Chunk #9 — Methods — Correction for known and unknown factors: “REDUCED” dataset generation

Source: Patterns of cis regulatory variation in diverse human populations.
Embedded: yes

Text

We applied the PEER Bayesian regression and factor analysis modules to each of the 8 populations separately to learn the global effects of known and hidden factors on gene expression. Population and gender indicators were modeled as known global factors, essentially by Bayesian regression. Jointly with these known factors, 32 hidden factors were estimated by Bayesian factor analysis. The prior on the weight precisions, which acts as a regularization parameter, was set to (21800, 0.022) for both models. This regularization parameter influences the effective number of factors retained after training. The specific settings are the standard ARD settings from [26], scaled by the total number of probes in the model (see [26] for a detailed discussion). All remaining priors were set to uninformative values. The residuals were used as input for subsequent analyses and are referred to throughout the manuscript as ‘REDUCED’ data.
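The idea of regressing out known and hidden factors and keeping the residuals can be illustrated with a minimal sketch. This is not the PEER implementation: it swaps in ordinary least squares for the Bayesian regression and a truncated SVD (principal components) for the Bayesian factor analysis, fits the two steps sequentially rather than jointly, and applies no ARD prior. The function name `reduce_expression` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np


def reduce_expression(expr, known, n_hidden):
    """Return residual expression after removing known and hidden factors.

    expr     : (samples x probes) expression matrix
    known    : (samples x covariates) known factors, e.g. a gender indicator
    n_hidden : number of hidden factors to remove (the text uses 32)

    Assumption: OLS + truncated SVD stand in for PEER's Bayesian models.
    """
    n_samples = expr.shape[0]

    # Step 1: regress out an intercept and the known covariates (OLS).
    design = np.column_stack([np.ones(n_samples), known])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid_known = expr - design @ beta

    # Step 2: estimate hidden factors as the leading principal components
    # of what is left, then subtract their contribution.
    u, s, vt = np.linalg.svd(resid_known, full_matrices=False)
    factors = u[:, :n_hidden] * s[:n_hidden]   # samples x n_hidden
    weights = vt[:n_hidden]                    # n_hidden x probes

    # The residuals play the role of the 'REDUCED' data.
    return resid_known - factors @ weights
```

To mirror the per-population analysis described above, such a function would be called once per population on that population's expression matrix. Unlike the PEER model, this sketch removes exactly `n_hidden` components, whereas the regularizing prior in the text lets the effective number of factors fall below the 32 requested.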