Chunk #70 — ONLINE METHODS — Normalization of Gene Expression and Adjustment for Covariates

Source: Gene expression elucidates functional impact of polygenic risk for schizophrenia.
Embedded: yes

Text

Gene-level analyses started with the HTSeq-derived sample-by-gene read count matrix. The basic normalization and adjustment pipeline for the expression data matrix (Supplementary Fig. 2, middle and bottom panels) consisted of: a) exploration to determine which known and hidden covariates should be accounted for during analyses; b) voom-based calculation of normalized log(CPM) (read counts per million total reads), along with weights that estimate the precision of each log(CPM) observation (ref. 68); and c) linear regression-based adjustment for the chosen covariates, where the linear regression for each gene is performed independently and uses the observation weights, so that observations with higher presumed precision are up-weighted in the model fit (i.e., weighted least squares regression). We now detail each of these steps. Throughout, we include both SCZ and AFF cases and controls, and the corresponding diagnosis status (“Dx”) is the primary variable of interest.
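Steps (b) and (c) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code, which relied on voom. The log(CPM) offsets (0.5 on counts, 1 on library size) follow voom's published transform. The precision weights are taken as given inputs rather than estimated from a mean-variance trend. The function names, the toy data, and the choice to subtract only the fitted covariate effects while retaining the intercept and Dx are assumptions made for illustration.

```python
import math


def log_cpm(counts, lib_sizes):
    """log2 counts per million for a gene-by-sample count table.

    Uses voom-style offsets: 0.5 added to counts, 1 added to library size.
    """
    return [
        [math.log2((c + 0.5) / (lib + 1.0) * 1e6) for c, lib in zip(row, lib_sizes)]
        for row in counts
    ]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wls_fit(design, y, weights):
    """Weighted least squares: solve (X'WX) beta = X'Wy for one gene."""
    p = len(design[0])
    xtwx = [
        [sum(w * row[i] * row[j] for row, w in zip(design, weights)) for j in range(p)]
        for i in range(p)
    ]
    xtwy = [
        sum(w * row[i] * yi for row, yi, w in zip(design, y, weights))
        for i in range(p)
    ]
    return solve(xtwx, xtwy)


def adjust_gene(y, weights, dx, covariates):
    """Subtract fitted covariate effects from one gene's log(CPM) values.

    The model is y ~ intercept + Dx + covariates; only the covariate terms
    are removed (an assumption of this sketch), so Dx signal is retained.
    """
    design = [[1.0, d] + list(c) for d, c in zip(dx, covariates)]
    beta = wls_fit(design, y, weights)
    cov_beta = beta[2:]  # skip intercept and Dx coefficients
    return [yi - sum(b * v for b, v in zip(cov_beta, c)) for yi, c in zip(y, covariates)]


# Toy data (hypothetical): one gene, six samples, one covariate.
dx = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
covariates = [[1.0], [2.0], [1.5], [3.0], [2.5], [0.5]]
weights = [1.0, 2.0, 0.5, 1.5, 1.0, 3.0]
y = [5.0 + 2.0 * d + 0.8 * c[0] for d, c in zip(dx, covariates)]
adjusted = adjust_gene(y, weights, dx, covariates)
```

Each gene is fitted independently, so `adjust_gene` would be called once per row of the log(CPM) table with that gene's own weight vector. In practice the weights would come from voom itself rather than being supplied by hand.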