Chunk #78 — ONLINE METHODS — Normalization of Gene Expression and Adjustment for Covariates — Evaluation and selection of covariates

Source: Gene expression elucidates functional impact of polygenic risk for schizophrenia.
Embedded: yes

Text

Following basic sample-level normalization and gene-level filtering, we assessed the relationship between known clinical, technical, and experimental sample-level variables and the gene-level expression values in the normalized read count matrix. The purpose of this exploratory analysis was to determine which of these variables should be included as covariates to statistically adjust gene expression levels for downstream analyses (i.e., eQTL discovery, differential expression, and gene co-expression). The final model, which we call “the covariate model”, included 12 sample variables (Dx [3], Institution [3], Sex [2], AOD, PMI, RIN, RIN², and 5 ancestry vectors) and 1 experiment variable (clustered LIB [9]), where the number of levels for each factor variable is given in square brackets. Counting the intercept term, this model accounted for 23 df and yielded an average r² of 0.42 (for a description of the model selection procedure, see Supplementary Text). We use this model in most analyses reported in the manuscript, except where otherwise noted (see Supplementary Fig. 2). We discuss the addition of surrogate variables (Supplementary Fig. 4G, H and Supplementary Text); the fit of the various models
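
The 23 df quoted for the covariate model can be checked with simple bookkeeping. The sketch below is illustrative only and is not the authors' code: it assumes reference-level (treatment) coding, so that a factor with k levels contributes k - 1 columns, and the names `model_df` and `ancestry_1` to `ancestry_5` are hypothetical labels introduced here.

```python
# Levels per factor variable, taken from the square brackets in the text.
factor_levels = {"Dx": 3, "Institution": 3, "Sex": 2, "clustered LIB": 9}

# Continuous terms, each contributing one design-matrix column.
# The ancestry_* names are hypothetical labels for the 5 ancestry vectors.
continuous_terms = ["AOD", "PMI", "RIN", "RIN^2"] + [
    f"ancestry_{i}" for i in range(1, 6)
]


def model_df(factor_levels, continuous_terms, intercept=True):
    """Count design-matrix columns, assuming treatment (reference-level) coding."""
    df = sum(k - 1 for k in factor_levels.values())  # k - 1 columns per factor
    df += len(continuous_terms)                      # 1 column per continuous term
    return df + (1 if intercept else 0)


total_df = model_df(factor_levels, continuous_terms)
print(total_df)
```

Under these assumptions the factors contribute 2 + 2 + 1 + 8 = 13 columns, the continuous terms 9, and the intercept 1, which matches the 23 df stated above.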