Chunk #35 — ONLINE METHODS — Computing t-statistics

Source: Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types.

Thus, for a focal tissue (e.g., cortex) in a larger tissue category (e.g., brain), we computed the t-statistic for gene g as follows. We first constructed a design matrix X in which each row corresponds to a sample that is either in cortex or outside the brain. The first column of X has a 1 for every cortex sample and a -1 for every non-brain sample; the remaining columns are an intercept and covariates (see below). The outcome Y in our model is the expression of gene g. We fit this model by ordinary least squares and computed a t-statistic for the first explanatory variable in the standard way:

$$t = \frac{\left((X^T X)^{-1} X^T Y\right)[0]}{\sqrt{\mathrm{MSE} \cdot (X^T X)^{-1}[0,0]}},$$

where MSE is the mean squared error of the fitted model, i.e.,

$$\mathrm{MSE} = \frac{1}{N}\left(Y - X(X^T X)^{-1} X^T Y\right)^T \left(Y - X(X^T X)^{-1} X^T Y\right),$$

and N is the number of rows in X. This gives a t-statistic for each gene for the focal tissue. We then selected the top 10% of genes by t-statistic, added a 100-kb window around their transcribed regions, and applied stratified LD score regression to the resulting genome annotations, as described below.
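The procedure above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. It assumes expression is stored as a samples × genes array, with boolean masks marking the focal tissue and its larger category. The function names `tissue_t_statistics` and `top_gene_windows`, the input layout, the rounding of the 10% cutoff, and the `(chrom, start, end)` gene-region format are all hypothetical. The sketch uses the 1/N MSE given in the text and stops at producing windowed regions; stratified LD score regression itself is not shown.

```python
import numpy as np


def tissue_t_statistics(expr, in_focal, in_category, covariates=None):
    """Per-gene t-statistic for a focal tissue vs. samples outside its category.

    expr        : (n_samples, n_genes) expression matrix (hypothetical layout)
    in_focal    : (n_samples,) bool, True for focal-tissue samples (e.g. cortex)
    in_category : (n_samples,) bool, True for samples in the category (e.g. brain)
    covariates  : optional (n_samples,) or (n_samples, k) extra covariates
    """
    expr = np.asarray(expr, dtype=float)
    in_focal = np.asarray(in_focal, dtype=bool)
    in_category = np.asarray(in_category, dtype=bool)

    # Keep focal samples and samples outside the category; drop the rest
    # of the category (e.g. non-cortex brain samples).
    keep = in_focal | ~in_category
    Y = expr[keep]                      # (N, n_genes), one column per gene
    n = Y.shape[0]

    # Column 0: +1 for focal samples, -1 for out-of-category samples,
    # followed by an intercept and any covariates.
    contrast = np.where(in_focal[keep], 1.0, -1.0)
    columns = [contrast, np.ones(n)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)[keep]
        columns.extend(cov.reshape(n, -1).T)
    X = np.column_stack(columns)

    # OLS for all genes at once: beta = (X^T X)^{-1} X^T Y, shape (p, n_genes).
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y

    # MSE normalised by N (number of rows in X), as in the text.
    resid = Y - X @ beta
    mse = np.sum(resid ** 2, axis=0) / n

    # t = beta[0] / sqrt(MSE * (X^T X)^{-1}[0, 0]) for every gene.
    return beta[0] / np.sqrt(mse * XtX_inv[0, 0])


def top_gene_windows(t_stats, gene_regions, top_fraction=0.10, window=100_000):
    """Top `top_fraction` of genes by t-statistic, each region padded by `window` bp.

    gene_regions : list of (chrom, start, end) tuples aligned with t_stats
                   (hypothetical format; base-pair coordinates).
    """
    t_stats = np.asarray(t_stats, dtype=float)
    # Assumption: round the cutoff to the nearest gene count, keeping at least one.
    n_top = max(1, round(top_fraction * len(t_stats)))
    top = np.argsort(-t_stats)[:n_top]  # indices of the largest t-statistics
    return [
        (gene_regions[i][0], max(0, gene_regions[i][1] - window), gene_regions[i][2] + window)
        for i in top
    ]
```

Because the contrast column is coded +1/-1 rather than 1/0, its coefficient is half the adjusted difference in mean expression, but with an intercept in the model the t-statistic is unchanged by that recoding. Dividing by N rather than the residual degrees of freedom N - p scales every gene's t-statistic for a given tissue by the same factor, sqrt(N/(N - p)), so the top-10% ranking is unaffected.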