Chunk #40 — Experimental Procedures — Hallmark generation methodology — Step 5: Refining raw hallmark sets

Source: The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Embedded: yes

Text

We assessed how well each gene in each raw hallmark discriminated the relevant phenotypes in each of the datasets identified in step 3. As before, we used the IC between the phenotype (class) vector and each gene’s expression profile as the discrimination metric. For each gene, we assessed the statistical significance of its IC score with a sample permutation test, which provides an empirical null distribution and yields a nominal p-value. This was done independently for each gene expression test dataset. We then combined the per-dataset p-values into one summary p-value per gene by meta-analysis using Fisher’s method (Fisher, 1948), as implemented in the R package MetaDE (Wang et al., 2012). From the summary p-values we computed false discovery rates (FDR) following the approach of Benjamini and Hochberg (1995). The genes in the raw hallmark were then sorted by their FDR values, and the genes with summary FDR values less than 0.01 comprised the final hallmark set. When the number of genes obtained by this method was fewer than 15 (or more than 200), the top-scoring 15 (or 200) genes were chosen regardless of their FDR values.
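
The refinement steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the original analysis used the IC as the score and the R package MetaDE for Fisher's method, whereas here a simple difference of class means stands in for the IC, and Fisher's method and the Benjamini-Hochberg adjustment are written out by hand. All function names, the two-sided permutation test, the add-one correction, and the tie-breaking by gene name are hypothetical choices made for the example.

```python
import math
import random


def mean_difference(profile, labels):
    """Stand-in discrimination score (NOT the IC): difference of class means."""
    ones = [x for x, y in zip(profile, labels) if y == 1]
    zeros = [x for x, y in zip(profile, labels) if y == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def permutation_pvalue(score, profile, labels, n_perm=1000, seed=0):
    """Nominal p-value from an empirical null built by shuffling sample labels."""
    rng = random.Random(seed)
    observed = abs(score(profile, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(score(profile, shuffled)) >= observed:
            hits += 1
    # Assumption: add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k d.o.f."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    fdr = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        fdr[i] = running_min
    return fdr


def refine_hallmark(genes, fdr, threshold=0.01, min_size=15, max_size=200):
    """Keep genes with FDR < threshold, clamped to [min_size, max_size] genes."""
    # Assumption: ties in FDR are broken by gene name.
    ranked = sorted(zip(fdr, genes))
    n_pass = sum(1 for q, _ in ranked if q < threshold)
    n_keep = min(max(n_pass, min_size), max_size, len(ranked))
    return [gene for _, gene in ranked[:n_keep]]
```

In use, one would call `permutation_pvalue` once per gene and per dataset, pass each gene's list of per-dataset p-values to `fisher_combine`, adjust the resulting summary p-values with `benjamini_hochberg`, and hand the gene names and FDR values to `refine_hallmark`.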