raw sets according to its gene expression profile in a number of datasets relevant to the corresponding biological theme. The refinement excluded genes that did not discriminate the relevant phenotype well. In this way, only coordinately expressed, biologically relevant genes remained in the final hallmark added to the collection. An additional validation procedure determined whether the final hallmark generalized, i.e., performed as expected in an independent dataset that was not used for the refinement. The founder sets of the final 50 hallmarks (Table 1) comprise 4,022 of the original 8,380 MSigDB gene sets.
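The refine-then-validate idea can be illustrated with a minimal sketch. It assumes a GSEA-style signal-to-noise ratio as the measure of how well a gene discriminates the phenotype, and it uses arbitrary hypothetical thresholds (`min_snr`, `min_mean_snr`). The function names `signal_to_noise`, `refine`, and `generalizes`, along with the toy expression values, are illustrative only. This is not the procedure or data used to build the hallmarks.

```python
import statistics
from typing import Dict, List, Sequence

# Hypothetical layout: gene symbol -> expression values across samples.
Expression = Dict[str, List[float]]


def signal_to_noise(values: Sequence[float], labels: Sequence[bool]) -> float:
    """GSEA-style score: (mean_pos - mean_neg) / (sd_pos + sd_neg).

    Assumes at least two samples in each phenotype class.
    """
    pos = [v for v, is_pos in zip(values, labels) if is_pos]
    neg = [v for v, is_pos in zip(values, labels) if not is_pos]
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    # Floor the denominator so zero-variance genes do not divide by zero.
    return (statistics.mean(pos) - statistics.mean(neg)) / max(spread, 1e-6)


def refine(draft: Sequence[str], expr: Expression, labels: Sequence[bool],
           min_snr: float = 0.5) -> List[str]:
    """Keep only draft genes that are up-regulated in the phenotype of interest."""
    return [g for g in draft
            if g in expr and signal_to_noise(expr[g], labels) >= min_snr]


def generalizes(hallmark: Sequence[str], expr: Expression, labels: Sequence[bool],
                min_mean_snr: float = 0.5) -> bool:
    """Check the refined set on an independent dataset not used for refinement."""
    scores = [signal_to_noise(expr[g], labels) for g in hallmark if g in expr]
    return bool(scores) and statistics.mean(scores) >= min_mean_snr


# Toy usage with made-up values (not data from the paper).
labels = [True, True, True, False, False, False]
train = {
    "GENE_A": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # clearly higher in phenotype
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # does not discriminate
    "GENE_C": [3.0, 3.3, 2.8, 1.5, 1.4, 1.6],  # higher in phenotype
}
refined = refine(["GENE_A", "GENE_B", "GENE_C"], train, labels)
print(refined)  # ['GENE_A', 'GENE_C']

independent = {"GENE_A": [4.0, 4.4, 1.2, 1.0], "GENE_C": [2.5, 2.9, 1.1, 1.3]}
print(generalizes(refined, independent, [True, True, False, False]))  # True
```

Keeping the validation dataset separate from the one used in `refine` mirrors the text's requirement: a gene set that only looks coherent on the data that shaped it has not been shown to generalize.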