
Chunk #7 — RESULTS — Generating the Hallmark Collection

Source
The Molecular Signatures Database (MSigDB) hallmark gene set collection.

Text

Here we give an overview of the hallmark generation procedure (see Methods for details). We first used consensus clustering to identify groups of similar gene sets according to their gene membership overlaps. Starting from 8,380 gene sets in MSigDB v4.0 collections C1-C6, consensus clustering produced 600 clusters. We manually reviewed the clusters and were able to annotate 43 of them with 50 clear biological themes. While 36 clusters had only one theme assigned to them, seven clusters were each assigned two themes because of the heterogeneity of their founder gene sets (see Supplemental Experimental Procedures, Note 1 for details). These themes, and their associated clusters, served as candidates for an initial collection of hallmark signatures. We defined “raw” sets, one for each candidate hallmark, as the union of a cluster’s gene sets. We then refined each raw set according to its gene expression profile in a number of datasets relevant to the corresponding biological theme. The refinement excluded genes that did not discriminate the relevant phenotype well. In this way, only coordinately expressed and biologically relevant
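
Two steps in the overview above can be illustrated with a minimal Python sketch: measuring gene membership overlap between gene sets, and forming a “raw” set as the union of one cluster’s gene sets. This is not the authors' implementation. The Jaccard index is an assumed stand-in for the overlap measure, consensus clustering itself is not reproduced, and the gene set names (`SET_A`, `SET_B`, `SET_C`), their contents, and the helpers `jaccard` and `raw_set` are hypothetical. The last lines only check the cluster and theme counts stated in the text.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def raw_set(cluster: list, gene_sets: dict) -> set:
    """Union of the member gene sets of one cluster (the "raw" set)."""
    return set().union(*(gene_sets[name] for name in cluster))


# Hypothetical toy gene sets; real MSigDB sets are far larger.
gene_sets = {
    "SET_A": {"TP53", "CDKN1A", "MDM2"},
    "SET_B": {"TP53", "MDM2", "BAX"},
    "SET_C": {"HIF1A", "VEGFA"},
}

# Pairwise membership overlap, the kind of similarity a clustering step could use.
overlaps = {
    (x, y): jaccard(gene_sets[x], gene_sets[y])
    for x, y in combinations(sorted(gene_sets), 2)
}

# Assume a clustering step grouped SET_A and SET_B together.
cluster = ["SET_A", "SET_B"]
raw = raw_set(cluster, gene_sets)

# Counts from the text: 36 single-theme clusters and 7 two-theme clusters.
single_theme, double_theme = 36, 7
annotated_clusters = single_theme + double_theme  # 43 annotated clusters
themes = single_theme + 2 * double_theme          # 50 biological themes
```

In this toy example `SET_A` and `SET_B` share two of four distinct genes, so their overlap is 0.5, while `SET_C` shares none with either. The expression-based refinement of each raw set is not sketched here because it depends on the phenotype datasets described in Methods.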