
Chunk #28 — DISCUSSION

Source
The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Embedded
yes

Text

Here we introduce a collection of hallmark gene sets, together with a methodology for generating them, and demonstrate their utility in several examples. Grouping gene sets by gene overlap yielded clusters with coherent annotation, and the resulting hallmarks capture the relevant signal shared by related and potentially redundant gene sets. Because gene sets often convey approximate and incomplete versions of the pertinent biological conditions, we developed a hybrid approach that combines computational and manual steps. The automated steps included clustering, microarray data processing, and meta-analysis. Expert biological review was essential for bringing prior domain knowledge to bear when labeling clusters with biological themes, because automated clustering alone gives no indication of the degree of biological resolution that the clusters represent. Additional manual tasks, which also required an experienced curator, included locating microarray datasets and annotating their phenotype classes. The refinement step retains the most transcriptionally coherent genes in each hallmark, so that the hallmark serves as a more effective and accurate transcriptional signature for detecting a specific biological process. By summarizing relevant information from thousands of founder gene
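
The overlap-based grouping described in this discussion can be illustrated with a minimal sketch. The passage does not state which overlap measure or clustering algorithm was used, so the choices here are assumptions made only for illustration: the Jaccard index, a fixed threshold of 0.5, and grouping by connected components. The function names `jaccard` and `cluster_by_overlap` and the toy gene set names are hypothetical. The sketch covers only the automated grouping step, not the expert review or the expression-based refinement.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index of two gene sets (defined as 0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_overlap(gene_sets: dict, threshold: float = 0.5) -> list:
    """Group gene set names whose pairwise overlap meets the threshold.

    Assumption: two gene sets are linked when their Jaccard index is at
    least `threshold`, and clusters are the connected components of the
    resulting graph. This is an illustrative stand-in, not the paper's method.
    """
    # Union-find structure: every gene set starts in its own cluster.
    parent = {name: name for name in gene_sets}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every sufficiently overlapping pair.
    for a, b in combinations(gene_sets, 2):
        if jaccard(gene_sets[a], gene_sets[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect gene set names by their cluster root.
    clusters = {}
    for name in gene_sets:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy input with hypothetical gene set and gene names.
toy_sets = {
    "SET_A": {"G1", "G2", "G3", "G4"},
    "SET_B": {"G2", "G3", "G4", "G5"},
    "SET_C": {"G7", "G8", "G9"},
}
toy_clusters = cluster_by_overlap(toy_sets, threshold=0.5)
```

In this toy input, SET_A and SET_B share three of five distinct genes (Jaccard 0.6) and fall into one cluster, while SET_C stays on its own. Each such cluster would still need a curator to assign a biological theme, as the discussion emphasizes.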