
Chunk #35 — Experimental Procedures — Hallmark generation methodology — Step 1: Identify groups of similar gene sets using consensus clustering

Source
The Molecular Signatures Database (MSigDB) hallmark gene set collection.

Text

The bootstrap resampling procedure for consensus clustering sampled with replacement from a pool of 31,847 genes, the union of all 8,380 original gene sets. We performed 100 resampling iterations and ran consensus clustering for 50 ≤ k ≤ 8,000 in increments of 50. We used the cophenetic coefficient (ρ) of each consensus clustering result to estimate the optimal number of clusters. The cophenetic analysis showed two peaks: one at k = 450 (ρ = 0.9668) and another at k = 600 (ρ = 0.9670; Figure S3). After inspecting the results for both values of k, we found the k = 450 partition too coarse and its clusters too heterogeneous for our purposes, whereas the k = 600 clusters were at a level of granularity better suited to defining hallmark sets. We therefore used the k = 600 partition to produce the clusters of gene sets for the subsequent steps of the hallmark methodology.
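The procedure above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the text does not state. Gene sets are compared by a weighted Jaccard distance in which each gene counts as many times as it was drawn in the bootstrap sample. Each bootstrap partition comes from average-linkage hierarchical clustering cut into k clusters. The cophenetic coefficient ρ uses the usual definition for consensus matrices: the Pearson correlation between 1 − consensus and the cophenetic distances of an average-linkage tree built on 1 − consensus. All function names and the toy gene sets are hypothetical, and the sketch does not reproduce the authors' implementation.

```python
import math
import random
from collections import Counter
from itertools import combinations

def bootstrap_distance(a, b, counts):
    # Assumed metric: weighted Jaccard distance, each gene weighted by how
    # many times it was drawn in this bootstrap sample.
    inter = sum(counts[g] for g in a & b)
    union = sum(counts[g] for g in a | b)
    return 1.0 - inter / union if union else 1.0  # no sampled genes: maximally distant

def average_linkage(d):
    # Naive average-linkage (UPGMA) clustering of a full distance matrix.
    # Returns merges as (members_a, members_b, height) in merge order.
    clusters = {i: [i] for i in range(len(d))}
    merges, next_id = [], len(d)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            ma, mb = clusters[a], clusters[b]
            h = sum(d[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

def cut(merges, n, k):
    # Replay the first n - k merges to obtain a flat partition into k clusters.
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of gene sets")
    labels = list(range(n))
    for ma, mb, _ in merges[: n - k]:
        for i in mb:
            labels[i] = labels[ma[0]]
    return labels

def consensus_matrices(gene_sets, ks, iterations=100, seed=0):
    # For each k: fraction of iterations in which two gene sets share a cluster.
    rng = random.Random(seed)
    pool = sorted(set().union(*gene_sets))  # union of all genes in all sets
    n = len(gene_sets)
    together = {k: [[0] * n for _ in range(n)] for k in ks}
    for _ in range(iterations):
        # Bootstrap: draw len(pool) genes with replacement from the pool.
        counts = Counter(rng.choice(pool) for _ in range(len(pool)))
        d = [[bootstrap_distance(gene_sets[i], gene_sets[j], counts)
              for j in range(n)] for i in range(n)]
        merges = average_linkage(d)  # one tree per iteration, cut at every k
        for k in ks:
            labels = cut(merges, n, k)
            for i, j in combinations(range(n), 2):
                if labels[i] == labels[j]:
                    together[k][i][j] += 1
                    together[k][j][i] += 1
    return {k: [[1.0 if i == j else m[i][j] / iterations for j in range(n)]
                for i in range(n)] for k, m in together.items()}

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector is constant
    return sxy / math.sqrt(sxx * syy)

def cophenetic_coefficient(consensus):
    # rho = corr(1 - consensus, cophenetic distances of a tree on 1 - consensus).
    n = len(consensus)
    dist = [[1.0 - v for v in row] for row in consensus]
    coph = [[0.0] * n for _ in range(n)]
    for ma, mb, h in average_linkage(dist):
        for i in ma:
            for j in mb:
                coph[i][j] = coph[j][i] = h  # height at which i and j first join
    pairs = list(combinations(range(n), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])

# Toy example: three groups of overlapping gene sets (hypothetical genes).
gene_sets = [
    {"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3", "g5"}, {"g2", "g3", "g4", "g5"},
    {"h1", "h2", "h3", "h4"}, {"h1", "h2", "h3", "h5"},
    {"c1", "c2", "c3"}, {"c1", "c2", "c4"},
]
ks = [2, 3, 4, 5]
consensus = consensus_matrices(gene_sets, ks, iterations=100, seed=0)
rho = {k: cophenetic_coefficient(consensus[k]) for k in ks}
best_k = max((k for k in ks if not math.isnan(rho[k])), key=rho.get)
print(rho, best_k)
```

The sketch builds one tree per bootstrap iteration and cuts it at every k, so a whole range of k can be scanned from a single set of resamples, a small-scale analogue of the 50-step scan described in the text. Choosing `best_k` by the largest ρ is a simplification. In the text, the two peaks differ by only 0.0002 in ρ (0.9670 vs. 0.9668), and the final choice of k = 600 rested on inspecting cluster granularity, not on ρ alone. The naive linkage shown here would also be far too slow for 8,380 gene sets. An optimized hierarchical clustering routine would be needed at that scale.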