Chunk #35 — Experimental Procedures — Hallmark generation methodology — Step 1: Identify groups of similar gene sets using consensus clustering

Source: The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Embedded: yes

Text

The bootstrap resampling procedure for consensus clustering sampled, with replacement, from a pool of 31,847 genes comprising the union of all 8,380 original gene sets. We performed 100 resampling iterations and carried out consensus clustering for 50 ≤ k ≤ 8,000 in increments of 50. We used the cophenetic coefficient (ρ) of each consensus clustering result to estimate the optimal number of clusters. The cophenetic analysis showed two peaks: one at k = 450 (ρ = 0.9668) and another at k = 600 (ρ = 0.9670; Figure S3). After inspecting the results for both values of k, we found the partition at k = 450 too coarse and heterogeneous for our purposes, whereas the clusters at k = 600 had a level of granularity more appropriate for building hallmark sets. We therefore chose the partition at k = 600 to produce the clusters of gene sets used in the subsequent steps of the hallmark methodology.
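
The general shape of this procedure can be illustrated with a minimal Python sketch on toy data. It is not the implementation used for the hallmark collection: the text does not specify the base clustering algorithm or the distance measure, so average-linkage hierarchical clustering on Euclidean distances is assumed here, and the function names (toy_membership, consensus_matrix, cophenetic_coefficient) are hypothetical. Gene sets are represented as rows of a binary gene-membership matrix, genes are resampled with replacement, and ρ is computed from the resulting consensus matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import squareform


def toy_membership(n_blocks=3, sets_per_block=4, genes_per_block=20):
    """Toy data (assumption): gene sets in the same block share the same genes."""
    X = np.zeros((n_blocks * sets_per_block, n_blocks * genes_per_block))
    for b in range(n_blocks):
        rows = slice(b * sets_per_block, (b + 1) * sets_per_block)
        cols = slice(b * genes_per_block, (b + 1) * genes_per_block)
        X[rows, cols] = 1.0
    return X


def consensus_matrix(X, k, n_iter=100, seed=0):
    """Fraction of bootstrap iterations in which each pair of gene sets co-clusters.

    X is an (n_sets, n_genes) binary membership matrix. Each iteration samples
    gene columns with replacement and cuts an average-linkage tree into at
    most k clusters (the clustering method is an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_sets, n_genes = X.shape
    counts = np.zeros((n_sets, n_sets))
    for _ in range(n_iter):
        cols = rng.integers(0, n_genes, size=n_genes)  # bootstrap the genes
        tree = linkage(X[:, cols], method="average", metric="euclidean")
        labels = fcluster(tree, t=k, criterion="maxclust")
        counts += labels[:, None] == labels[None, :]
    return counts / n_iter


def cophenetic_coefficient(C):
    """Cophenetic correlation between 1 - C and its average-linkage tree."""
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)
    d = squareform(D, checks=False)  # square matrix -> condensed distances
    tree = linkage(d, method="average")
    rho, _ = cophenet(tree, d)
    return rho


# Scan a few values of k on the toy data; the procedure described above
# scans 50 <= k <= 8,000 in increments of 50 on the real gene sets.
X = toy_membership()
for k in (2, 3):
    C = consensus_matrix(X, k=k, n_iter=100, seed=0)
    print(k, cophenetic_coefficient(C))
```

A value of ρ close to 1 indicates that the consensus matrix is nearly a clean block structure, that is, pairs of gene sets are assigned to the same cluster either almost always or almost never across resamples. This is why peaks in ρ are used as candidates for the number of clusters.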