We defined raw hallmarks for each of the 50 hallmarks produced by the previous step. A raw hallmark is the union of a cluster of founder gene sets’ genes after excluding all the “unknown genes”. We considered a gene as “unknown” if it has been identified exclusively by automatic computational predictions or represented a poorly documented sequence such as an EST. Specifically, we defined a gene as “unknown” if its official gene symbol (according to NCBI Entrez and HUGO) matched naming conventions of an EST (e.g., “KIAA”, “LOC”, “MGC”, “FLJ”, or “DKFZp” followed by digits).