Chunk #7 — Methodological issues — From genes to gene sets

Source
Gene set analysis of genome-wide association studies: methodological issues and perspectives.
Embedded
yes

Text

The Kyoto Encyclopedia of Genes and Genomes (KEGG) [20] and Gene Ontology (GO) [21] are frequently used gene set annotation databases. When GO terms are used, gene sets in the biological process category have often been selected for gene set analysis, since the other two categories (molecular function and cellular component) do not resemble typical biological pathways such as those from KEGG. The MSigDB database [22] includes comprehensive gene sets from both the KEGG and GO databases, as well as from other sources such as chromosome and cytogenetic band regions, gene sets curated from expert knowledge in the literature, cis-regulatory motifs, and co-expressed cancer-associated genes. In addition, other sources such as the PANTHER Classification System [23] and REACTOME [24] also provide publicly available gene set information. Note that GO terms are organized in a hierarchical structure, so substantial overlap of component genes is expected between parent and child nodes. The MSigDB collection has partially solved this problem by removing gene sets that have the same member genes as their parent nodes or their sibling nodes.
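The redundancy filter described above can be illustrated with a minimal sketch. It is not the MSigDB procedure itself, whose details are not given here; it only assumes that a gene set is dropped when its members are exactly those of a parent or of a sibling that has already been kept. The term names, gene identifiers, and the functions `jaccard` and `redundant_sets` are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, List, Set


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two gene sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_sets(
    gene_sets: Dict[str, FrozenSet[str]],
    parents: Dict[str, List[str]],
) -> Set[str]:
    """Names of gene sets identical to a parent or to an already-kept sibling."""
    redundant: Set[str] = set()

    # Invert the child -> parents map so siblings can be compared.
    children: Dict[str, List[str]] = {}
    for child, parent_names in parents.items():
        for parent in parent_names:
            children.setdefault(parent, []).append(child)
            # Rule 1: same member genes as a parent node.
            if gene_sets[child] == gene_sets[parent]:
                redundant.add(child)

    # Rule 2: same member genes as a sibling; keep the first by name.
    for kids in children.values():
        seen: Set[FrozenSet[str]] = set()
        for kid in sorted(kids):
            if gene_sets[kid] in seen:
                redundant.add(kid)
            else:
                seen.add(gene_sets[kid])
    return redundant


# Hypothetical toy hierarchy: one parent term with four child terms.
gene_sets = {
    "term_root": frozenset({"G1", "G2", "G3", "G4"}),
    "term_a": frozenset({"G1", "G2", "G3", "G4"}),  # identical to parent
    "term_b": frozenset({"G1", "G2"}),
    "term_c": frozenset({"G1", "G2"}),              # identical to sibling term_b
    "term_d": frozenset({"G3"}),
}
parents = {
    "term_a": ["term_root"],
    "term_b": ["term_root"],
    "term_c": ["term_root"],
    "term_d": ["term_root"],
}

dropped = redundant_sets(gene_sets, parents)
kept = sorted(name for name in gene_sets if name not in dropped)
print(sorted(dropped))
print(kept)
print(jaccard(gene_sets["term_root"], gene_sets["term_b"]))
```

In this toy hierarchy, `term_b` survives the filter even though all of its genes also belong to `term_root`. This matches the sense in which exact-duplicate removal only partially addresses the overlap between parent and child nodes.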