Chunk #12 — Results — Universal interface for biomedical gene sets

Source: clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
Embedded: yes

Text

The gene set annotation required by enricher and GSEA is a two-column data frame, with one column giving the gene set names (ID or descriptive name) and the other giving the corresponding genes. The gene matrix transposed (GMT) format is widely used to distribute gene set annotations. Many gene set libraries are available online (e.g., https://maayanlab.cloud/Enrichr/#stats), including MSigDB (Molecular Signatures Database), Disease Signatures, and CCLE (Cancer Cell Line Encyclopedia). So that these gene sets can serve as the background annotation for exploring the underlying biological mechanisms, clusterProfiler provides a parser function, read.gmt, which imports a GMT file into a form that can be passed directly to the enricher and GSEA functions. In the following example, we used the GSEA function to perform gene set enrichment analysis using WikiPathways (Figure 5B). The annotation data were parsed with read.gmt.wp, a customized version of read.gmt for importing GMT files from WikiPathways.

## downloaded from https://wikipathways-data.wmcloud.org/current/gmt/
gmt <- 'wikipathways-20210310-gmt-Homo_sapiens.gmt'
wp <- read.gmt.wp(gmt)
ewp <- GSEA(geneList, TERM2GENE=wp[,c("wpid", "gene")], TERM2NAME=wp[,c("wpid", "name")])
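To make the two-column layout concrete, the sketch below parses GMT-style lines into a long table with one row per (gene set, gene) pair, using base R only. It is an illustration of the data shape, not the implementation of read.gmt. The helper name parse_gmt_lines, the set names, and the gene lists are hypothetical, and the sketch assumes the usual GMT layout of tab-separated fields: set name, description, then the genes.

```r
# Hypothetical GMT content: each line is name <TAB> description <TAB> genes...
gmt_lines <- c(
  "SET_A\tdescription of set A\tTP53\tBRCA1\tEGFR",
  "SET_B\tdescription of set B\tMYC\tTP53"
)

# Hypothetical helper: expand each GMT line into (term, gene) rows.
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  rows <- lapply(fields, function(x) {
    genes <- x[-(1:2)]  # drop the set name and the description field
    data.frame(term = rep(x[1], length(genes)),
               gene = genes,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Two-column table: gene set name in the first column, gene in the second.
term2gene <- parse_gmt_lines(gmt_lines)
print(term2gene)
```

A table of this shape, with the gene set column first and the gene column second, is what the text describes as the annotation passed to enricher and GSEA through TERM2GENE. In practice read.gmt would be the more natural choice, since it reads the file directly.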