Chunk #0 — INTRODUCTION

Source: The Molecular Signatures Database (MSigDB) hallmark gene set collection.

High-throughput technologies, such as microarrays and next-generation sequencing, generate measurements of gene activity at genomic scale. For transcription profiling, these technologies report transcript abundances for tens of thousands of genes. Analysis of this type of data usually follows one of two approaches. The first identifies genes that are differentially expressed across phenotypes of interest. This is straightforward to perform, but in practice it creates challenges for follow-up analysis and interpretation of the results. For example, in some instances only a few genes reach statistical significance, and the analysis may not produce meaningful results. Conversely, when a large number of genes pass a significance threshold, there may be no obvious way to select the most interesting genes to follow up. Moreover, the resulting list of genes may be difficult to interpret, and it may be hard to identify the relevant biological processes that those genes represent. An alternative approach, pioneered by Gene Set Enrichment Analysis (GSEA) (Mootha et al., 2003; Subramanian et al., 2005), focuses on coordinated differential expression of annotated groups of genes, or gene sets, and produces results that can more