For the gene permutation test category, we used Fisher’s exact, hypergeometric, and GOSeq¹¹¹ tests. For these tests, genes were separated into two classes according to whether or not they met the FDR criterion for differential expression at the gene or isoform level (estimated FDR ≤ 5% for either genes or isoforms); this set of differentially expressed genes was then assessed for overlap versus non-overlap with the gene set being tested for enrichment (i.e., a 2 × 2 table was constructed). Compared to the hypergeometric and Fisher’s exact tests, GOSeq has an advantage for RNA-seq data in that it explicitly accounts for the bias toward detecting long and highly expressed transcripts. For the subject permutation category of tests, we used GSVA¹¹², ssGSEA¹¹⁰, PLAGE¹¹³, and zScore¹¹⁴, all implemented in the GSVA package of Bioconductor¹¹⁵. To combine the results of these tests within each of the two primary categories, we used Fisher’s method for combining P values with Brown’s correction, an extension of Fisher’s method that accounts for correlation among the different enrichment test statistics¹¹⁶. Then, within category, P values were