Sensitivity. To assess the sensitivity of the algorithms, we used the Bottomly et al. [16] dataset, which contains ten and eleven replicates of two different, genetically homogeneous mouse strains. This allowed a split of three vs three replicates for the evaluation set and seven vs eight for the verification set, with the split balanced across the three experimental batches. The random splits were replicated 30 times. Although the DESeq (old), DESeq2, DSS, edgeR and voom algorithms can accommodate complex experimental designs, batch information was not provided to them, so that calls remained comparable across all algorithms.
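The splitting scheme can be illustrated with a minimal sketch. It assumes that balancing is achieved by drawing one replicate per strain from each of the three batches for the evaluation set, which is one plausible reading and not necessarily the exact procedure used. The per-batch replicate counts, the sample identifiers, and the function names `make_samples`, `split_once` and `replicate_splits` are hypothetical and chosen only for illustration.

```python
import random


def make_samples(strain, batch_sizes):
    """Build a hypothetical sample sheet for one strain (batch sizes are assumed)."""
    samples = []
    for batch, n in enumerate(batch_sizes, start=1):
        for i in range(n):
            samples.append(
                {"id": f"{strain}_b{batch}_{i + 1}", "strain": strain, "batch": batch}
            )
    return samples


def split_once(samples, rng):
    """Draw one replicate per strain and batch for evaluation; the rest is verification."""
    evaluation = []
    for strain in sorted({s["strain"] for s in samples}):
        for batch in sorted({s["batch"] for s in samples}):
            pool = [
                s for s in samples if s["strain"] == strain and s["batch"] == batch
            ]
            evaluation.append(rng.choice(pool))
    eval_ids = {s["id"] for s in evaluation}
    verification = [s for s in samples if s["id"] not in eval_ids]
    return evaluation, verification


def replicate_splits(samples, n_splits=30, seed=0):
    """Repeat the random split n_splits times with a seeded generator."""
    rng = random.Random(seed)
    return [split_once(samples, rng) for _ in range(n_splits)]


# Assumed layout: 10 replicates of strain A and 11 of strain B over three batches.
samples = make_samples("A", [3, 4, 3]) + make_samples("B", [4, 3, 4])
splits = replicate_splits(samples)
```

Each element of `splits` is an `(evaluation, verification)` pair with three vs three replicates in the evaluation set and seven vs eight in the verification set. Fixing the seed keeps the 30 splits reproducible across runs.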