3) Assess the stability of gene set testing results. In addition to power and type I error rate, another important aspect is the stability of the significance testing results. Different sets of samples give different results because of sampling variation. When different sub-samples are taken from a homogeneous population, a method with small variance, and thus stable results across the sub-samples, is desirable. One strategy is to draw sub-samples from the full set of samples, conduct gene set testing on each sub-sample, and evaluate the stability of the gene set P-values based on how their rank ordering changes across sub-samples.
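
The sub-sampling strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure. The gene set test is a simple stand-in (a scaled mean of per-gene Welch t-statistics with a normal approximation). Stability is summarized as the mean pairwise Spearman correlation of gene set P-values across sub-samples, which is only one possible way to quantify changes in rank ordering. All function names, the toy data, and the parameters `n_sub` and `frac` are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch t-statistic for two groups of expression values."""
    return (mean(x) - mean(y)) / (variance(x) / len(x) + variance(y) / len(y)) ** 0.5


def gene_set_pvalues(expr, labels, gene_sets):
    """Stand-in gene set test (an assumption, not a method from the text):
    z = sqrt(m) * mean of per-gene t-statistics, two-sided normal P-value."""
    cases = [j for j, g in enumerate(labels) if g == 1]
    ctrls = [j for j, g in enumerate(labels) if g == 0]
    t = [welch_t([row[j] for j in cases], [row[j] for j in ctrls]) for row in expr]
    pvals = []
    for genes in gene_sets:
        z = len(genes) ** 0.5 * mean(t[i] for i in genes)
        pvals.append(2.0 * (1.0 - NormalDist().cdf(abs(z))))
    return pvals


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def subsample_stability(expr, labels, gene_sets, n_sub=20, frac=0.7, seed=0):
    """Mean pairwise Spearman correlation of gene set P-values across
    sub-samples drawn without replacement within each group."""
    rng = random.Random(seed)
    cases = [j for j, g in enumerate(labels) if g == 1]
    ctrls = [j for j, g in enumerate(labels) if g == 0]
    runs = []
    for _ in range(n_sub):
        keep = (rng.sample(cases, max(2, round(frac * len(cases))))
                + rng.sample(ctrls, max(2, round(frac * len(ctrls)))))
        sub_expr = [[row[j] for j in keep] for row in expr]
        sub_labels = [labels[j] for j in keep]
        runs.append(gene_set_pvalues(sub_expr, sub_labels, gene_sets))
    pairs = [(a, b) for a in range(n_sub) for b in range(a + 1, n_sub)]
    return mean(spearman(runs[a], runs[b]) for a, b in pairs)


def simulate_toy_data(n_genes=60, n_per_group=10, set_size=10, shift=1.0, seed=1):
    """Toy data (hypothetical): genes x samples, first gene set shifted in cases."""
    rng = random.Random(seed)
    labels = [1] * n_per_group + [0] * n_per_group
    gene_sets = [list(range(s, s + set_size)) for s in range(0, n_genes, set_size)]
    expr = []
    for i in range(n_genes):
        mu = shift if i < set_size else 0.0
        expr.append([rng.gauss(mu if g == 1 else 0.0, 1.0) for g in labels])
    return expr, labels, gene_sets


if __name__ == "__main__":
    expr, labels, gene_sets = simulate_toy_data()
    print("stability:", subsample_stability(expr, labels, gene_sets))
```

Values closer to 1 indicate that the ordering of gene sets by P-value is largely preserved across sub-samples. Applying the same sub-samples to each candidate method would make the resulting stability scores directly comparable.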