sizes, is open to debate. In theory, the test statistic, and thus the p-value, already take N into account. In practice, however, a procedure that weights for sample size can be more powerful [34]. We tried one type of weighting for Simes and TATES, in which each p-value was weighted by dfmax/dfj, where dfmax denotes the maximal number of degrees of freedom (i.e., sample size) among the 20 simulated phenotypes, and dfj denotes the number of degrees of freedom for the jth phenotype in the set of 1…20. This way, the p-value belonging to the largest sample was weighted by dfmax/dfmax = 1, while every other p-value was weighted by dfmax/dfj, which is > 1 because dfj < dfmax for those phenotypes. That is, p-values derived from smaller samples were adjusted upwards and were therefore less likely to be the minimal p-value chosen by Simes or TATES.)
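The weighting described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the toy data (three phenotypes rather than 20), and the cap of weighted p-values at 1 are assumptions made for illustration, and "weighted by" is read here as multiplication, which is consistent with small-sample p-values being adjusted upwards. Only the standard Simes combination is shown; the TATES correction for correlated phenotypes is not reproduced.

```python
def weight_p_values(p_values, dfs):
    """Multiply each p-value by df_max / df_j.

    The cap at 1.0 is an assumption of this sketch, not stated in the text.
    """
    if len(p_values) != len(dfs):
        raise ValueError("p_values and dfs must have the same length")
    df_max = max(dfs)
    return [min(1.0, p * df_max / df) for p, df in zip(p_values, dfs)]


def simes(p_values):
    """Standard Simes combined p-value: min over ranks i of m * p_(i) / i."""
    m = len(p_values)
    ordered = sorted(p_values)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


# Hypothetical toy data: p-values and degrees of freedom for three phenotypes.
p_values = [0.010, 0.004, 0.030]
dfs = [1000, 250, 500]

weighted = weight_p_values(p_values, dfs)

print("weighted p-values:", weighted)
print("Simes, unweighted:", simes(p_values))
print("Simes, weighted:  ", simes(weighted))
```

In this toy example the smallest raw p-value belongs to the phenotype with the smallest sample; after weighting, the smallest p-value belongs to the phenotype with the largest sample, which is the behaviour the weighting is meant to produce.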