These simulations show that the assumption of a Uniform distribution for P-values from tests of true null hypotheses fails under many circumstances. It is unlikely to hold in most applications, and relying on it may lead to wrong conclusions. For example, in their review of published trials, Schulz et al. [1] found that 2% of baseline comparisons were statistically significant, significantly fewer than the expected 5%. They interpreted this as plausibly being due to a few investigators having decided not to report statistically significant comparisons, in the belief that this would enhance the credibility of their trials. However, some of these tests may be invalid and they are certainly not all independent. As Figures 2 and 3 show, it is also plausible that fewer than 5% of tests would be found significant for this reason, even though all the tests were reported and there was no concealment on the part of the triallists.
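The point that fewer than 5% of true-null tests can be significant without any selective reporting can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not a reproduction of the simulations behind Figures 2 and 3. It assumes a dichotomous baseline variable with a prevalence of 10%, two randomised arms of 20 participants each, and a two-sided Fisher exact test. The function names, sample sizes, and prevalence are all hypothetical choices made for the example.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, n1, n2):
    """Two-sided Fisher exact P-value for a events out of n1 versus b out of n2.

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    k = a + b  # total number of events, fixed by conditioning on the margins
    total = comb(n1 + n2, k)

    def prob(x):
        # Probability of x events in arm 1 given k events overall
        return comb(n1, x) * comb(n2, k - x) / total

    p_obs = prob(a)
    lo, hi = max(0, k - n2), min(k, n1)
    # Small relative tolerance guards against floating-point ties
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def simulate_significant_fraction(n_trials=2000, n_per_arm=20,
                                  prevalence=0.1, alpha=0.05, seed=1):
    """Fraction of simulated baseline comparisons with P < alpha.

    Both arms are drawn from the same population, so every null
    hypothesis is true and every test is reported.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        a = sum(rng.random() < prevalence for _ in range(n_per_arm))
        b = sum(rng.random() < prevalence for _ in range(n_per_arm))
        if fisher_exact_two_sided(a, b, n_per_arm, n_per_arm) < alpha:
            hits += 1
    return hits / n_trials


fraction = simulate_significant_fraction()
print(f"Fraction of true-null baseline tests with P < 0.05: {fraction:.3f}")
```

Because the data are discrete and the counts are small, only a limited set of P-values can occur and the exact test is conservative, so the printed fraction should fall below 0.05 even though nothing has been concealed. Changing `n_per_arm` or `prevalence` shows how strongly the shortfall depends on the design.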