controls needed to obtain 80% power assuming 10,000 genes is ~28,500, well beyond any available dataset (Fig. 7B,C). Our model demonstrates that the distribution of expected differential expression, across genes, is quite similar to the observed distribution from the CMC data (Fig. 7). This calls into question results from smaller studies that report large differential expression. Our analyses show that such studies would have notably larger variability, and because genome-wide surveys test a large number of genes, that variability can translate into large observed differential expression: even when no gene is differentially expressed, studies with only 25 cases and 25 controls can yield estimates of differential expression exceeding twofold. Notably, this pattern is not seen when N is raised to 250. (See supplementary text for additional scenarios, discussion and modeling.)
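The effect of sample size on null fold-change estimates can be illustrated with a minimal sketch. This is not the model used in the paper. It assumes a single per-sample standard deviation of log2 expression shared by all genes (`sigma = 1.0`, a hypothetical value), normally distributed noise, independent genes, and reads N as the number of samples per group. The function name and parameters are illustrative only.

```python
import math
import random


def max_null_fold_change(n_per_group, n_genes=10_000, sigma=1.0, seed=0):
    """Largest absolute fold change across genes when no gene truly differs.

    sigma is an ASSUMED per-sample SD of log2 expression (hypothetical value,
    identical for every gene). Under the null with normal noise, the difference
    between two group means of n samples each is normal with mean 0 and
    SD sigma * sqrt(2 / n), so it is drawn directly rather than simulating
    every sample.
    """
    rng = random.Random(seed)
    sd_diff = sigma * math.sqrt(2.0 / n_per_group)
    # One null log2 fold-change estimate per gene; keep the most extreme one.
    max_abs_log2fc = max(abs(rng.gauss(0.0, sd_diff)) for _ in range(n_genes))
    return 2.0 ** max_abs_log2fc


if __name__ == "__main__":
    for n in (25, 250):
        print(f"n per group = {n}: max null fold change = "
              f"{max_null_fold_change(n):.2f}")
```

Because the standard deviation of the estimated difference scales as 1/sqrt(n), a tenfold increase in samples per group shrinks the most extreme null log2 fold change by a factor of about 3.2 under these assumptions. The absolute values depend strongly on the assumed `sigma` and would differ for real expression data.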