Do baseline P-values follow a uniform distribution in randomised trials?

Authors: Bland, Martin
Year: 2013
Journal: PLoS ONE
PMID: 24098419
DOI: 10.1371/journal.pone.0076010
PMCID: PMC3788030

Figure 1

Distribution of P-values for 10,000 two sample t tests for Normal data. Means were compared between two groups of 10 observations from a Standard Normal distribution.
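
The paper's simulations were run in Stata 12; the following is only a minimal standard-library Python sketch of the Figure 1 design, not the author's code. It assumes a pooled-variance (equal-variance) two-sample t test and computes the two-sided P-value from the classical finite series for Student's t with integer degrees of freedom. The function names, the seed and the 20-bin histogram are illustrative choices.

```python
import math
import random


def t_two_sided_p(t, df):
    """Two-sided P-value for Student's t with integer df (finite series in cos(theta))."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df, 2):            # adds terms up to cos(theta)**(df - 2)
            term *= c * c * (k - 1) / k
            total += term
        inside = s * total                   # P(|T| <= |t|)
    elif df == 1:
        inside = 2.0 * theta / math.pi       # Cauchy special case
    else:
        term = total = c
        for k in range(3, df, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return min(1.0, max(0.0, 1.0 - inside))


def pooled_t_p(x, y):
    """Two-sample t test with a pooled (equal-variance) SD; returns the two-sided P."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    se = math.sqrt(ss / (nx + ny - 2) * (1 / nx + 1 / ny))
    return t_two_sided_p((mx - my) / se, nx + ny - 2)


def simulate_normal(n_tests=10_000, n_per_group=10, seed=1):
    """P-values from n_tests t tests; both groups come from the Standard Normal."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        pvals.append(pooled_t_p(x, y))
    return pvals


def histogram(pvals, n_bins=20):
    """Counts of P-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts
```

`histogram(simulate_normal())` returns 20 bin counts; if the P-values are Uniform, each bin is expected to hold about 500 of the 10,000.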

Figure 2

Distribution of P-values for 10,000 two sample t tests for Lognormal data. Means were compared between two groups of 10 observations from a Lognormal distribution.
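
A sketch of the Figure 2 simulation differs from the Normal case only in how the data are drawn. The lognormal parameters (mean 0 and SD 1 on the log scale) are an assumption, since the caption does not state them. The pooled t-test helpers are included so that the block runs on its own; all names are hypothetical.

```python
import math
import random


def t_two_sided_p(t, df):
    """Two-sided P-value for Student's t with integer df (finite series in cos(theta))."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = s * total                   # P(|T| <= |t|)
    elif df == 1:
        inside = 2.0 * theta / math.pi
    else:
        term = total = c
        for k in range(3, df, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return min(1.0, max(0.0, 1.0 - inside))


def pooled_t_p(x, y):
    """Two-sample t test with a pooled (equal-variance) SD; returns the two-sided P."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    se = math.sqrt(ss / (nx + ny - 2) * (1 / nx + 1 / ny))
    return t_two_sided_p((mx - my) / se, nx + ny - 2)


def standard_lognormal(rng):
    """Assumed data model: exp(Z) with Z from the Standard Normal."""
    return rng.lognormvariate(0.0, 1.0)


def simulate(draw, n_tests=10_000, n_per_group=10, seed=1):
    """P-values from n_tests t tests with every observation drawn by draw(rng)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        x = [draw(rng) for _ in range(n_per_group)]
        y = [draw(rng) for _ in range(n_per_group)]
        pvals.append(pooled_t_p(x, y))
    return pvals


def share_below(pvals, cut=0.05):
    """Proportion of P-values below cut (close to cut if they are Uniform)."""
    return sum(p < cut for p in pvals) / len(pvals)
```

`share_below(simulate(standard_lognormal))` can be set beside `share_below(simulate(lambda r: r.gauss(0.0, 1.0)))` to compare skewed and Normal data under the same null hypothesis.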

Figure 3

P-values from four realisations of 10,000 correlated t tests for Normal data. Means were compared between two groups of 10 observations from a Standard Normal distribution, where each test used variables with correlation 0.5 with the other variables.
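
One realisation of the correlated design in Figure 3 might be generated as below. The construction is an assumption, not taken from the paper: each of the 20 subjects gets one shared Standard Normal factor z, and every variable is sqrt(rho) * z plus sqrt(1 - rho) times independent noise, which gives correlation rho = 0.5 between any two variables. All 10,000 t tests in a realisation then use the same subjects, so they are not independent. Names and seeds are hypothetical.

```python
import math
import random


def t_two_sided_p(t, df):
    """Two-sided P-value for Student's t with integer df (finite series in cos(theta))."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = s * total                   # P(|T| <= |t|)
    elif df == 1:
        inside = 2.0 * theta / math.pi
    else:
        term = total = c
        for k in range(3, df, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return min(1.0, max(0.0, 1.0 - inside))


def pooled_t_p(x, y):
    """Two-sample t test with a pooled (equal-variance) SD; returns the two-sided P."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    se = math.sqrt(ss / (nx + ny - 2) * (1 / nx + 1 / ny))
    return t_two_sided_p((mx - my) / se, nx + ny - 2)


def one_realisation(n_tests=10_000, n_per_group=10, rho=0.5, seed=1):
    """P-values of n_tests t tests on variables that all share one subject-level factor."""
    rng = random.Random(seed)
    n = 2 * n_per_group
    shared = [rng.gauss(0.0, 1.0) for _ in range(n)]    # one value per subject
    w_shared, w_own = math.sqrt(rho), math.sqrt(1.0 - rho)
    pvals = []
    for _ in range(n_tests):
        values = [w_shared * z + w_own * rng.gauss(0.0, 1.0) for z in shared]
        # first n_per_group subjects form group 1, the rest group 2
        pvals.append(pooled_t_p(values[:n_per_group], values[n_per_group:]))
    return pvals
```

Running it for four seeds, e.g. `[sum(p < 0.05 for p in one_realisation(seed=s)) for s in range(4)]`, shows how much the number of small P-values can change from one realisation to the next when the tests share subjects.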

Figure 4

Distribution of P-values for chi-squared tests and Fisher’s exact tests for two by two tables. Chi-squared and Fisher’s exact test, both two-sided and one-sided, were calculated for the comparison of two samples of size 10 with a binary outcome variable with probability 0.5 of being 0 and 0.5 of being 1.
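
The small-sample binary comparison in Figure 4 can be sketched with exact integer arithmetic from the standard library. Several details are assumptions rather than the paper's: group 1 is the first row of the table; the one-sided Fisher P-value is the upper tail P(A >= a) (Stata or the paper may report a different direction); the two-sided Fisher P-value sums every table no more probable than the observed one; a table with an empty margin gets P = 1 by convention; and 10,000 simulated tables is a choice, since the caption does not give the number.

```python
import functools
import math
import random


def chi2_two_sided_p(a, b, c, d):
    """Uncorrected Pearson chi-squared test (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 1.0                          # statistic undefined; P = 1 is an assumed convention
    x2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return math.erfc(math.sqrt(x2 / 2.0))   # upper tail of chi-squared with 1 df


@functools.lru_cache(maxsize=None)
def fisher_p(a, b, c, d):
    """Fisher's exact test conditional on all margins: (two-sided P, upper-tail P)."""
    n1, n2, m = a + b, c + d, a + c         # group sizes and total number of 1s
    n = n1 + n2
    denom = math.comb(n, n1)
    probs = {k: math.comb(m, k) * math.comb(n - m, n1 - k) / denom
             for k in range(max(0, m - n2), min(n1, m) + 1)}
    p_obs = probs[a]
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    upper = sum(p for k, p in probs.items() if k >= a)
    return min(1.0, two_sided), min(1.0, upper)


def simulate_small(n_sims=10_000, n_per_group=10, seed=1):
    """Chi-squared and Fisher P-values for two groups with a 50:50 binary outcome."""
    rng = random.Random(seed)
    out = {"chi2": [], "fisher_two": [], "fisher_one": []}
    for _ in range(n_sims):
        a = sum(rng.random() < 0.5 for _ in range(n_per_group))   # 1s in group 1
        c = sum(rng.random() < 0.5 for _ in range(n_per_group))   # 1s in group 2
        b, d = n_per_group - a, n_per_group - c
        out["chi2"].append(chi2_two_sided_p(a, b, c, d))
        two, one = fisher_p(a, b, c, d)
        out["fisher_two"].append(two)
        out["fisher_one"].append(one)
    return out
```

With 10 subjects per group there are at most 11 x 11 = 121 possible tables, so each test can only produce a small set of distinct P-values; the cache on `fisher_p` makes that finite set explicit.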

Figure 5

P-values for chi-squared and Fisher’s exact tests for large samples with fixed size groups. Chi-squared and Fisher’s exact test, both two-sided and one-sided, were calculated for 10,000 comparisons of two samples of size 1,000 with a binary outcome variable with probability 0.5 of being 0 and 0.5 of being 1.
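
For the fixed group sizes of Figure 5, drawing 1,000 fair binary outcomes per group is fast if each group is one random 1,000-bit integer whose 1 bits are counted. This sketch covers only the uncorrected chi-squared test, because exact Fisher sums are slow in pure Python at this size; the 100-bin histogram for looking at peaks and troughs, the names and the seed are illustrative assumptions.

```python
import math
import random


def chi2_two_sided_p(a, b, c, d):
    """Uncorrected Pearson chi-squared test (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 1.0                          # statistic undefined; P = 1 is an assumed convention
    x2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return math.erfc(math.sqrt(x2 / 2.0))   # upper tail of chi-squared with 1 df


def count_ones(rng, n):
    """Number of 1s among n independent fair binary values (one n-bit random integer)."""
    return bin(rng.getrandbits(n)).count("1")


def simulate_fixed_groups(n_sims=10_000, n_per_group=1000, seed=1):
    """Chi-squared P-values when both groups have exactly n_per_group subjects."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_sims):
        a = count_ones(rng, n_per_group)    # 1s in group 1
        c = count_ones(rng, n_per_group)    # 1s in group 2
        pvals.append(chi2_two_sided_p(a, n_per_group - a, c, n_per_group - c))
    return pvals


def histogram(pvals, n_bins=100):
    """Counts of P-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts
```

Printing `histogram(simulate_fixed_groups())` gives 100 counts, each expected to be about 100 if the P-values were Uniform, so departures from a flat pattern are easy to see.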

Figure 6

P-values for chi-squared and Fisher’s exact tests for large groups of varying size. Chi-squared and Fisher’s exact test, both two-sided and one-sided, were calculated for 10,000 comparisons in samples of size 2,000, both group and outcome having probability 0.5 of being 0 and 0.5 of being 1.
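
When neither margin is fixed, as in Figure 6, group membership and outcome can each be taken as one random 2,000-bit integer, and the four cells follow from bit counts of the two integers and of their bitwise AND. Again only the uncorrected chi-squared test is sketched; the function names and seeds are hypothetical.

```python
import math
import random


def chi2_two_sided_p(a, b, c, d):
    """Uncorrected Pearson chi-squared test (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 1.0                          # statistic undefined; P = 1 is an assumed convention
    x2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return math.erfc(math.sqrt(x2 / 2.0))   # upper tail of chi-squared with 1 df


def random_table(rng, n_total=2000):
    """One 2x2 table when group and outcome are independent fair coin flips per subject."""
    group = rng.getrandbits(n_total)        # bit i set: subject i is in group 1
    outcome = rng.getrandbits(n_total)      # bit i set: subject i has outcome 1
    n1 = bin(group).count("1")
    m = bin(outcome).count("1")
    a = bin(group & outcome).count("1")     # group 1, outcome 1
    b = n1 - a                              # group 1, outcome 0
    c = m - a                               # group 2, outcome 1
    d = n_total - n1 - c                    # group 2, outcome 0
    return a, b, c, d


def simulate_random_groups(n_sims=10_000, n_total=2000, seed=1):
    """Chi-squared P-values when group sizes as well as outcomes vary between samples."""
    rng = random.Random(seed)
    return [chi2_two_sided_p(*random_table(rng, n_total)) for _ in range(n_sims)]
```

The resulting P-values can be binned exactly as for the fixed-size design, so the two histograms can be compared side by side.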

#	Section	Preview
0	Introduction	I have seen the theory put forward that if a null hypothesis is true, P-values should follow a…
1	Introduction	The first context was when reviewing a paper proposing methods to investigate the quality of…
2	Introduction	The second context was in an analysis, planned by a colleague, as an indicator of valid…
3	Introduction	The same idea has been used by others in a less developed way, looking at the proportion of baseline…
4	Introduction	If P-values do not appear to fit a Uniform distribution, is it valid to conclude that the…
5	Introduction	In this paper I test these ideas by simulation.
6	Materials and Methods	The simulations were done using Stata 12 (Stata Corp., College Station, Texas) as follows:
7	Materials and Methods — 1. The two sample t test with Normal data	The basic simulation generated two groups of 10 observations from a Standard Normal distribution.…
8	Materials and Methods — 2. The two sample t test with highly skewed data	The basic simulation generated two groups of 10 observations from a Standard Normal distribution and…
9	Materials and Methods — 3. The two sample t test with Normal data where the tests are correlated	Two groups of 10 observations from a Standard Normal distribution were generated. For each basic…
10	Materials and Methods — 4. Binary data in a small sample	To compare two groups for a binary variable, such as gender, a chi-squared test or Fisher’s exact…
11	Materials and Methods — 5. Binary data in a large sample	The chi-squared test would not usually be regarded as valid for a two way table with only 20…
12	Results — 1. The two sample t test with Normal data	Figure 1 shows the distribution of P-values for 10,000 two sample t tests comparing means in two…
13	Results — 2. The two sample t test with highly skewed data	Figure 2 shows the results of the simulation for 10,000 two sample t tests using data from a…
14	Results — 3. The two sample t test with Normal data where the tests are correlated	Figure 3 shows the results of valid t tests which are not independent, using correlated data. As the…
15	Results — 4. Binary data in a small sample	Figure 4 shows the results of an uncorrected chi-squared test and Fisher’s exact test, both…
16	Results — 4. Binary data in a small sample	Because the number of two by two tables with two row totals equal to 10 is limited, there are only…
17	Results — 5. Chi-squared and Fisher’s exact test for large samples	Figure 5 shows the results of an uncorrected chi-squared test and Fisher’s exact test, both…
18	Results — 5. Chi-squared and Fisher’s exact test for large samples	I considered Fisher’s exact test one-sided. This is also testing a null hypothesis which is known…
19	Results — 5. Chi-squared and Fisher’s exact test for large samples	I thought it possible that the peaks and troughs were the result of having one fixed margin, which…
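
The introduction rows above describe judging whether P-values fit a Uniform distribution. One common formal check, not necessarily the one the paper uses, is the one-sample Kolmogorov-Smirnov test against Uniform(0, 1); this sketch uses the asymptotic Kolmogorov tail probability for sqrt(n) * D. That reference distribution assumes continuous P-values, so for tests whose P-values are discrete, such as those from small two by two tables, its result should be read with caution. The function name is hypothetical.

```python
import math


def ks_uniform(pvals):
    """One-sample Kolmogorov-Smirnov test of pvals against Uniform(0, 1).

    Returns (D, P), where P comes from the asymptotic Kolmogorov distribution of sqrt(n) * D.
    """
    xs = sorted(pvals)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # largest gap between the empirical CDF (just after and just before x) and F(x) = x
        d = max(d, (i + 1) / n - x, x - i / n)
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return d, 1.0        # P(K > 0.2) equals 1 to within about 1e-12
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, 101))
    return d, min(1.0, max(0.0, tail))
```

Usage: `d, p = ks_uniform(pvals)` for any list of P-values in [0, 1]; a small `p` indicates only that the values do not look Uniform.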

Citation	PMID	DOI	Status
Altman DG (1985) Comparability of randomised groups. Statistician 34: 125–136.	—	—	—
Altman DG, Doré CJ (1990) Randomisation and Baseline Comparisons in Clinical Trials. Lancet 335: 149–153.	1967441	10.1016/0140-6736(90)90014-v	—
Berger VW (2010) Testing for baseline balance: Can we finally get it right? J Clin Epidemiol 63: 939–940. PMCID PMC2904824.	20456920	10.1016/j.jclinepi.2010.02.014	—
Berger VW, Exner DV (1999) Detecting selection bias in randomized clinical trials. Control Clin Trials 20: 319–327.	10440559	10.1016/s0197-2456(99)00014-8	—
Gardner MJ, Altman DG (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ 292: 746–750. PMCID PMC1339793.	3082422	10.1136/bmj.292.6522.746	—
Kennedy A, Grant A (1997) Subversion of allocation in a randomised controlled trial. Control Clin Trials 18: S77–S78.	—	—	—
Pocock SJ, Assmann SE, Enos LE, Kasten LE (2002) Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 21: 2917–2930.	12325108	10.1002/sim.1296	—
Roberts C, Torgerson DJ (1999) Understanding controlled trials: Baseline imbalance in randomised controlled trials. BMJ 319: 185. PMCID PMC1116277.	10406763	10.1136/bmj.319.7203.185	—
Schulz KF, Chalmers I, Grimes DA, Altman DG (1994) Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA 272: 125–128.	8015122	—	—
Senn S (1994) Testing for baseline balance in clinical trials. Stat Med 13: 1715–1726.	7997705	10.1002/sim.4780131703	—
The CONSORT statement. Available: http://www.consort-statement.org/consort-statement/. Accessed 2013 Sept 10.	—	—	—

In this knowledge base

Title	Year	PMID
A Brief Critique of the TATES Procedure.	2018	29468442

External

Title	Authors	Year
Response: Integrity of randomized clinical trials: Performance of integrity tests and checklists requires assessment.	Chien PFW	2023
Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials.	Barnett A	2022
Parasites make hosts more profitable but less available to predators	Prosnier L et al.	2022
Methods to assess research misconduct in health-related research: A scoping review.	Bordewijk EM et al.	2021
Diagnosing fraudulent baseline data in clinical trials.	Proschan MA et al.	2020
Effects of physical activity interventions on cognitive outcomes and academic performance in adolescents and young adults: A meta-analysis.	Haverkamp BF et al.	2020
Reconceptualizing the p-value from a likelihood ratio test: a probabilistic pairwise comparison of models based on Kullback-Leibler discrepancy measures.	Riedle B et al.	2020
Baseline P value distributions in randomized trials were uniform for continuous but not categorical variables.	Bolland MJ et al.	2019
No evidence for a bilingual executive function advantage in the nationally representative ABCD study.	Dick AS et al.	2019
Reproducibility of Methods to Detect Differentially Expressed Genes from Single-Cell RNA Sequencing.	Mou T et al.	2019
Rounding, but not randomization method, non-normality, or correlation, affected baseline P-value distributions in randomized trials.	Bolland MJ et al.	2019
A Brief Critique of the TATES Procedure.	Aliev F et al.	2018
Deviations from Expectations: A Commentary on Aliev et al.	van der Sluis S et al.	2018
Network-Based Approaches for Pathway Level Analysis.	Nguyen T et al.	2018
Statistical Approach for Gene Set Analysis with Trait Specific Quantitative Trait Loci.	Das S et al.	2018
Correlation among baseline variables yields non-uniformity of p-values.	Betensky RA et al.	2017
DANUBE: Data-driven meta-ANalysis using UnBiased Empirical distributions-applied to biological pathway analysis.	Nguyen T et al.	2017
The distribution of P-values in medical research articles suggested selective reporting associated with statistical significance.	Perneger TV et al.	2017
Conducting Meta-Analyses Based on p Values: Reservations and Recommendations for Applying p-Uniform and p-Curve.	van Aert RC et al.	2016
Social relationships and cognitive decline: a systematic review and meta-analysis of longitudinal cohort studies.	Kuiper JS et al.	2016
The Statistical Value of Raw Fluorescence Signal in Luminex xMAP Based Multiplex Immunoassays.	Breen EJ et al.	2016
The distribution of probability values in medical abstracts: an observational study.	Ginsel B et al.	2015
A methodological review of recent meta-analyses has found significant heterogeneity in age between randomized groups.	Clark L et al.	2014