exact test, p=1.29×10⁻¹¹). This p value should be interpreted with caution because many of the replication attempts were not independent of each other (e.g., the 5-HTTLPR×stressful life events interaction predicting depression was tested multiple times). Consequently, we reran the analysis, excluding all but the first published replication attempt for each interaction. Despite the reduction in the number of data points and the attendant loss of power, the results remained highly significant: 22% (2/9) of first replication attempts were positive, compared with 96% (45/47) of novel studies (Fisher's exact test, p=5.2×10⁻⁶).
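
The restricted comparison can be checked by hand from the counts reported above. The following is a minimal sketch, not the authors' own procedure: it assumes a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one, since the text does not state which variant or software was used. The function name `fisher_exact_two_sided` is hypothetical and introduced only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Assumption: "two-sided" means summing the hypergeometric probabilities
    of every table that is no more likely than the observed one.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of positive results
    n = a + b + c + d     # total number of studies

    # Feasible range for the top-left cell given fixed margins.
    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)

    def weight(k):
        # Unnormalized hypergeometric probability (exact integer arithmetic).
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Counts from the text: first replication attempts (2 positive, 7 not)
# versus novel studies (45 positive, 2 not).
p = fisher_exact_two_sided(2, 7, 45, 2)
print(f"p = {p:.2e}")
```

Under these assumptions the sketch gives a p value of roughly 5.2×10⁻⁶, in line with the figure reported for the restricted analysis.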