As in the original two-step task (Daw et al., 2011), our task consisted of a choice between two “first-step” actions that led probabilistically to one of two “second-step” states in which reward could be obtained. Each first-step action commonly led to one second-step state and rarely to the other. However, whereas in the original task these action-state transition probabilities were constant, we introduced occasional reversals in the transition probabilities (i.e., transitions that were previously common became rare and vice versa).
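The transition structure described above can be illustrated with a minimal simulation sketch. This is not the task code used in the study. The class name `TwoStepTask`, the common-transition probability of 0.8, and the per-trial reversal probability of 0.02 are hypothetical values chosen for illustration, and the schedule that actually governed reversals is not specified here. Reward delivery at the second step is omitted.

```python
import random


class TwoStepTask:
    """Illustrative sketch of first-step transitions with occasional reversals.

    All parameter values are assumptions made for illustration only.
    """

    def __init__(self, p_common=0.8, p_reversal=0.02, seed=None):
        self.p_common = p_common      # assumed probability of a common transition
        self.p_reversal = p_reversal  # assumed per-trial probability of a reversal
        self.rng = random.Random(seed)
        # common_state[a] is the second-step state that action a commonly leads to.
        self.common_state = {0: 0, 1: 1}

    def maybe_reverse(self):
        """Swap common and rare transitions with probability p_reversal."""
        if self.rng.random() < self.p_reversal:
            self.common_state = {a: 1 - s for a, s in self.common_state.items()}
            return True
        return False

    def first_step(self, action):
        """Return (second_step_state, was_common) for a first-step action."""
        was_common = self.rng.random() < self.p_common
        target = self.common_state[action]
        state = target if was_common else 1 - target
        return state, was_common
```

Calling `maybe_reverse()` once per trial before `first_step(action)` yields a sequence in which each action's common destination occasionally switches, so that previously common transitions become rare and vice versa.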