To assess the behavioral and neural manifestations of state and reward learning more precisely, we formalized the computational approaches described above as trial-by-trial mathematical models. Motivated by recent empirical support (Morris et al., 2006), we implemented value learning via an RPE using a variant of model-free RL, the so-called SARSA (state-action-reward-state-action) learner. By contrast, our model-based FORWARD learner acquired a state transition model via an SPE (see Figure 2 and Experimental Procedures) and used this model to evaluate actions. In the second session, the mean correlation between the prediction error signals of the two models was −0.37 (±0.09 s.d.) across all subjects. (In the first session, the RPE is zero throughout because no rewards are available, so only the SPE is nonzero.) Finally, because previous theoretical proposals suggest that the brain implements both approaches (Daw et al., 2005; Doya, 1999; Doya et al., 2002), we implemented a HYBRID learner that chooses actions using a weighted average of the action valuations from the SARSA and FORWARD learners. The relative weighting is expected to change over time; indeed, given