Chunk #2 — RESULTS — Computational models of model-free and model-based learning

Source: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.
Embedded: yes

Text

Doya, 1999; Doya et al., 2002), we implemented a HYBRID learner that chooses actions by forming a weighted average of the action valuations from the SARSA and FORWARD learners. The relative weighting is expected to change over time; indeed, given suitable prior expectations, there are normative proposals for determining how (Daw et al., 2005) (see also Behrens et al. (2007)). Given the singularity of the transition from non-rewarded to rewared trials, we built three simple models for the change in weighting over time, finding that an exponential decay from FORWARD to SARSA (Camerer and Ho, 1998) (Figure 2) fitted best (see Table S1).