EXPERIMENTAL PROCEDURES: HYBRID Learner

Source: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

We considered a third, HYBRID learner, which combines the state-action value estimates of the SARSA and FORWARD learners into a single set of value estimates. The model assumes that the two sets of state-action value estimates are combined as a weighted average. We further assume that the relative weight given to each set in determining the hybrid state-action values (and thus choice behavior) can change over the course of the free-choice scanning session (session 2). Following Camerer and Ho (1998), we model this change with an exponential function:

$$w_t = l \cdot e^{-k t}$$

where $w_t$ is the trial-specific weight for trial number $t$, and $l$ and $k$ are two free parameters describing the exponential decay: $l$ is the offset (the weight at $t = 0$) and $k$ is the slope (the rate of decay).
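The weighting rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `hybrid_weight` and `hybrid_values` are hypothetical, the value tables are plain dictionaries keyed by `(state, action)`, and the assignment of $w_t$ to the FORWARD estimates and $1 - w_t$ to the SARSA estimates is an assumption, since the text only states that the two sets are combined by a weighted average. If $l$ is restricted to $[0, 1]$ and $k \ge 0$ (also an assumption; no parameter bounds are given here), $w_t$ stays within $[0, l]$ and the result is a proper weighted average on every trial.

```python
import math


def hybrid_weight(t: int, l: float, k: float) -> float:
    """Trial-specific weight w_t = l * exp(-k * t).

    l is the offset (the weight at t = 0) and k the slope (decay rate).
    """
    return l * math.exp(-k * t)


def hybrid_values(q_fwd: dict, q_sarsa: dict, t: int, l: float, k: float) -> dict:
    """Combine two state-action value tables by a trial-specific weighted average.

    ASSUMPTION: w_t weights the FORWARD (model-based) estimates and (1 - w_t)
    weights the SARSA (model-free) estimates. Both dicts map
    (state, action) -> value and are assumed to share the same keys.
    """
    w = hybrid_weight(t, l, k)
    return {sa: w * q_fwd[sa] + (1.0 - w) * q_sarsa[sa] for sa in q_fwd}


# Usage with made-up values (hypothetical states and actions):
q_fwd = {("s1", "L"): 1.0, ("s1", "R"): 0.0}
q_sarsa = {("s1", "L"): 0.0, ("s1", "R"): 1.0}
early = hybrid_values(q_fwd, q_sarsa, t=0, l=0.8, k=0.05)   # weighted toward q_fwd
late = hybrid_values(q_fwd, q_sarsa, t=100, l=0.8, k=0.05)  # weighted toward q_sarsa
```

With $k > 0$ the weight on the first table shrinks toward zero as trials accumulate, so under this assumed assignment the hybrid valuations drift from FORWARD toward SARSA estimates over the session; a fitted $k$ near zero would instead indicate a roughly constant mixture.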