Chunk #37 — EXPERIMENTAL PROCEDURES — SARSA Learner

Source: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.
Embedded: yes

Text

The RPE is used to update the state-action value as: $Q_{SARSA}(s,a) = Q_{SARSA}(s,a) + \alpha\,\delta_{RPE}$, where $\alpha$ is a free parameter controlling the SARSA learning rate.
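
As an illustration, the update rule can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function name `sarsa_update`, the dictionary-based value table, and the example state, action, RPE, and learning-rate values are all assumptions chosen for the demonstration. The RPE is passed in as an already computed number.

```python
from collections import defaultdict


def sarsa_update(q_values, state, action, rpe, alpha):
    """Apply Q(s,a) = Q(s,a) + alpha * rpe in place and return the new value.

    q_values: mapping from (state, action) pairs to value estimates
              (hypothetical storage format, assumed for this sketch).
    rpe:      reward prediction error for this transition, computed elsewhere.
    alpha:    SARSA learning rate (the free parameter in the text).
    """
    q_values[(state, action)] += alpha * rpe
    return q_values[(state, action)]


# Illustrative values only: all entries start at 0.0.
q_sarsa = defaultdict(float)
new_value = sarsa_update(q_sarsa, state=0, action="left", rpe=1.0, alpha=0.5)
```

With these illustrative inputs the value of the chosen state-action pair moves halfway from 0 toward the prediction error, and every other entry is left unchanged. A larger $\alpha$ makes each RPE shift the estimate more strongly.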