
Chunk #37 — EXPERIMENTAL PROCEDURES — SARSA Learner

Source
States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Text

The reward prediction error (RPE) is used to update the state-action value as $Q_{\mathrm{SARSA}}(s,a) \leftarrow Q_{\mathrm{SARSA}}(s,a) + \alpha\,\delta_{\mathrm{RPE}}$, where $\alpha$ is a free parameter controlling the SARSA learning rate.
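The update rule above can be sketched in a few lines of Python. This chunk does not define $\delta_{\mathrm{RPE}}$, so the sketch assumes the standard SARSA temporal-difference error $\delta_{\mathrm{RPE}} = r + \gamma\,Q_{\mathrm{SARSA}}(s',a') - Q_{\mathrm{SARSA}}(s,a)$. The function name `sarsa_update`, the discount `gamma`, and the `terminal` flag are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma=1.0, terminal=False):
    """Apply one SARSA step: Q(s,a) <- Q(s,a) + alpha * delta_RPE.

    Assumption: delta_RPE is the standard SARSA TD error
    r + gamma * Q(s', a') - Q(s, a). gamma and the terminal handling
    are illustrative choices, not taken from the source text.
    """
    # On a terminal transition there is no next state-action value to bootstrap from.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    delta_rpe = target - Q[(s, a)]   # reward prediction error
    Q[(s, a)] += alpha * delta_rpe   # learning-rate-scaled update
    return delta_rpe


Q = defaultdict(float)  # unseen state-action pairs start at 0
d = sarsa_update(Q, s="s0", a="left", r=1.0, s_next="s1", a_next="right", alpha=0.2)
print(d, Q[("s0", "left")])  # 1.0 0.2
```

Because SARSA bootstraps from the action actually taken next ($a'$), the value it learns reflects the agent's own behavior policy; a larger $\alpha$ moves $Q_{\mathrm{SARSA}}$ further toward the target on each step.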