Chunk #37 — EXPERIMENTAL PROCEDURES — SARSA Learner
Text
The RPE is used to update the state-action value as: $Q_{\mathrm{SARSA}}(s,a) = Q_{\mathrm{SARSA}}(s,a) + \alpha\,\delta_{\mathrm{RPE}}$, where $\alpha$ is a free parameter controlling the SARSA learning rate.
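As an illustration of the update rule, the following is a minimal sketch, not the authors' implementation. It assumes a tabular value store keyed by (state, action) pairs with values initialized to zero, and it takes the RPE as an already computed input. The names `sarsa_update`, `q_sarsa`, and the example states and actions are hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, state, action, delta_rpe, alpha):
    """Apply Q(s, a) = Q(s, a) + alpha * delta_RPE in place; return the new value."""
    q[(state, action)] += alpha * delta_rpe
    return q[(state, action)]


# Assumption: unseen (state, action) pairs start at 0.0.
q_sarsa = defaultdict(float)

# A positive RPE raises the value by a fraction alpha of the error.
new_value = sarsa_update(q_sarsa, state="s1", action="left", delta_rpe=1.0, alpha=0.5)
```

With alpha = 0 the value never changes, and with alpha = 1 the full RPE is added on every update, so alpha sets how strongly a single prediction error moves the stored value.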