Chunk #41 — EXPERIMENTAL PROCEDURES — HYBRID Learner

Source: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.
Embedded: yes

Text

Q values for the HYBRID learner are then computed as a weighted sum of the estimates from the two other learners. On trial t:

$$Q_{HYB}(s,a) = w_t \times Q_{FWD}(s,a) + (1 - w_t) \times Q_{SARSA}(s,a)$$
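The weighted sum can be illustrated with a minimal Python sketch. The function name `hybrid_q`, its argument names, and the check that the weight lies in [0, 1] are assumptions made for illustration; they are not taken from the paper, and the sketch treats the weight for the current trial as a given input rather than modeling how it changes across trials.

```python
def hybrid_q(q_fwd, q_sarsa, w_t):
    """Blend two Q estimates for a single state-action pair.

    q_fwd   : Q value from the FWD learner for (s, a)
    q_sarsa : Q value from the SARSA learner for (s, a)
    w_t     : weight on the FWD estimate on trial t
              (assumed here to lie in [0, 1])
    """
    if not 0.0 <= w_t <= 1.0:
        raise ValueError("w_t is assumed to lie in [0, 1]")
    # Weighted sum: w_t on the FWD estimate, (1 - w_t) on the SARSA estimate.
    return w_t * q_fwd + (1.0 - w_t) * q_sarsa


# Hypothetical values: FWD estimate 1.0, SARSA estimate 0.0, weight 0.25.
print(hybrid_q(1.0, 0.0, 0.25))  # prints 0.25
```

With a weight of 1 the hybrid value equals the FWD estimate, and with a weight of 0 it equals the SARSA estimate; intermediate weights interpolate linearly between the two.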