Chunk #41 — EXPERIMENTAL PROCEDURES — HYBRID Learner
Text
Q values for the HYBRID learner are then computed as a weighted sum of the estimates from the two other learners. On trial t: $Q_{\mathrm{HYB}}(s,a) = w_t \, Q_{\mathrm{FWD}}(s,a) + (1 - w_t) \, Q_{\mathrm{SARSA}}(s,a)$
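The weighted sum can be illustrated with a minimal Python sketch. The function names, the dictionary layout keyed by (state, action) pairs, and the check that the weight lies in [0, 1] are assumptions made for illustration; how the weight on trial t is obtained is not specified in this passage, so it is simply passed in as an argument.

```python
def hybrid_q(q_fwd, q_sarsa, w_t):
    """Blend the FORWARD and SARSA estimates for one (state, action) pair.

    w_t is the weight on the FORWARD estimate on trial t. Restricting it
    to [0, 1] is an assumption of this sketch, not stated in the text.
    """
    if not 0.0 <= w_t <= 1.0:
        raise ValueError("w_t must lie in [0, 1]")
    return w_t * q_fwd + (1.0 - w_t) * q_sarsa


def hybrid_q_table(q_fwd_table, q_sarsa_table, w_t):
    """Apply the same blend to every (state, action) key.

    Both tables are hypothetical dicts that share the same keys.
    """
    return {
        sa: hybrid_q(q_fwd_table[sa], q_sarsa_table[sa], w_t)
        for sa in q_fwd_table
    }


# Hypothetical values for a single state with two actions.
q_fwd = {("s0", "left"): 1.0, ("s0", "right"): 2.0}
q_sarsa = {("s0", "left"): 0.0, ("s0", "right"): 4.0}
q_hyb = hybrid_q_table(q_fwd, q_sarsa, w_t=0.5)
```

With w_t = 1 the blend returns the FORWARD estimate unchanged, and with w_t = 0 it returns the SARSA estimate, so the weight directly sets how much each learner contributes on that trial.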