First-step model-free action values were updated as:

$$Q_{mf}(c) \leftarrow (1-\alpha_Q)\,Q_{mf}(c) + \alpha_Q\bigl(\lambda r + (1-\lambda)V(s)\bigr) \tag{1}$$

This combines an update due to the value $V(s)$ of the second-step state reached with direct update of the first-step action value by the trial outcome due to eligibility traces. The relative influence of each is controlled by the eligibility trace parameter $\lambda$.
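The update can be illustrated with a minimal Python sketch. The function name, argument names, and the dictionary representation of the action values are hypothetical choices made for illustration, not taken from the original; the sketch also assumes that αQ acts as a learning rate and that r is the trial outcome.

```python
def update_first_step_mf(q_mf, c, r, v_s, alpha_q, lam):
    """Return a copy of q_mf with the value of first-step choice c updated.

    q_mf    : dict mapping first-step actions to model-free values (hypothetical layout)
    c       : the first-step action chosen on this trial
    r       : trial outcome
    v_s     : value V(s) of the second-step state reached
    alpha_q : assumed to be the learning rate for action values
    lam     : eligibility trace parameter (lambda)
    """
    # Mix of the direct outcome and the second-step state value, weighted by lambda.
    target = lam * r + (1.0 - lam) * v_s
    updated = dict(q_mf)  # copy so the caller's dict is not mutated
    updated[c] = (1.0 - alpha_q) * q_mf[c] + alpha_q * target
    return updated


q = {"left": 0.0, "right": 0.0}
q_new = update_first_step_mf(q, "left", r=1.0, v_s=0.5, alpha_q=0.5, lam=0.5)
print(q_new["left"])  # 0.375
```

In the two limiting cases, λ = 1 makes the first-step value track the trial outcome alone, while λ = 0 makes it track only the value of the second-step state reached.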