Chunk #42 — EXPERIMENTAL PROCEDURES — Action Selection

Source: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.
Embedded: yes

Text

Each of the models additionally assumes that participants select actions stochastically according to probabilities determined by their state-action values through a softmax distribution:

$$P(s,a) = \frac{\exp\big(\tau \, Q(s,a)\big)}{\sum_{b=1}^{n} \exp\big(\tau \, Q(s,b)\big)}$$

where $Q$ is $Q_{SARSA}$, $Q_{FWD}$, or $Q_{HYB}$, depending on the model, and the free “inverse temperature” parameter $\tau$ controls how focused the choices are on the highest valued action.
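The softmax choice rule can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `softmax_probabilities` and `select_action` and the example Q-values are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step that is assumed here rather than taken from the source.

```python
import math
import random


def softmax_probabilities(q_values, tau):
    """Return P(s, a) for every action a, given the values Q(s, .) and inverse temperature tau.

    Hypothetical helper: q_values is a list holding Q(s, b) for each available action b.
    """
    scaled = [tau * q for q in q_values]
    # Subtracting the maximum leaves the probabilities unchanged
    # but avoids overflow in exp() for large tau * Q (assumed detail, not in the source).
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, tau, rng=random):
    """Sample an action index stochastically according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 0.0]  # illustrative values only
    print(softmax_probabilities(q, tau=0.0))   # tau = 0: both actions equally likely
    print(softmax_probabilities(q, tau=1.0))   # moderate preference for the first action
    print(softmax_probabilities(q, tau=10.0))  # choices concentrate on the first action
```

With τ = 0 every action is equally likely regardless of its value. As τ grows, the probability mass concentrates on the highest valued action, which is the sense in which τ controls how focused the choices are.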