Chunk #43 — EXPERIMENTAL PROCEDURES — Action Selection

Source: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.
Embedded: yes

Text

We fit each model’s free parameters to the behavioral data by minimizing the negative log-likelihood, −Σ log P(s,a), of the obtained choices a given the previously observed choices and rewards, summed over all subjects and trials. The HYBRID learner has five free parameters (α, η, τ, l, and k); the SARSA and FORWARD learners each have two (α or η, and τ). We estimated a single set of parameters for all participants because unregularized maximum likelihood estimates tend to be very noisy in individual subjects, which leads to widely varying and sometimes outlying parameter estimates. In addition, regressors derived from such noisy estimates tend to perform poorly in this kind of “model-based fMRI” data analysis. Fitting a single set of parameters, as frequently done in our recent work (Daw et al., 2006; Gershman et al., 2009; Glascher et al., 2009), imposes a simple but efficient regularization that stabilizes the estimated model parameters. Goodness of fit was compared between models, taking into account their different numbers of free parameters, using likelihood ratio tests and Akaike’s Information Criterion (AIC).
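The quantities in this procedure can be illustrated with a minimal Python sketch. It is not the authors' fitting code. It assumes the per-trial choice probabilities P(s,a) have already been computed by some model, and the function names (pooled_nll, aic, likelihood_ratio_stat) are hypothetical. The AIC is written in its standard form, 2k + 2·NLL, and the likelihood ratio statistic as twice the difference in negative log-likelihood between a restricted and a fuller model.

```python
import math


def pooled_nll(choice_probs_by_subject):
    """Negative log-likelihood summed over all subjects and trials.

    choice_probs_by_subject: one list per subject, holding the model's
    probability P(s, a) of the choice actually made on each trial.
    A single shared parameter set is assumed to have produced them.
    """
    return -sum(
        math.log(p)
        for subject_probs in choice_probs_by_subject
        for p in subject_probs
    )


def aic(nll, n_params):
    """Akaike's Information Criterion: 2k + 2 * NLL (lower is better)."""
    return 2.0 * n_params + 2.0 * nll


def likelihood_ratio_stat(nll_restricted, nll_full):
    """Likelihood ratio statistic, 2 * (NLL_restricted - NLL_full)."""
    return 2.0 * (nll_restricted - nll_full)


# Toy illustration with made-up probabilities (not data from the study).
toy_probs = [[0.5, 0.8], [0.9]]   # two hypothetical subjects
toy_nll = pooled_nll(toy_probs)   # equals -log(0.5 * 0.8 * 0.9)
toy_aic_hybrid = aic(toy_nll, 5)  # HYBRID: 5 free parameters
toy_aic_sarsa = aic(toy_nll, 2)   # SARSA or FORWARD: 2 free parameters
```

For a likelihood ratio test, the statistic would typically be compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters (for example, 5 − 2 = 3 when comparing HYBRID with a two-parameter learner), which presumes the simpler model is nested in the fuller one.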