We quantified how transition, outcome, and their interaction predicted stay probability in the present task (Figure 2A) using a logistic regression analysis (Figure 2B), with additional predictors to capture choice biases and correct for cross-trial correlations, which can otherwise give a misleading picture of how trial events influence subsequent choice (Akam et al., 2015; Table 2). Positive loading on the outcome predictor indicated that reward was reinforcing (i.e., predicted staying; p < 0.001, bootstrap test). Positive loading on the transition predictor indicated that common transitions were also reinforcing (p < 0.001), as expected for model-based control with transition probability learning. Loading on the transition-outcome interaction predictor was not significantly different from zero (p = 0.79). To understand the implications of this, we simulated the behavior of a model-based and a model-free RL agent, with the parameters of both fit to the behavioral data, and ran the logistic regression analysis on data simulated from both models (Figures 2D–2I). The RL agents used in these simulations included forgetting about actions not taken and states not visited, as RL model comparison indicated