Each block was defined by the state of both the reward and transition probabilities (Figure 1C). The reward probabilities for the left/right ports took one of three states: good/bad, neutral/neutral, or bad/good, where the good, neutral, and bad reward probabilities were 0.8, 0.4, and 0.2, respectively. The transition probabilities took one of two states: top → left/bottom → right or top → right/bottom → left (Figure 1C), where, for example, top → right indicates that the top port commonly (0.8 of trials) led to the right port and rarely (0.2 of trials) to the left port. At block transitions, the reward and/or transition probabilities changed (see STAR Methods). A reversal in which first-step action (top or bottom) had the higher reward probability could therefore result from a reversal in either the reward or the transition probabilities. Block transitions were triggered by a behavioral criterion (see STAR Methods), which resulted in block lengths of 63.6 ± 31.7 trials (mean ± SD).
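
To make the block structure concrete, the following minimal sketch computes the expected reward probability of each first-step action under every combination of reward and transition state, using only the probabilities stated above. The state labels (`left_good`, `top_left`, and so on) and the function names are hypothetical and chosen for illustration; this is not the task-control or analysis code used in the study.

```python
# Illustrative sketch only: state labels and function names are assumptions.
REWARD_PROB = {"good": 0.8, "neutral": 0.4, "bad": 0.2}
COMMON, RARE = 0.8, 0.2  # common and rare transition probabilities

# Reward states: quality of the (left, right) second-step ports.
REWARD_STATES = {
    "left_good": ("good", "bad"),
    "neutral": ("neutral", "neutral"),
    "right_good": ("bad", "good"),
}

# Transition states: second-step port commonly reached from each first-step port.
TRANSITION_STATES = {
    "top_left": {"top": "left", "bottom": "right"},
    "top_right": {"top": "right", "bottom": "left"},
}


def first_step_values(reward_state, transition_state):
    """Expected reward probability of choosing the top or bottom port."""
    left_quality, right_quality = REWARD_STATES[reward_state]
    p_reward = {"left": REWARD_PROB[left_quality], "right": REWARD_PROB[right_quality]}
    values = {}
    for action, common_port in TRANSITION_STATES[transition_state].items():
        rare_port = "right" if common_port == "left" else "left"
        values[action] = COMMON * p_reward[common_port] + RARE * p_reward[rare_port]
    return values


def better_first_step(reward_state, transition_state):
    """Return 'top' or 'bottom', or None when both actions are equally good."""
    values = first_step_values(reward_state, transition_state)
    if abs(values["top"] - values["bottom"]) < 1e-12:
        return None
    return max(values, key=values.get)


if __name__ == "__main__":
    for r_state in REWARD_STATES:
        for t_state in TRANSITION_STATES:
            print(r_state, t_state, better_first_step(r_state, t_state))
```

Under these assumptions, the better first-step action has an expected reward probability of approximately 0.68 and the worse one approximately 0.32 in non-neutral blocks, whereas both equal 0.4 in neutral blocks. Flipping either the reward state or the transition state alone swaps which first-step action is better, and flipping both together leaves it unchanged.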