
Double dissociation of value computations in orbitofrontal and anterior cingulate neurons

  • Sat Jan 01 2011


Steven W Kennerley et al. Nat Neurosci. 2011.

Abstract

Damage to prefrontal cortex (PFC) impairs decision-making, but the underlying value computations that might cause such impairments remain unclear. Here we report that value computations are doubly dissociable among PFC neurons. Although many PFC neurons encoded chosen value, they used opponent encoding schemes such that averaging the neuronal population extinguished value coding. However, a special population of neurons in anterior cingulate cortex (ACC), but not in orbitofrontal cortex (OFC), multiplexed chosen value across decision parameters using a unified encoding scheme and encoded reward prediction errors. In contrast, neurons in OFC, but not ACC, encoded chosen value relative to the recent history of choice values. Together, these results suggest complementary valuation processes across PFC areas: OFC neurons dynamically evaluate current choices relative to recent choice values, whereas ACC neurons encode choice predictions and prediction errors using a common valuation currency reflecting the integration of multiple decision parameters.


Figures

Figure 1

The behavioral task and experimental contingencies. (a) Subjects made choices between pairs of presented pictures. (b) There were six sets of pictures, each associated with a specific outcome. We varied the value of the outcome by manipulating either the amount of reward the subject would receive (payoff), the likelihood of receiving a reward (probability) or the number of times the subject had to press a lever to earn the reward (effort). We manipulated one parameter at a time, holding the other two fixed. Presented pictures were always adjacent to one another in terms of value, i.e. choices were 1 vs. 2, 2 vs. 3, 3 vs. 4 or 4 vs. 5, hence there were four choice values (i.e., 1–4) per picture set and decision variable. All neuronal analyses were based on the chosen stimulus value (i.e., 1–5). (c) Relationship between the probability of receiving a reward and various value-related parameters. During the choice phase, neurons could encode the value of the choice (the probability of receiving a reward). During the outcome phase, neurons could encode a value signal reflecting the probability of reward delivery, or a prediction error, which is the difference between the subject’s expected value of the choice and the value of the outcome that was actually realized. On rewarded trials these values would always be positive (+PE) whereas on unrewarded trials they would always be negative (−PE).
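
The prediction error described in panel (c) can be illustrated with a minimal sketch. It assumes the outcome is coded as 1 for a delivered reward and 0 for an omitted one, and that the expected value on probability trials equals the reward probability; the function name and the example probabilities are hypothetical and are not taken from the paper.

```python
def prediction_error(p_reward: float, rewarded: bool) -> float:
    """Return realized outcome minus expected value.

    Assumption: outcome is 1.0 if rewarded, 0.0 otherwise, and the
    expected value of the choice is its reward probability.
    """
    outcome = 1.0 if rewarded else 0.0
    return outcome - p_reward


# Hypothetical probabilities, chosen only for illustration.
for p in (0.25, 0.5, 0.75):
    pos_pe = prediction_error(p, rewarded=True)   # always > 0 (+PE)
    neg_pe = prediction_error(p, rewarded=False)  # always < 0 (-PE)
    print(f"p={p}: +PE={pos_pe}, -PE={neg_pe}")
```

Under these assumptions the +PE is largest when a low-probability choice is rewarded, and the −PE is largest in magnitude when a high-probability choice goes unrewarded.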

Figure 2

Three single neuron examples showing choice and outcome activity on probability trials during rewarded and unrewarded trials. For each neuron, the upper row of plots illustrates spike density histograms sorted according to the chosen stimulus value (choice epoch) or the size of the prediction error (outcome epochs). Data is not plotted for the least valuable choice since it was rarely chosen by the subjects. The horizontal grey line indicates the neuron’s baseline firing rate as determined from the 1-s fixation epoch immediately prior to the onset of the pictures. Grey box indicates choice epoch. The blue line in the lower row of plots illustrates how the magnitude of regression coefficients for reward probability coding (from GLMs 1–2) changes across the course of the trial. Red data points indicate significant value encoding, while dark blue data points indicate significant prediction error encoding (i.e. encoding reward probability in the opposite direction in the outcome phase relative to the choice phase). (a) An ACC neuron that encoded +PE. It increased firing rate during choice as chosen probability increased and it increased firing rate during rewarded outcomes when the subject was least expecting to receive a reward (i.e., low probability trials). (b) An ACC neuron that encoded −PE. It increased firing rate during choice as chosen probability decreased and it increased firing rate during unrewarded outcomes when the subject was expecting to receive a reward (i.e., high probability trials). (c) An ACC neuron that encoded chosen probability, and at outcome encoded +PE and −PE.

Figure 3

Relationship between neuronal encoding during the choice and outcome epoch. (a) Prevalence of neurons encoding probability during different task events (choice, rewarded outcomes and/or unrewarded outcomes). Asterisks indicate that the proportion in ACC significantly differs from that in LPFC and OFC (chi-squared tests, P < 0.05). More ACC neurons encoded reward probability during both choice and outcome phases (blue shading). (b–c) Scatter plots of the regression coefficients for probability coding during both choice and (b) rewarded outcomes and (c) unrewarded outcomes for all neurons in each brain area. The point of maximum selectivity for each neuron and epoch is plotted, so data is biased away from zero. The different selectivity patterns are indicated by colored symbols. (b) ACC neurons that encoded probability positively during the choice encoded probability negatively during rewarded outcomes (star symbol), consistent with a +PE. Neurons that encoded probability positively during the choice also encoded probability positively during rewarded outcomes, consistent with encoding a value signal at outcome (triangle symbol). (c) There was no consistent relationship between the encoding of probability during the choice and unrewarded outcomes. (d) Bar plots summarizing the different probability encoding schemes, based on the sign of the choice regression coefficient (+ or −). Single black asterisks indicate significant differences between PFC areas (chi-squared tests, P < 0.05); double white asterisks indicate the proportion of neurons with positive or negative regression coefficients was significantly different from the chance 50/50 split (binomial test, P < 0.05). Position of white asterisks indicates the larger population.
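
The comparison against a chance 50/50 split of positive and negative coefficients in panel (d) can be sketched as an exact two-sided binomial test. This is an illustration under the assumption of a null probability of exactly 0.5, not the authors' analysis code; the function name and the counts are hypothetical.

```python
from math import comb


def binomial_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials
    under a null probability of 0.5.

    Because the null distribution is symmetric, the two-sided p-value
    is twice the upper tail beyond the more extreme count, capped at 1.
    """
    upper = max(k, n - k)
    tail = sum(comb(n, i) for i in range(upper, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts: 30 of 40 selective neurons have a positive coefficient.
p_value = binomial_two_sided_half(30, 40)
print(f"two-sided p = {p_value:.4g}")
```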

Figure 4

Three neurons that encode the value of different choice variables. For each neuron, the upper row of plots illustrates spike density histograms sorted according to the value (2–5) of the expected outcome of the choice. Data is not plotted for the least valuable choice since it was rarely chosen by the subjects. The horizontal grey line indicates the neuron’s baseline firing rate as determined from the 1-s fixation epoch immediately prior to the onset of the pictures. Grey box indicates choice epoch. The blue line in the lower row of plots illustrates how the magnitude of regression coefficients for value coding (from GLM-1) changes across the course of the choice epoch. Red indicates time bins where significant value coding occurs. (a) An ACC neuron that encodes value solely on probability trials with an increase in firing rate as the value of the choice decreases. (b) An ACC neuron that encodes value on probability and payoff trials, increasing its firing rate as value decreases. (c) An ACC neuron that encodes value for all decision variables, increasing its firing rate as value increases.

Figure 5

Population encoding of value during the choice epoch. (a) Prevalence of neurons encoding choice value with a positive or negative regression coefficient. Conventions as in Figure 3d. The mean (solid line) and standard error (shading) of value selectivity as determined by (b) the absolute CPD for each value regressor, (c) the CPD shown separately for neurons encoding value either positively (+) or negatively (−) and (d) the population mean CPD. For the first three columns (probability, payoff, cost) the CPD is calculated from GLM-1 and averaged across all recorded neurons. The final column (all three) is restricted to neurons that encoded all three decision variables and the CPD is calculated from GLM-3. Value consistently explained more variance in ACC than OFC or LPFC (b), but because a similar amount of information was encoded by the populations with positive or negative regression coefficients (c), averaging the CPD across the two populations eliminated the encoding of value information at the population level (d). There were two exceptions: i) in ACC, neurons that encoded all three decision variables primarily did so with a positive regression coefficient and hence carried more information about choice value than the population of neurons with negative regression coefficients and ii) during the outcome phase of probability trials, ACC neurons tended to exhibit a negative coefficient for reward probability.
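
The cancellation described for panels (c) and (d) can be illustrated with a toy example. The sketch below uses a simple least-squares slope as a stand-in for signed value selectivity (it does not compute the CPD itself) and two invented neurons with opposite tuning; all firing rates and names are assumptions made for illustration.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


values = [2, 3, 4, 5]                        # chosen stimulus values
pos_neuron = [10 + 2 * v for v in values]    # fires more as value rises
neg_neuron = [20 - 2 * v for v in values]    # fires more as value falls

slopes = [slope(values, pos_neuron), slope(values, neg_neuron)]
mean_signed = sum(slopes) / len(slopes)                   # opponent signs cancel
mean_absolute = sum(abs(s) for s in slopes) / len(slopes)  # magnitude survives

print(mean_signed, mean_absolute)
```

Each toy neuron carries the same amount of value information, yet the signed average is zero, which is the sense in which averaging across opponent populations removes value coding at the population level.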

Figure 6

Population analyses of neuronal activity during the outcome epoch of probability trials. Conventions are the same as Figure 5. (a) The leftmost plot indicates the prevalence of neurons with positive or negative regression coefficients for encoding a categorical signal about reward presence or absence (e.g., neurons with a positive regression coefficient increase firing rate more on rewarded compared to unrewarded outcomes). The middle and right plots indicate the prevalence of neurons with positive or negative regression coefficients for encoding reward probability on rewarded (middle) or unrewarded (right) trials (e.g., neurons with a negative regression coefficient exhibit a linear increase in firing rate as reward probability decreases). (b) The absolute CPD for the three different regressors from GLM-2. ACC encoded more information than LPFC and OFC about the probability of receiving a reward as well as more information about whether or not a reward was actually received. (c) CPD plotted separately for those neurons which exhibited a positive or negative regression coefficient for the three different regressors. The two populations encoded approximately the same amount of outcome information. (d) The population mean CPD, averaged across neurons which exhibit both positive and negative regression coefficients for each of the three different regressors. The mirrored plots from panel (c) average so that approximately zero information about each regressor remains at the population level, with the exception that the ACC population retains information about rewarded outcomes with a negative bias (i.e., spiking on rewarded outcomes increases as probability decreases, akin to a +PE).

Figure 7

Relationship between encoding common value and +PE. (a) Proportion of ACC neurons encoding probability at choice with a positive (+) or negative (−) regression coefficient. ACC neurons that encoded all three decision variables during the choice and/or +PE at outcome were significantly more likely to encode probability during the choice phase with a positive regression coefficient. Double asterisks follow Figure 3d conventions but in black. ACC neurons that encoded +PE at outcome were the same subpopulation of neurons that also encoded all three decision variables at choice (## indicates that 69% of +PE neurons also encode all three decision variables, exceeding the expected frequency based on the odds of encoding payoff and effort information with a positive regression coefficient; P < 0.01 binomial test). (b) For those neurons that encoded probability during the choice, we plotted the correlation of their value encoding on payoff trials with the value encoding on effort trials (GLM-1) separated according to whether they did or did not code either +PE or −PE at outcome. This establishes whether the likelihood of encoding payoff and effort value information is dependent on encoding +PE or −PE. Only neurons that encoded probability during the choice phase and a +PE at outcome showed a significant correlation with value encoding on payoff and effort trials. This suggests that this subpopulation of ACC neurons uses a common value currency for computations related to representing expected values at choice, and discrepancies from those expectations at outcome (+PE).

Figure 8

Neuronal encoding of value history. (a) An OFC neuron encoding current and past trial choice value. This neuron increases firing rate as current choice value increases (leftmost plot). Additionally, when the trials are sorted by the N–1 trial value, firing rate increases both as the current trial value increases and as the previous trial value decreases (four rightmost plots); this neuron is modulated by the difference between the current and past trial value. (b) Dynamics of the encoding shown in (a) as determined from the regression coefficients of GLM-3. Significant bins for current and N–1 value in red and black symbols, respectively. (c–d) An LPFC neuron that encodes current trial value relative to previous trial value. (e) Scatter plot of regression coefficients for current (N) and past trial (N–1) value for all neurons per brain area. Different colored symbols indicate different selectivity patterns. (f) Proportion of neurons encoding current trial N and/or N–1 trial chosen value at the time of the current trial choice, sorted by the sign of the regression coefficient (+ or −) for current trial (left plot) or N–1 trial (middle and right plots) value. Conventions as in Figure 3d. (g–h) Mean correlation (r) between regression coefficients for encoding value of the (g) current and previous trial or (h) current and two trials ago. Brighter colors signify the bins where the correlation was significant (P < 0.01). Symbols indicate significant differences in r values between areas (Fisher’s Z transformation, P < 0.01).
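
The between-area comparison of r values in panels (g–h) relies on Fisher's Z transformation. A minimal sketch of the standard test for two independent correlations follows; the function name, the correlations and the sample sizes are hypothetical and are not values from the paper.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Compare two independent Pearson correlations.

    Each r is transformed with Fisher's z = atanh(r); the difference is
    scaled by its standard error sqrt(1/(n1-3) + 1/(n2-3)) and referred
    to a standard normal distribution (two-sided).
    """
    z_diff = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p_value = erfc(abs(z_diff) / sqrt(2))  # two-sided normal tail
    return z_diff, p_value


# Hypothetical example: r = -0.6 in one area and r = 0.1 in another,
# each computed over 103 neurons.
z, p = compare_correlations(-0.6, 103, 0.1, 103)
print(f"z = {z:.2f}, p = {p:.3g}")
```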
