
Discounting of Future Rewards and Punishments in Rats

Maurice-Philipp Zech et al. eNeuro. 2022.

Abstract

Temporal reward discounting describes the decrease in value of a reward as a function of delay. Decision-making between future aversive outcomes is much less studied, and there is no clear decision pattern across studies: while some authors suggest that human and nonhuman animals prefer sooner over later painful shocks, others found the exact opposite. In a series of three experiments, Long-Evans rats chose between differently timed electric shocks and rewards in a T-maze. In experiment 1, rats chose between early and late painful shocks with identical, long reward delays; in experiment 2, they chose between early rewards and early shocks, or late rewards and late shocks; in experiment 3, they chose between early and late rewards, with identical, short delays to the shock. We tested the predictions of two competing hypotheses: the aversive discounting theory assumes that future shocks are discounted and, hence, less unpleasant than early shocks. The utility from anticipation theory implies that rats derive negative utility from waiting for the shock; late shocks should, hence, be more unpleasant than early shocks. We did not find unequivocal evidence for either theory. Instead, our results are more consistent with the post hoc idea that shocks may have negative spill-over effects on reward values: the closer in time a shock is to a subsequent reward, the stronger the reward is devalued. Interestingly, and consistent with our theory, we find that, depending on the temporal shock-reward contiguity, rats can be brought to prefer later over sooner rewards of identical magnitudes.
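Temporal discounting is commonly formalized with a hyperbolic value function, V = A / (1 + kD), where A is the outcome magnitude and D the delay. The sketch below illustrates how such a function yields the aversive discounting prediction that a delayed shock is less unpleasant than an immediate one. The delays match the timings used in the experiments (Fig. 2), but the discount rate k and the unit magnitudes are illustrative assumptions, not values from the article.

```python
def hyperbolic_value(amount, delay, k=0.1):
    """Hyperbolic discounting: subjective value shrinks with delay.

    For aversive outcomes, amount is negative, so a delayed shock is
    discounted toward zero, i.e., less unpleasant than an early one.
    The discount rate k is illustrative, not a fitted parameter.
    """
    return amount / (1.0 + k * delay)

# Rewards lose subjective value with delay ...
early_reward = hyperbolic_value(1.0, 2)    # reward after 2 s
late_reward = hyperbolic_value(1.0, 21)    # reward after 21 s
assert late_reward < early_reward

# ... while a delayed shock becomes less negative (less aversive).
early_shock = hyperbolic_value(-1.0, 1)    # shock after 1 s
late_shock = hyperbolic_value(-1.0, 20)    # shock after 20 s
assert late_shock > early_shock
```

Under the competing utility from anticipation account, waiting for the shock itself carries negative utility, so the ordering of the two shock values would reverse.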

Keywords: aversive discounting; delay discounting; intertemporal choice; reward; shock; utility from anticipation.

Copyright © 2022 Zech et al.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1.

Overview of the customized T-maze. The start arm begins on the right side with the start box, which can be closed by an automatic door. Two identical decision arms were connected to the start arm; these arms could likewise be closed by automatic sliding doors. Pellet dispensers were placed at the end of the start arm and of each decision arm. The pellets were delivered into Petri dishes, and reward lights were placed above the Petri dishes.

Figure 2.

Overview of the shock and reward timings for each experiment. After an animal entered a decision arm (0 s), the doors were closed. Afterwards, the rewards and shocks were delivered. In general, an early reward had a delay of 2 s and a late reward of 21 s. The early shock was administered after 1 s (relative to entering) and the late shock after 20 s.

Figure 3.

Overview of the shock and reward contingencies in the two arms of the T-maze for each experiment and the predictions for each theory. For the shocks and rewards, each column represents one arm of the T-maze (early shock + late reward [EL]; late shock + late reward [LL]; early shock + early reward [EE]). In the first experiment, the entry-to-reward delays are identical in both arms, but the entry-to-shock delays differ between arms. The second experiment tested constant shock-to-reward delays with different entry-to-shock delays. The third experiment used constant entry-to-shock delays, but the shock-to-reward delays differ between arms. In the predictions, “∼” represents no prediction. Note that the assignment of shock/reward contingencies to the left or right arm of the T-maze was pseudo-randomized across sessions for all experiments. The left side is always the condition for which the percentage of decisions was calculated.

Figure 4.

Depicted is the Bayesian hierarchical model used. A Bernoulli distribution served as the likelihood function for the decisions. Each estimate of θ followed a beta distribution with shape parameters μθ(κ − 2) + 1 and (1 − μθ)(κ − 2) + 1. For μθ, an uninformed beta prior was used, and κ followed a gamma(0.01, 0.01) prior (for prior predictions, see Extended Data Figs. 4-1, 4-2, 4-3).
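The (μθ, κ) parameterization in the caption maps a group-level mean and concentration onto the standard beta shape parameters. The sketch below reproduces that mapping and simulates one animal's choices under it; the hyperprior values follow the caption, but the gamma shape/rate convention and the κ > 2 shift are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def beta_shapes(mu, kappa):
    """Map mean mu and concentration kappa onto beta shape parameters,
    as in the caption: a = mu*(kappa - 2) + 1, b = (1 - mu)*(kappa - 2) + 1.

    Note a + b = kappa, so kappa controls how tightly the per-animal
    theta values cluster around the group mean mu.
    """
    a = mu * (kappa - 2) + 1
    b = (1 - mu) * (kappa - 2) + 1
    return a, b

rng = np.random.default_rng(0)

# Hyperpriors from the caption: uninformed (flat) beta prior on mu_theta,
# gamma(0.01, 0.01) on kappa. The shape/rate convention and the +2 shift
# (which keeps the resulting shape parameters positive) are assumptions.
mu_theta = rng.beta(1.0, 1.0)
kappa = rng.gamma(0.01, 1.0 / 0.01) + 2.0

a, b = beta_shapes(mu_theta, kappa)
theta = rng.beta(a, b)                     # one animal's choice probability
choices = rng.binomial(1, theta, size=16)  # Bernoulli likelihood, 16 free trials
```

In the full hierarchical model, one θ per animal is drawn from this beta distribution and each free-trial choice is a Bernoulli draw with probability θ.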

Figure 5.

Mean parameter estimation with simulated data for all predictions according to the Bayesian hierarchical model. The predictions of both models are color coded, and the specific experiments are given in the top left corner (EL: early shock + late reward; LL: late shock + late reward; EE: early shock + early reward). The y-axis shows the posterior distribution of μθ. The vertical gray lines represent the upper and lower bounds of the 95% highest density interval. Experiment 1 is in the top row, with the utility from anticipation model on the left side (upper bound = 0.42, lower bound = 0.38) and the prediction of the aversive discounting model on the other side (upper bound = 0.62, lower bound = 0.589). The second row shows experiment 2 and the prediction of the utility from anticipation model (upper bound = 0.42, lower bound = 0.38). Finally, the bottom row shows experiment 3, for which both models make the same prediction (upper bound = 0.42, lower bound = 0.38).

Figure 6.

A, Mean percentage of decisions for each experiment. In each experiment, animals performed 10 sessions and up to 21 trials per session (6 forced trials and 16 free trials). The timings of the rewards and shocks are displayed on the x-axis, together with the experiment. Specifically, for experiment 1, the percentage of decisions is calculated for early shock + late reward (EL) versus late shock + late reward (LL). For experiments 2 and 3, the percentage is calculated for early shock + early reward (EE) versus late shock + late reward (LL) and early shock + early reward (EE) versus early shock + late reward (EL), respectively. The vertical lines represent the SEM, and each dot represents a single animal. For all experiments, we calculated one-sample t tests (two-tailed). In the first experiment, animals showed a significant preference above chance level. The second experiment failed to yield any significant results; in the third experiment, animals revealed a significant preference below chance level. The black horizontal line represents chance level. B, Mean parameter estimation for the Bayesian hierarchical model. The specific experiments are given in the top left corner, and the conditions are displayed again. The y-axis shows the posterior distribution of μθ. The vertical gray lines show the upper and lower bounds of the 95% highest density interval. From top to bottom: experiment 1 (upper bound = 0.57, lower bound = 0.52), experiment 2 (upper bound = 0.55, lower bound = 0.49), and experiment 3 (upper bound = 0.48, lower bound = 0.42). See the extended data for the posterior θ distribution for all experiments (Extended Data Fig. 6-1); ***p < 0.001.
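The 95% highest density intervals reported for μθ can be computed directly from posterior samples. Below is a minimal sketch of one common method (the narrowest interval containing 95% of the sorted samples); the posterior samples are simulated for illustration and are not the authors' actual draws.

```python
import numpy as np

def hdi(samples, cred_mass=0.95):
    """Highest density interval: the narrowest interval that contains
    cred_mass of the sorted posterior samples."""
    sorted_s = np.sort(np.asarray(samples))
    n = len(sorted_s)
    n_in = int(np.ceil(cred_mass * n))          # samples inside the interval
    # Width of every candidate interval spanning n_in consecutive samples:
    widths = sorted_s[n_in - 1:] - sorted_s[: n - n_in + 1]
    i = int(np.argmin(widths))                  # narrowest candidate
    return sorted_s[i], sorted_s[i + n_in - 1]

# Illustrative posterior for mu_theta, concentrated near 0.55
# (roughly matching experiment 2's reported bounds):
rng = np.random.default_rng(1)
samples = rng.beta(55, 45, size=20_000)
lo, hi = hdi(samples)
```

For a symmetric, unimodal posterior the HDI nearly coincides with the central 95% interval; the two diverge for skewed posteriors, where the HDI is the more informative summary.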

Figure 7.

Mean percentage choice (experiment 1: early shock, late reward vs late shock, late reward, EL vs LL; experiment 2: early shock, early reward vs late shock, late reward, EE vs LL; experiment 3: early shock, early reward vs early shock, late reward, EE vs EL) for the first and second blocks of trials (trials 1–8 vs trials 9–16; left panels) and for the first and second blocks of sessions (sessions 1–5 vs sessions 6–10; right panels) in all experiments. Repeated-measures ANOVAs revealed significant main effects of block of trials for experiments 2 and 3. Additionally, there was a significant main effect of block of sessions on percentage choice in experiment 3, but not in experiments 1 or 2. There were also significant main effects of session order within a block on choice in experiments 1 and 2, but not in 3; *p < 0.05.
