The nervous system is hypothesized to compute reward prediction errors (RPEs) to promote adaptive behavior. Correlates of RPEs have been observed in the midbrain dopamine system, but the
extent to which RPE signals exist in other reward-processing regions is less well understood. In the present study, we quantified outcome history-based RPE signals in the ventral pallidum
(VP), a basal ganglia region functionally linked to reward-seeking behavior. We trained rats to respond to reward-predicting cues, and we fit computational models to predict the firing rates
of individual neurons at the time of reward delivery. We found that a subset of VP neurons encoded RPEs and did so more robustly than the nucleus accumbens, an input to the VP. VP RPEs
predicted changes in task engagement, and optogenetic manipulation of the VP during reward delivery bidirectionally altered rats’ subsequent reward-seeking behavior. Our data suggest a
pivotal role for the VP in computing teaching signals that influence adaptive reward seeking.
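As a rough illustration of what an outcome history-based RPE looks like, the sketch below uses a simple delta rule: the expected value is nudged toward each outcome by a learning rate α, and the RPE is the outcome minus that expectation. The function name, the initial value `v0`, and the 0/1 outcome coding are assumptions made for illustration; this is not the fitted model from the study.

```python
def compute_rpes(outcomes, alpha, v0=0.5):
    """Return per-trial RPEs under a simple delta-rule value update.

    outcomes: per-trial reward outcomes (hypothetical coding: 1 = preferred, 0 = other)
    alpha:    learning rate between 0 and 1
    v0:       assumed initial expected value
    """
    value = v0
    rpes = []
    for outcome in outcomes:
        rpe = outcome - value      # prediction error on this trial
        rpes.append(rpe)
        value += alpha * rpe       # move the expectation toward the outcome
    return rpes


print(compute_rpes([1, 1, 0], alpha=0.5))  # → [0.5, 0.25, -0.875]
```

A larger α makes the expectation depend mostly on the most recent outcomes, whereas a smaller α averages over a longer outcome history.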
The data generated and analyzed for this manuscript are available publicly at https://doi.org/10.12751/g-node.3lbd0c and ref. 51.
The code used to analyze and visualize the data in this manuscript is available as Supplementary software and online at https://doi.org/10.12751/g-node.3lbd0c and ref. 51.
This work was supported by the National Institutes of Health (grant nos. 5T32NS91018-17 (to D.J.O.), F30MH110084 (to B.A.B.), K99AA025384 (to J.M.R.), R01DA042038 and R01NS104834 (to
J.Y.C.), and R01DA035943 (to P.H.J.)), by Klingenstein-Simons, MQ, NARSAD, and Whitehall (to J.Y.C.), by a NARSAD Young Investigator Award (to J.M.R.) and by the National Science Foundation
Graduate Research Fellowship (grant no. DGE1746891 to D.J.O.). We thank K. Wang and X. Tong for technical assistance.
These authors contributed equally: David J. Ottenheimer, Bilal A. Bari.
Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
David J. Ottenheimer, Bilal A. Bari, Elissa Sutlief, Jeremiah Y. Cohen & Patricia H. Janak
Brain Science Institute, Johns Hopkins University, Baltimore, MD, USA
Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
Kurt M. Fraser, Tabitha H. Kim, Jocelyn M. Richard & Patricia H. Janak
Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
D.J.O., J.M.R. and P.H.J. designed the experiments. D.J.O. collected the electrophysiology data. D.J.O., K.M.F. and T.H.K. collected the optogenetic data. B.A.B. designed and fit the models
in consultation with D.J.O. D.J.O., B.A.B. and E.S. analyzed and visualized the data. D.J.O., B.A.B., J.M.R., J.Y.C. and P.H.J. interpreted the data. D.J.O., B.A.B. and P.H.J. prepared the
manuscript with comments from E.S., K.M.F., T.H.K., J.M.R. and J.Y.C.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Recording locations for nucleus accumbens (left) and ventral pallidum (right) rats.
(a) Distribution of the learning rate, α, for RPE neurons in VP (green) and NAc (orange). (b) Likelihood (LH) per trial for RPE and Current outcome neurons (n = 72 RPE and 126 Current
outcome neurons from 5 rats) for RPE and Current outcome models, relative to the LH per trial of the Unmodulated model. Lower (more negative) indicates a better fit. Line represents median,
box represents 25th and 75th percentile, and whiskers extend to 1.5 times the interquartile range. Red highlights the AIC-selected model. Median [25th to 75th percentile; min to max]
∆LH/trial are: RPE neurons, RPE model −0.21 [−0.39 to −0.14; −3.16 to −0.05], RPE neurons, Current outcome model −0.15 [−0.32 to −0.09; −3.03 to −0.02], Current outcome neurons, RPE model
−0.12 [−0.23 to −0.07; −0.174 to −0.03], Current outcome neurons, Current outcome model −0.12 [−0.22 to −0.07; −1.73 to −0.03]. Median [25th to 75th percentile] LH per trial for RPE neurons was
2.29 [2.04 to 2.49] and for Current outcome neurons was 2.15 [1.92 to 2.37]. (c) Model recovery, plotted as the fraction of neurons simulated with each model recovered as that model. (d)
Distribution of difference between the true value of the parameters used to simulate the neurons in (c) and the values recovered by MLE.
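The per-trial likelihood comparison in (b) can be sketched as follows. This assumes that spike counts are treated as Poisson and that "LH" is a negative log-likelihood (consistent with lower values indicating a better fit); both are assumptions of this sketch, as are the function names and the example numbers.

```python
import math


def poisson_nll_per_trial(spike_counts, rates):
    """Mean negative log-likelihood of spike counts under per-trial Poisson rates."""
    total = 0.0
    for k, lam in zip(spike_counts, rates):
        # -log P(k | lam) = lam - k*log(lam) + log(k!)
        total += lam - k * math.log(lam) + math.lgamma(k + 1)
    return total / len(spike_counts)


def delta_lh_per_trial(spike_counts, model_rates):
    """Fit of a model relative to a constant-rate (Unmodulated-style) baseline."""
    mean_rate = sum(spike_counts) / len(spike_counts)
    baseline = poisson_nll_per_trial(spike_counts, [mean_rate] * len(spike_counts))
    return poisson_nll_per_trial(spike_counts, model_rates) - baseline


# Hypothetical spike counts and model-predicted rates for four trials.
counts = [1, 5, 2, 6]
predicted = [1.5, 5.0, 2.0, 5.5]
delta = delta_lh_per_trial(counts, predicted)
print(delta)  # negative when the model fits better than the baseline
```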
(a) Expression of ArchT3.0:YFP and fiber tip placement for the rats included in the ArchT3.0 group for the optogenetic experiment in Fig. 3. (b) Expression of ChR2:GFP and fiber tip
placement for the rats included in the ChR2 group. The pattern of results was unchanged whether or not the rat with the most caudal placement was included.
(a) Mean(+/−SEM) port occupancy in time surrounding reward delivery on laser and no laser trials for YFP (left, n = 7 rats) and ArchT (right, n = 7 rats) groups. (b) Mean(+/−SEM) port
occupancy in time surrounding reward delivery on laser and no laser trials for GFP (left, n = 7 rats) and ChR2 (right, n = 11 rats) groups. To account for the disruption of port occupancy by
laser stimulation, we ran our distance from port analysis on the time beyond 15 s past reward delivery and found the same pattern of results. (c) Additional optogenetic experiment in ChR2
rats and controls in which the 2 s of laser stimulation was delivered at cue onset. (d) Mean(+/−SEM) distance from port in the ITI following laser stimulation did not differ from no laser
trials for GFP (p = 0.94, Wilcoxon signed-rank test, two-sided, n = 7 rats) or ChR2 (p = 0.11, Wilcoxon signed-rank test, two-sided, n = 10 rats) groups. (e) The effect of laser was similar
across both groups (median: 0.06 GFP, n = 7 rats; -0.09 ChR2, n = 10 rats; p = 0.36, Wilcoxon rank-sum test, two-sided).
(a) Schematic of model-fitting and neuron classification process. For each neuron, the reward outcome and spike count following reward delivery on each trial were used to fit two models:
Value and Unmodulated. Akaike information criterion (AIC) was used to select the best model (right). (b) Mean(+/−SEM) activity of neurons best fit by each of the models, plotted according to
previous outcome (n = 39 Value and 397 Unmodulated neurons from 5 rats). (c) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n = 39 Value and 397
Unmodulated neurons). (d) Mean(+/−SEM) activity of all Value neurons with trials binned by model-derived Value. (e) Mean(+/−SEM) population activity of simulated and actual Value neurons
according to each trial’s Value (V). (f) Model recovery, plotted as the fraction of neurons simulated with each model recovered as that model.
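The fitting and classification process in (a) can be sketched roughly as below. The sketch assumes Poisson spike counts, a firing rate that is linear in a delta-rule value estimate, and a coarse grid search standing in for a proper maximum likelihood optimizer; the parameter counts, grids, and simulated neuron are all hypothetical and chosen only to make the example run.

```python
import math


def poisson_nll(counts, rates):
    """Summed negative log-likelihood of spike counts under Poisson rates."""
    return sum(lam - k * math.log(lam) + math.lgamma(k + 1)
               for k, lam in zip(counts, rates))


def value_trace(outcomes, alpha, v0=0.5):
    """Expected value before each trial, updated by a delta rule."""
    v, trace = v0, []
    for outcome in outcomes:
        trace.append(v)
        v += alpha * (outcome - v)
    return trace


def fit_unmodulated(counts):
    """Constant-rate model: the Poisson MLE is the mean count (1 parameter)."""
    rate = sum(counts) / len(counts)
    return poisson_nll(counts, [rate] * len(counts)), 1


def fit_value(counts, outcomes):
    """Value model fit by coarse grid search (3 parameters: alpha, slope, base)."""
    best = math.inf
    for alpha in [a / 10 for a in range(1, 11)]:
        values = value_trace(outcomes, alpha)
        for slope in range(-8, 9):
            for base in [b / 2 for b in range(1, 21)]:
                rates = [base + slope * v for v in values]
                if min(rates) > 0:  # Poisson rates must be positive
                    best = min(best, poisson_nll(counts, rates))
    return best, 3


def aic(nll, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2 * n_params + 2 * nll


# Hypothetical neuron whose spike count tracks value (alpha=0.5, base=1, slope=8).
outcomes = [1, 1, 1, 0, 0, 0] * 10
counts = [round(1 + 8 * v) for v in value_trace(outcomes, 0.5)]
scores = {"Value": aic(*fit_value(counts, outcomes)),
          "Unmodulated": aic(*fit_unmodulated(counts))}
best_model = min(scores, key=scores.get)
print(best_model)  # prints Value
```

The model with the lowest AIC is selected, so a neuron is labeled Value only when the improvement in likelihood outweighs the penalty for the extra parameters.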
(a) Fraction of VP neurons best fit by the Value and Unmodulated models in the random sucrose/maltodextrin/water task. (b) Mean(+/−SEM) activity of neurons best fit by each of the models,
plotted according to previous outcome (n = 38 Value and 216 Unmodulated neurons from 3 rats). (c) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n = 38
Value and 216 Unmodulated neurons). (d) Mean(+/−SEM) population activity of simulated and actual Value neurons according to each trial’s Value (V). (e) Mean(+/−SEM) activity of all Value
neurons with trials binned by model-derived Value. (f) Distribution of correlations between individual VP neurons’ firing rates at cue onset on each trial and the distance from the port
during the previous ITI. * = p = 0.00001 for negative shift in mean correlation coefficient (vertical line) compared to 1000 shuffles of data for Value neurons, Wilcoxon signed-rank test,
two-sided, as well as p = 0.0000002 for more negative coefficients for Value neurons compared to Unmodulated neurons, Wilcoxon rank-sum test, two-sided. See also Fig. 4c,d.
Recording locations for rats from predictable and random sucrose/maltodextrin experiment in Extended Data Fig. 8.
(a) Task schematic: three auditory cues indicated three trial types. (b) Median latency to enter reward port following onset of cue for each trial type, plotted as the mean(+/−SEM) across
all sessions for each rat (gray lines, n = 8, 9, 10, and 10 sessions for the 4 rats) and the overall mean(+/−SEM) (n = 37 sessions). (c) Percentage sucrose of total solution consumption in a
two-bottle choice, before (‘Initial’) and after (‘Final’) recording (n = 4 rats). (d) Mean(+/−SEM) lick rate relative to reward delivery for each trial type (n = 37 sessions from 4 rats).
(e) Mean(+/−SEM) activity of all neurons recorded in the predictable and random sucrose/maltodextrin task, aligned to reward delivery (n = 487 neurons from 4 rats). (f) Schematic of cue
model-fitting. The best model (of 6 total) was selected with Akaike information criterion. (g) Fraction of the population best fit by each model. (h) Coefficients(+/−SE) for outcome history
regression for each class of neurons with no cue effect (n = 38 RPE, 135 Current outcome, and 204 Unmodulated neurons). (i) Mean(+/−SEM) activity of all RPE neurons with no cue effect (n =
38 neurons). The trials for each neuron are binned according to their model-derived RPE. (j) Population activity of simulated and actual VP RPE neurons with no cue effect according to each
trial’s RPE value. (k) Scatterplot of each cue effect neuron’s weight for specific sucrose and maltodextrin cues (n = 7 RPE, 33 Current outcome, and 70 Unmodulated cells with cue effects).
The percentage of neurons falling in each quadrant is indicated. The percentage in our quadrant of interest (positive value for sucrose and negative value for maltodextrin) did not differ
from chance (p = 0.1 for exact binomial test compared to null of 25%). (l) Mean(+/−SEM) activity of neurons with sucrose values > 0 and maltodextrin values < 0, consistent with a value-based
cued expectation modulation. (m) Neurons with cue effects for cue-evoked signaling, rather than reward-evoked signaling, as in (g). (n) As in (k), for activity at the time of the cue rather
than time of reward (n = 143 neurons with cue effects). * = p = 0.00001 for exact binomial test compared to null of 25%. (o) As in (l), for activity at the time of the cue rather than time
of reward.
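The quadrant analysis in (k) and (n) compares the fraction of neurons in one quadrant against a chance level of 25%. A minimal sketch of an exact binomial test is given below; the caption does not state whether the test was one-sided or two-sided, so the upper-tail version here is an assumption, and the count of 35 neurons is hypothetical (only the total of 110 neurons with cue effects comes from the caption).

```python
from math import comb


def binom_test_greater(k, n, p):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Hypothetical count: 35 of 110 neurons fall in the quadrant of interest.
p_value = binom_test_greater(35, 110, 0.25)
print(f"p = {p_value:.3f}")
```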
(a) Fraction of neurons classified as RPE, Current outcome, and Unmodulated in VP and NAc in the random sucrose/maltodextrin task using Bayesian information criterion (BIC) as the selection
criterion. (b) Coefficients(+/−SE) for outcome history regression for VP neurons of each BIC subset (n = 37 RPE, 110 Current outcome, and 289 Unmodulated cells from 5 rats). (c) Population
mean(+/−SEM) of all VP BIC RPE neurons, binned according to the model-derived RPE. (d) Mean(+/−SEM) population activity of simulated and actual BIC RPE neurons according to each trial’s RPE
value for VP (left) and NAc (right). (e) Distribution of correlations between model-predicted and actual spiking for all RPE neurons from each region. (f) Distribution of α for RPE neurons
in VP (green) and NAc (orange). (g) Mean(+/−SEM) activity of VP neurons classified as RPE by AIC but not BIC according to current and previous outcome (n = 35 neurons). (h)
Coefficients(+/−SE) for outcome history regression for these neurons. (i) Mean(+/−SEM) activity of these neurons binned according to model-derived RPE on each trial.
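The difference between AIC- and BIC-based classification can be illustrated with a small sketch. BIC penalizes each parameter by the log of the number of trials rather than by 2, so a neuron can be classified as RPE by AIC but not by BIC, as in (g). The negative log-likelihoods, parameter counts, and trial count below are hypothetical values chosen to show this case.

```python
import math


def aic(nll, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2 * n_params + 2 * nll


def bic(nll, n_params, n_trials):
    """Bayesian information criterion from a negative log-likelihood."""
    return n_params * math.log(n_trials) + 2 * nll


# Hypothetical fits for one neuron: (negative log-likelihood, parameter count).
fits = {"Unmodulated": (420.0, 1), "Current outcome": (410.0, 2), "RPE": (406.5, 4)}
n_trials = 200

by_aic = min(fits, key=lambda m: aic(*fits[m]))
by_bic = min(fits, key=lambda m: bic(*fits[m], n_trials))
print(by_aic, "|", by_bic)  # RPE | Current outcome
```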