In the canonical interpretation of dopaminergic neuron activity during Pavlovian conditioning, cell firing is initially triggered by unexpected rewards. Upon learning, activation instead follows the reward-predictive conditioned stimulus, and when expected rewards are withheld, firing is inhibited. However, little is known about dopaminergic neuron activity during the actual learning process in operant tasks. Here, we recorded optogenetically identified dopaminergic neurons of the ventral tegmental area (VTA) in mice training in multiple, successive operant sensory discrimination tasks. A delay between nose-poke choices and trial outcome signals (for reward or punishment) probed for instructional or predictive activity. During training, but prior to criterion performance, firing increased after correct, but not incorrect, choices and before the outcome signals. Thus, the neurons predicted whether choices would be rewarded, despite the animals' subthreshold behavioral performance. Surprisingly, these neurons also fired after reward delivery, as if the rewards had been unexpected according to the canonical view, but activity was inhibited after punishment signals, as if the reward had been expected after all. These inconsistencies suggest that theoretical formulations of dopaminergic neuronal activity should be revised to embody multiple roles in temporal difference learning and actor-critic models. Furthermore, on training trials in which these neurons predicted that a given choice was correct and would be rewarded, the mice surprisingly adhered to other non-rewarded and untrained task strategies (e.g., alternation). The dopaminergic neurons' reward prediction activity could serve as critic signals for the choices just made. This is consistent with the notion that the brain must reconcile multiple Bayesian belief representations during learning.
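
For readers unfamiliar with the canonical view referred to above, the following is a minimal textbook sketch of a temporal difference (TD) prediction error in a Pavlovian trial. It is not the model or analysis used in this study. The function name `run_trial`, the trial layout (`CS_STEP`, `REWARD_STEP`, `N_STEPS`), and the learning parameters are all illustrative assumptions, as is the simplification that states before the conditioned stimulus keep a value of zero because the stimulus arrives unpredictably.

```python
# Textbook TD(0) sketch of the canonical reward prediction error account.
# All names and parameter values here are illustrative assumptions.

def run_trial(V, cs_step, reward_step, reward=1.0, alpha=0.2, gamma=1.0, learn=True):
    """Run one trial and return the TD error observed at each time step.

    V[t] is the learned value of time step t within the trial.
    delta[t + 1] = r[t + 1] + gamma * V[t + 1] - V[t]
    """
    n_steps = len(V)
    delta = [0.0] * n_steps
    for t in range(n_steps - 1):
        r_next = reward if (t + 1) == reward_step else 0.0
        delta[t + 1] = r_next + gamma * V[t + 1] - V[t]
        # Assumption: only states from stimulus onset onward are learnable,
        # because the stimulus itself cannot be predicted.
        if learn and t >= cs_step:
            V[t] += alpha * delta[t + 1]
    return delta


CS_STEP, REWARD_STEP, N_STEPS = 2, 6, 10
V = [0.0] * N_STEPS

# Before learning: the reward is unexpected.
naive = run_trial(V, CS_STEP, REWARD_STEP, learn=False)

# Learning over repeated rewarded trials.
for _ in range(500):
    run_trial(V, CS_STEP, REWARD_STEP)

# After learning: a rewarded trial and a trial with the reward withheld.
trained = run_trial(V, CS_STEP, REWARD_STEP, learn=False)
omitted = run_trial(V, CS_STEP, REWARD_STEP, reward=0.0, learn=False)

print("naive   error at reward:", naive[REWARD_STEP])
print("trained error at CS:    ", trained[CS_STEP])
print("trained error at reward:", trained[REWARD_STEP])
print("omitted error at reward:", omitted[REWARD_STEP])
```

In this sketch the error is positive at reward delivery before learning, moves to stimulus onset after learning, and turns negative when an expected reward is withheld. A single error signal of this kind does not by itself produce both a pre-outcome prediction and a response to the delivered reward on the same trial, which is the inconsistency the recordings described above point to.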