07-1: A Closer Look at Reinforcement
Psychology of Learning
Module 07: Operant Conditioning 2
Part 1: A Closer Look at Reinforcement
Looking Back
In Module 06, we established operant conditioning fundamentals: Thorndike’s law of effect, Skinner’s three-part contingency, the four consequences of behavior, shaping through successive approximations, and how reinforcement schedules produce distinctive response patterns. Now we delve deeper into reinforcement itself. What makes something a reinforcer? Can any behavior reinforce any other behavior? Must reinforcers be physical objects like food or water, or can activities serve as reinforcers?
The Sea World Challenge: Teaching Complex Behaviors
On a recent trip to Sea World, one author had the pleasure of attending a show that included domesticated animals (cats, dogs, skunks, raccoons, a pig, and pigeons) participating as “actors” in a vaudeville-like production. The show was quite surprising because the animals executed series of behaviors on cue: climbing ropes, crawling through holes, flying from one area to another (pigeons only, of course), and knocking over various items at just the right time. Throughout the production, audience members repeatedly asked, “How do they get them to do that?”
If you identified shaping as a possible technique, you were on the right track. Although the Sea World show included animals demonstrating basic behaviors within their normal repertoires, the precise timing and sequencing required more sophisticated applications of operant conditioning principles. Behavior modification—treatment approaches counselors use based on operant conditioning principles—can help individuals replace undesirable behaviors with more desirable behaviors. But to apply these techniques effectively, we must first understand what makes reinforcement work.
Is Reinforcement Necessary for Learning?
Latent learning is hidden learning that occurs on trials when no reinforcement is delivered but can be seen in the subject’s behavior on trials when reinforcement is introduced. Tolman and Honzik (1930) demonstrated this with rats learning mazes.
Tolman and Honzik’s (1930) classic experiment used three groups of rats in mazes. Group 1 received food reinforcement every day—their error rates decreased steadily across trials as expected. Group 2 never received reinforcement—their error rates remained high throughout. Group 3 received no reinforcement for 10 days, then received food starting on day 11. What happened?
Group 3’s performance on day 12, the first trial after food was introduced on day 11, matched Group 1’s performance! They had been learning the maze all along despite no reinforcement. This learning remained latent (hidden) until reinforcement motivated its expression. The question becomes: Is reinforcement necessary for learning new voluntary behaviors?
Answer: Reinforcement is not needed for learning new responses, but reinforcement is needed for demonstrating new responses. Learning and performance are distinct. Organisms can acquire knowledge without reinforcement, but displaying that knowledge requires motivation—often provided by reinforcement.
The Premack Principle: Activities as Reinforcers
The basic assumption of operant conditioning rests upon Thorndike’s law of effect: Behaviors followed by favorable consequences (reinforcement) are more likely to be repeated. David Premack (1959) refined this assumption by proposing that higher probability behaviors (those demonstrated more frequently) will reinforce lower probability behaviors. Despite the symbol SR suggesting a stimulus, Premack showed that “reinforcement” doesn’t have to be an environmental stimulus—activities themselves can be reinforcers.
The Premack Principle is Premack’s idea that a more preferred behavior or activity can serve as a reinforcer for a less preferred behavior or activity. We can use behaviors we enjoy to reinforce behaviors we don’t enjoy as much (Premack, 1959).
We have hierarchies of behaviors based on preference, and preferences differ by individual. For the Premack Principle to apply, one must know which behavior is preferred more strongly. A less preferred behavior does not reinforce a more preferred behavior: the hierarchy matters.
Premack’s Pinball and Candy Study
The Premack Principle was based on Premack’s (1959) observations of children’s behaviors when they had free access to a candy dispenser and pinball machine. He was interested in which behaviors were preferred—playing pinball or eating candy. Premack found that some children preferred pinball to candy, whereas others demonstrated the reverse preference. Individual differences in preferences are crucial—what reinforces one person may not reinforce another.
The Playground Example
If you think back to your grade school years, you may recall a teacher telling the class that no one would be allowed to go to the playground until everyone’s desk was clean. Your teacher knew the class preferred the playground to cleaning desks. Thus, the teacher reinforced desk-cleaning behavior with the promise of going to the playground.
In this scenario, going to the playground is the more preferred behavior and therefore serves as reinforcer for cleaning desks. Few students would prefer spending an hour cleaning desks to an hour of recreation on the playground. Thus, promise of playground time serves as motivation to achieve clean desks. Notice both R and SR are behaviors:
SD: Promise by teacher → R: Cleaning desks → SR: Playground time
Clinical Application: Food Refusal
A 7-year-old child with learning difficulties had a history of food refusal; his health was beginning to suffer. Parents presented foods for the boy to eat that were similar to his preferred foods. After eating the new food, he was allowed to eat his previously preferred foods. For example, the boy was presented with a bread roll (new food) first. After eating the roll, he could eat a bread slice (his preferred food). Each time they presented a new food, the boy was told he could have his preferred food after eating the new food.
Over time, the boy began eating a wider variety of foods and overall greater quantity. The Premack Principle served as an effective premise for training the child to become healthier by expanding his diet (Brown, Spencer, and Swift, 2002). This demonstrates practical therapeutic applications of theoretical principles.
Reinforcement relativity is the concept that there are no absolute categories of reinforcers and reinforceable responses. What serves as a reinforcer depends on the relative preferences of the individual and the context (Premack, 1959).
Premack Principle and Punishment
Premack’s theory extends to punishment: Less probable behaviors will punish more probable behaviors. At baseline, a rat spent about 17% of its time drinking and 10% of its time running. When access to drinking was made contingent on running, high probability drinking reinforced low probability running. But when the contingency was reversed, so that drinking had to be followed by running, low probability running punished high probability drinking. The rat drank less to avoid having to run more.
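The two predictions above can be sketched as a small function. This is only an illustration, not Premack's own formalism: the name premack_prediction is hypothetical, baseline proportions of time stand in for response probability, and the "no predicted effect" case for equal probabilities is an assumption added for completeness.

```python
def premack_prediction(p_consequence: float, p_target: float) -> str:
    """Predict the effect of making one activity the consequence of another.

    p_consequence: baseline proportion of time spent on the activity that follows.
    p_target: baseline proportion of time spent on the activity that comes first.
    """
    if p_consequence > p_target:
        return "reinforce"  # a more probable activity follows a less probable one
    if p_consequence < p_target:
        return "punish"     # a less probable activity follows a more probable one
    return "no predicted effect"  # assumption: equal probabilities, no prediction


# Baseline values from the rat example: 17% drinking, 10% running.
baseline = {"drinking": 0.17, "running": 0.10}

# Running earns access to drinking.
print(premack_prediction(baseline["drinking"], baseline["running"]))  # reinforce

# Drinking must be followed by running.
print(premack_prediction(baseline["running"], baseline["drinking"]))  # punish
```

The function only compares baseline probabilities, which is why the same pair of behaviors yields opposite predictions depending on which one serves as the consequence.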
Response Deprivation Theory: Deprivation Creates Reinforcement
Response deprivation theory is the theory that depriving an individual or animal of the opportunity to engage in a behavior, so that it falls below that behavior’s usual baseline, can cause the behavior to become a reinforcer for another targeted behavior. When access to a preferred behavior is restricted, animals will work to restore their usual activity levels (Timberlake and Allison, 1974).
The discrepancy between baseline rate of preferred behavior and current opportunity to perform behavior determines reinforcement level. Greater discrepancy is associated with stronger reinforcement. This theory makes a surprising prediction: Even low-probability behaviors can become reinforcers if sufficiently restricted.
Rat Running and Drinking Study
In baseline conditions, a rat spent about 17% of its time drinking and 10% of its time running—a ratio of 1.7 to 1. This rat prefers this ratio above all other possible ratios. Experimenters then imposed contingencies that disrupted this preferred ratio.
First contingency: The rat was required to show 5 seconds of drinking and 15 seconds of running, making a ratio of 1 to 3, a large decrease from the baseline ratio. What happened? The animal struck a compromise, increasing running time somewhat to raise drinking time closer to baseline levels. Because drinking was restricted below baseline, access to drinking reinforced running, the normally less preferred behavior.
Second contingency: The rat was required to show 45 seconds of drinking and 5 seconds of running, making a ratio of 9 to 1—a large increase from baseline ratio. Now running became a precious commodity. Drinking occurred primarily to be allowed at least some running. Even though running was less preferred at baseline, deprivation made it highly reinforcing.
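The arithmetic behind the two contingencies can be checked with a short sketch. It asks how much of the earned behavior each schedule would allow if the rat kept the required behavior at its baseline level. Several things here are assumptions for illustration: the function name access_at_baseline, the reading that running earns drinking in the first contingency and drinking earns running in the second, and the use of percentages of time alongside per-bout seconds.

```python
def access_at_baseline(required_baseline: float,
                       required_amount: float,
                       earned_amount: float) -> float:
    """Amount of the earned behavior available if the required behavior
    stays at its baseline level under the given schedule."""
    return required_baseline * earned_amount / required_amount


baseline_drinking = 17.0  # percent of time at baseline
baseline_running = 10.0   # percent of time at baseline

# First contingency: 15 s of running for every 5 s of drinking.
drinking_allowed = access_at_baseline(baseline_running, 15, 5)
print(f"{drinking_allowed:.1f}")             # 3.3
print(drinking_allowed < baseline_drinking)  # True: drinking is deprived

# Second contingency: 45 s of drinking for every 5 s of running.
running_allowed = access_at_baseline(baseline_drinking, 45, 5)
print(f"{running_allowed:.1f}")              # 1.9
print(running_allowed < baseline_running)    # True: running is deprived
```

In both cases the schedule pushes one behavior well below its baseline, and that deprived behavior is the one the theory predicts will act as the reinforcer.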
Premack Versus Response Deprivation Theory
You may have noticed one primary difference between the Premack Principle and response deprivation theory. The Premack Principle states that higher probability behavior can serve as reinforcer when contingent upon demonstration of some less preferred behavior. Response deprivation theory argues that the contingent behavior will serve as reinforcer only when the baseline amount of that behavior is restricted—the person or animal is deprived of that behavior. Even a less probable behavior can serve as reinforcer if ability to engage in that behavior is reduced from its normal baseline.
High-Protein Diet Example
Consider a person who has never been tempted by desserts but is tempted by a big, juicy steak. This person goes on a high-protein diet in which foods high in starch and sugar (desserts) are greatly restricted and foods high in protein (juicy steaks) may be consumed freely.
According to Premack, the opportunity to splurge and eat a dessert, normally a less preferred activity than eating a steak, would not serve as reinforcer for staying on the diet. Desserts are lower in the preference hierarchy.
Response deprivation theory predicts eating a dessert could become reinforcer if access to eating desserts were reduced below usual baseline levels. The dieter may begin to miss desserts even though they were only eaten infrequently before the diet. Restriction below baseline—not just preference—determines reinforcing value.
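The contrast between the two predictions can be made concrete with a small sketch. The weekly serving counts below are hypothetical numbers chosen only for illustration (they do not come from any study), and both function names are invented for this example.

```python
def premack_predicts_reinforcer(p_candidate: float, p_other: float) -> bool:
    """Premack: the candidate reinforces only if it is the more probable behavior."""
    return p_candidate > p_other


def deprivation_predicts_reinforcer(allowed: float, baseline: float) -> bool:
    """Response deprivation: the candidate reinforces if access falls below baseline."""
    return allowed < baseline


# Hypothetical baseline servings per week before the diet.
baseline_dessert = 1.0
baseline_steak = 4.0

# Hypothetical diet rule: no desserts allowed.
allowed_dessert = 0.0

print(premack_predicts_reinforcer(baseline_dessert, baseline_steak))       # False
print(deprivation_predicts_reinforcer(allowed_dessert, baseline_dessert))  # True
```

The same dessert is ruled out as a reinforcer by the preference comparison and ruled in by the restriction comparison, which is the disagreement between the two theories.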
Need Reduction Theory and Drive Reduction Theory
Need reduction theory proposes that all primary reinforcers are stimuli that reduce some biological need, and all stimuli that reduce biological need will act as reinforcers. This makes evolutionary sense—things we need to survive are things that are reinforcing and therefore most likely to be pursued (Hull, 1943).
However, problems emerged. Why is saccharin reinforcing? It reduces no biological need yet strongly reinforces behavior. Rats with thiamine deficiency will not immediately select thiamine-rich food but settle on it over time, suggesting that health, not immediate need reduction, is the reinforcer. Need reduction theory couldn’t account for all reinforcers.
Drive reduction theory proposes that any decrease in a biological drive (hunger, thirst, sex, etc.) will serve as reinforcer. Strong stimulation of any sort is aversive and creates a drive; reduction of that drive will be reinforcing. These drives can be primary (hunger) or secondary (excessive noise) (Hull and Miller, 1948).
But problems exist here too. Reduction of room temperature from 100°F to 75°F is reinforcing; reduction from 25°F to 0°F is not. More problematic: pornography (or access to a female rat in heat) creates an increase in sexual arousal, yet is rewarding all on its own. Drive increase, not reduction, provides reinforcement. Drive reduction theory couldn’t explain all reinforcing phenomena.
Electrical Stimulation of the Brain (ESB)
Electrical stimulation of the brain (ESB) is a mild, pulsating electrical current which, when delivered to certain parts of the brain, acts as a powerful reinforcer. Rats will cross electrified floors to reach levers that control stimulation of pleasure centers, yet might starve rather than cross the same floor if the lever delivered food instead (Olds and Milner, 1954).
Normally, the hypothalamus rewards behaviors essential to survival. Using ESB, a rat can be remotely controlled to navigate environments in any way the controller sees fit. A theory of addictions proposes that people become addicted to drugs, alcohol, or gambling because of reward deficiency syndrome—a deficiency in the brain’s natural rewarding properties. ESB research revealed that reinforcement involves specific brain circuits that can be directly stimulated, bypassing need or drive reduction entirely.
Looking Forward
We’ve explored sophisticated theories of reinforcement. Latent learning demonstrated that learning occurs without reinforcement but requires reinforcement for performance. The Premack Principle revealed that activities can reinforce other activities based on preference hierarchies. Response deprivation theory showed that even low-probability behaviors can become reinforcers if restricted below baseline levels. In Part 2, we’ll explore behavioral economics—how principles from psychology and economics combine to predict choices and behaviors—along with forward and backward chaining for teaching complex behavior sequences, instinctive drift, and avoidance learning.