
06-2: Four Consequences of Behavior & Shaping

Psychology of Learning

Module 06: Operant Conditioning 1

Part 2: Four Consequences of Behavior and Shaping

Looking Back

In Part 1, we established the foundations of operant conditioning, distinguishing voluntary behaviors shaped by consequences from reflexive behaviors elicited by stimuli. We explored three pioneering theorists: Thorndike, who demonstrated that behaviors followed by satisfying consequences are repeated (law of effect); Guthrie, who proposed one-trial learning of individual movements while practice builds skills through varying conditions; and Skinner, who synthesized these ideas into the three-part contingency (SD → R → SR). We saw how discriminative stimuli signal when behaviors will produce consequences, using examples from alarm clocks to goldfish driving cars. Now we delve deeper into the heart of operant conditioning: the consequences that follow behavior. The SR in the three-part contingency isn’t a single type of event—it encompasses four distinct types of consequences, each producing specific, predictable effects on behavior. Understanding these four consequences is essential for applying operant conditioning principles effectively in education, therapy, parenting, workplace management, and self-change efforts.

Four Consequences of Behavior

In operant conditioning, learning occurs when an organism forms an association between demonstrating a behavior and experiencing a particular consequence. Different consequences influence behavior in different ways. The four basic types of consequences are positive reinforcement, positive punishment, negative reinforcement, and negative punishment (also called omission training). These consequences are intended either to increase or decrease a target behavior.

The terms “positive” and “negative” don’t mean “good” and “bad.” Rather, they refer to whether something is added (positive) or removed (negative). “Reinforcement” increases behavior frequency; “punishment” decreases it. This creates a 2×2 matrix: Adding something pleasant produces positive reinforcement, which increases behavior. Removing something unpleasant produces negative reinforcement, which also increases behavior. Adding something unpleasant produces positive punishment, which decreases behavior. Removing something pleasant produces negative punishment, which also decreases behavior.
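The 2×2 logic above can be captured as a small lookup table. The sketch below is purely illustrative (the function name `classify` and the category labels as dictionary keys are our own devices, not standard terminology): given whether a stimulus is added or removed, and whether it is pleasant or unpleasant, it returns the consequence type and its effect on behavior.

```python
# Illustrative sketch of the 2x2 matrix of consequences.
# Key: (operation, stimulus valence) -> (consequence type, effect on behavior)
CONSEQUENCES = {
    ("add", "pleasant"):      ("positive reinforcement", "behavior increases"),
    ("remove", "unpleasant"): ("negative reinforcement", "behavior increases"),
    ("add", "unpleasant"):    ("positive punishment",    "behavior decreases"),
    ("remove", "pleasant"):   ("negative punishment",    "behavior decreases"),
}

def classify(operation, valence):
    """Return (consequence type, effect) for an operation ('add'/'remove')
    and a stimulus valence ('pleasant'/'unpleasant')."""
    return CONSEQUENCES[(operation, valence)]

print(classify("add", "pleasant"))       # ('positive reinforcement', 'behavior increases')
print(classify("remove", "unpleasant"))  # ('negative reinforcement', 'behavior increases')
```

Note how the two reinforcement cells both increase behavior despite opposite operations, which is exactly why "positive" and "negative" must be read as "added" and "removed," not "good" and "bad."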

Positive Reinforcement: Adding Something Pleasant

Positive reinforcement is the application of a pleasant stimulus or consequence, intended to increase the likelihood that the animal or individual will repeat a targeted behavior in the future. Something desirable is added following the behavior (Skinner, 1938).

Example: Teaching “Thank You.” A parent wants to encourage their two-year-old to say “Thank you” when given a gift. A guest at the child’s birthday party presents the child with a gift. The child may not realize the parent wants them to say “Thank you” politely. The parent says, “What do you say when someone gives you a gift?” This provides the cue (SD) to say “Thank you” (R). Both the guest and parent smile, and the guest responds, “You are very welcome! How polite you are!” (SR). Social approval—smiles, praise, warmth—reinforces polite behavior, making it more likely to recur.

Positive reinforcement pervades daily life: paychecks for working, good grades for studying, compliments for helpful behavior, likes for social media posts, pleasure from eating favorite foods. Each adds something desirable following behavior, increasing that behavior’s future frequency.

Positive Punishment: Adding Something Unpleasant

Positive punishment is the application of a negative stimulus or consequence, intended to decrease future demonstrations of a targeted behavior. Something aversive is added following the behavior (Skinner, 1938).

Example: Speeding Tickets. In theory, a driver who speeds receives a speeding ticket, which costs money both when the ticket is paid and through increased car insurance premiums. These are aversive consequences by most standards and should discourage speeding. This widely used form of positive punishment differs markedly from rewarding drivers with money for driving at or under the speed limit. Behavior-based insurance programs that monitor driving and adjust rates accordingly are increasingly common, blending punishment and reinforcement approaches.

Positive punishment examples include: reprimands for misbehavior, spanking for rule violations, extra chores for poor grades, social disapproval for rudeness, hangovers following excessive drinking. Each adds something unpleasant after behavior, intended to decrease that behavior.

Negative Reinforcement: Removing Something Unpleasant

Negative reinforcement is the removal of an unpleasant stimulus or consequence, intended to encourage future demonstrations of a targeted behavior. Something aversive is removed following the behavior (Skinner, 1938).

Example: Alarm Clocks (Revisited). Having trouble getting up in the morning? Get an alarm clock with loud, annoying buzzing and place it across the room. Jumping out of bed and hitting the appropriate button is the fastest way to make the alarm stop shrieking. Taking away the aversive sound provides negative reinforcement that increases quick, reliable jumping-out-of-bed behavior.

Negative reinforcement is often misunderstood as punishment, but it’s actually reinforcement—it increases behavior by removing something unpleasant. Examples include: taking pain relievers to remove headaches (reinforces pill-taking), leaving noisy environments to escape noise (reinforces leaving), studying to avoid anxiety about exams (reinforces studying), apologizing to end arguments (reinforces apologizing). Each removes something aversive, increasing the behavior that terminated the aversive stimulus.

Negative Punishment: Removing Something Pleasant

Negative punishment, also called omission training, is the removal of a pleasant stimulus or consequence, intended to discourage future demonstrations of a targeted behavior. Something desirable is removed following the behavior (Skinner, 1938).

Example: Time-Out. A child screams at his father because he wants the television remote control. To discourage screaming—an unwanted behavior—the father places the child in time-out for two minutes. The child must sit quietly in a designated place, preferably away from reinforcing stimuli. Time-out removes access to reinforcing activities (watching TV, playing, social interaction), thereby decreasing screaming behavior.

Example: Lost Field Trip Privilege. Teachers often schedule fun field trips to conclude the school year, with the stipulation that students must stay out of trouble to participate. The possibility of attending the field trip is taken away from students who misbehave in the classroom. The target behavior is classroom misbehavior; what’s taken away is the opportunity for a fun, rewarding field trip. This removal of a pleasant consequence decreases misbehavior.

Negative punishment examples include: losing privileges for poor behavior, having toys taken away for misbehavior, losing allowance for chores left undone, grounding teenagers who violate curfew, revoking driving privileges for traffic violations. Each removes something desirable, decreasing the behavior that preceded the removal.

Discrimination and Generalization: Learning When and Where to Respond

After Thorndike and Skinner presented their findings, researchers began breaking down elements of the three-part contingency to understand how learning proceeds. They manipulated the discriminative signal (SD) to determine whether animals could learn to respond the same way to a range of signals or respond differently based on slight variations in the signal.

Generalization in Operant Conditioning

Generalization occurs when an animal or individual learns to respond in the same way to similar, but not identical, stimuli. Once a behavior is reinforced in the presence of one SD, similar stimuli also evoke that behavior (Skinner, 1938).

Similarity-based generalization occurs between stimuli that are physically similar, such as lights of similar wavelength. The more physically similar two stimuli are, the more generalization occurs.
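This relationship is often depicted as a generalization gradient: responding peaks at the trained SD and falls off as the test stimulus grows physically more distant. The sketch below is a toy illustration only; the Gaussian shape, the width parameter, and the wavelength values are assumptions for demonstration, not measured data.

```python
import math

def response_strength(test_stimulus, trained_sd, width=50.0):
    """Toy generalization gradient: response strength peaks at the
    trained SD and falls off with physical distance (here, wavelength
    in nm), following an assumed Gaussian shape."""
    return math.exp(-((test_stimulus - trained_sd) ** 2) / (2 * width ** 2))

# Suppose a pigeon is trained to peck at a 550 nm light;
# test stimuli farther from 550 nm evoke progressively weaker responding.
for wavelength in (550, 570, 600, 650):
    print(wavelength, round(response_strength(wavelength, 550), 2))
```

The key property the sketch captures is monotonic decline with distance: a 570 nm light evokes more responding than a 600 nm light, which evokes more than a 650 nm light.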

Meaning-based generalization occurs when two stimuli are paired together during learning. An animal may be trained to press a lever (R) to receive food (SR) after perceiving a combination of tone and light (SD1 + SD2). Later, the animal may demonstrate lever pressing when presented only with the tone (SD1) or only with the light (SD2). The two stimuli are assumed to have the same meaning in terms of their ability to predict reinforcement.

Discrimination in Operant Conditioning

Discrimination occurs when an animal or individual learns to tell the difference between two or more forms of a stimulus; the organism learns which form leads to reinforcement and which doesn’t. Through differential reinforcement, responding becomes selective (Skinner, 1938).

Example: New Parent Learning. New parents often find they learn to distinguish between different types of cries from their newborn infants. Parents report distinguishing cries of hunger, pain, fatigue, discomfort, and frustration. This knowledge is acquired through operant conditioning. The first few days with a newborn involve much trial and error in soothing a crying baby. One author, as a new mother, was frustrated trying to soothe her three-day-old crying infant. After the mother adjusted a loose sock on the infant’s foot, the baby stopped crying—all was well! That was the first instance of reinforcement for the young parent, and she quickly learned that discomfort in her infant was associated with a high-pitched scream. Different cries (SDs) signaled different needs, and responding appropriately (R) led to the baby ceasing to cry (SR).

Shaping: Creating New Behaviors Through Successive Approximations

If you want to increase a behavior an animal already demonstrates, simply wait until the behavior occurs, then provide reinforcement. The law of effect states that if a behavior is followed by a satisfying state of affairs, the probability that the behavior will be repeated increases. But what if you want an animal to exhibit a behavior that’s not part of its normal repertoire?

Shaping is a form of operant learning in which an animal or individual learns a complex set of behaviors through reinforcement of successive approximations to the complex goal behavior. Behaviors gradually become more similar to the target behavior through differential reinforcement (Skinner, 1938).

A recent comprehensive review organized gradual change procedures—including shaping, fading, and chaining—into a unified taxonomy based on which functional component of the contingency each procedure modifies: discriminative stimuli, response requirements, or reinforcement (Kaplan, 2023). This framework highlights how shaping specifically modifies response requirements by gradually changing the criteria for reinforcement, requiring closer and closer approximations to the target behavior. The review emphasizes that these gradual change procedures represent a “conceptually systematic technology of behavior change with wide-ranging empirical support across diverse settings and contexts” (Kaplan, 2023, p. 1).

Shaping Toilet Training

As all parents and caregivers attest, using the toilet is not in an infant’s or toddler’s usual behavioral repertoire. A child may initially be rewarded simply for noticing and announcing to a parent that they sense the need to use the toilet. As shaping continues, the child may be rewarded not simply for noticing the sensation, but for acting on it in increasingly complex ways: walking toward the bathroom, pulling down pants, sitting on the toilet, successful elimination. The goal is to shape the child’s behavior so they gradually learn a new behavior very different from currently demonstrated behavior.

Shaping a Rat to Press a Lever

The process begins with classical conditioning: food (a primary reinforcer) is delivered randomly so the rat comes to associate the sound of the pellet dispenser with food. The sound initially startles the rat, so it must first become a conditioned reinforcer before shaping can proceed. Next, reinforcement is delivered for any detectable upward head movement near the lever. Gradually, the criterion for reinforcement becomes more demanding: head near the lever, touching the lever, pressing the lever slightly, pressing the lever firmly enough to trigger the mechanism.

Shaping takes advantage of the fact that there’s variability in animals’ normal behaviors. In a distribution of head heights, some reach higher than others. By reinforcing increasingly higher head positions, you shift the entire distribution upward. This method—successive approximations—can teach elephants to dance, definitely not something in an elephant’s normal behavioral repertoire!
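The criterion-raising loop described above can be sketched as a toy simulation. Every number here is an invented illustration, not empirical data: responses vary around a mean, responses that meet the current criterion are reinforced, reinforcement nudges the whole distribution upward, and the criterion is then raised slightly toward the target.

```python
import random

def shape(target=10.0, start_mean=1.0, variability=1.0, seed=0):
    """Toy shaping simulation. Behavior varies around a mean; responses
    meeting the current criterion are reinforced, which shifts the mean
    upward, after which the criterion is raised. Returns trials needed."""
    rng = random.Random(seed)
    mean = start_mean
    criterion = start_mean  # begin by reinforcing what the animal already does
    trials = 0
    while criterion < target:
        trials += 1
        response = rng.gauss(mean, variability)   # natural variability in behavior
        if response >= criterion:                 # successive approximation met
            # Reinforcement shifts the behavior distribution upward a little
            mean += 0.25 * (response - mean + 0.5)
            # Then the bar is raised slightly toward the goal behavior
            criterion = min(target, criterion + 0.5)
    return trials

print("trials to reach target:", shape())
```

Because the criterion only ever advances a little beyond what the animal currently does, reinforcement stays frequent enough to keep responding going; raising the criterion too fast is the classic way shaping fails.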

Shaping Elephants to Dance

To teach an elephant to stand on its back legs, trainers might first reward the elephant whenever it shifts weight toward the back, then reward sitting on hind feet, then lifting one foot off the ground, then two feet, then standing. Each approximation is reinforced until reliable, then the criterion advances.

Remarkable Shaping Achievements

Shaping has produced remarkable achievements.

Pigeon Missiles: During World War II, Project Orcon (organic control) trained pigeons to ride inside missiles and peck at target images picked up by a lens in the missile’s nose. The pigeon’s pecking translated into error signals correcting the missile’s flight. While eventually replaced by electronic guidance, this demonstrated shaping’s power.

Ping-Pong-Playing Pigeons: Pigeons have been shaped to play ping-pong, hitting balls back and forth across a small table.

Piano-Playing Pigeons: Pigeons have been shaped to peck piano keys in specific sequences, producing recognizable melodies.

Mine-Detecting Rats: Rats have been shaped to detect landmines, using their keen sense of smell to locate explosives while being light enough not to trigger detonation.

Dog Training: Virtually all sophisticated dog training, from service dogs to police dogs to competitive obedience, relies on shaping principles to build complex behavioral sequences from simpler components.

Looking Forward

We’ve mastered the four consequences that drive operant conditioning: positive reinforcement (add pleasant), negative reinforcement (remove unpleasant), positive punishment (add unpleasant), and negative punishment (remove pleasant). We’ve seen how organisms generalize learned responses across similar stimuli while discriminating between stimuli that predict different outcomes. We’ve explored shaping, the powerful technique that creates complex behaviors through successive approximations, enabling remarkable achievements from toilet training toddlers to training pigeons to guide missiles. But when should reinforcement be delivered? Must we reinforce every occurrence of a behavior, or can intermittent reinforcement work? How do different patterns of reinforcement affect behavior? In Part 3, we’ll examine reinforcement schedules—the temporal patterns of reinforcement delivery that profoundly influence response rates, persistence, and resistance to extinction. We’ll explore fixed-ratio, fixed-interval, variable-ratio, and variable-interval schedules, seeing how each produces distinctive response patterns. We’ll also examine extinction in operant conditioning, including the extinction burst and spontaneous recovery, and explore superstitious behavior—what happens when organisms mistakenly associate random behaviors with reinforcement. Understanding schedules and extinction completes our foundation in operant conditioning principles.

License

Psychology of Learning TxWes Copyright © by Jay Brown. All Rights Reserved.