"

06 1: The Basics of Operant Conditioning

Psychology of Learning

Module 06: Operant Conditioning 1

Part 1: The Basics of Operant Conditioning

Looking Back

In Modules 04 and 05, we explored classical conditioning—a form of learning where neutral stimuli acquire the power to elicit responses through association with unconditioned stimuli. We examined Pavlov’s discovery of how metronomes came to elicit salivation, the factors affecting acquisition and extinction, taste aversion learning that violated typical conditioning principles, cue competition effects like blocking and overshadowing, and sophisticated theoretical accounts including the Rescorla-Wagner model and comparator hypothesis.

Classical conditioning applies primarily to reflexive, involuntary behaviors—responses elicited by stimuli. But most human behaviors aren’t reflexive. We don’t automatically perform actions in response to specific stimuli. Rather, we emit voluntary behaviors based on their consequences. We work for paychecks, study for grades, exercise for health, and arrive on time to avoid penalties. These behaviors fall under a different form of learning: operant conditioning.

The Employee Absenteeism Problem

The CEO of a small software company notices troubling patterns: some employees arrive late regularly, others call in sick frequently, while still others arrive early and rarely miss work. Employee tardiness and absenteeism drain productivity and erode morale. What can be done?

Should the company dock pay for employees who arrive late? Reward those who show up early or on time? Terminate employees for chronic lateness? Would an incentive other than money—perhaps greater flexibility with scheduling—encourage punctuality? Of course, people who are genuinely sick need to stay home, so any solution must target voluntary absenteeism without penalizing legitimate illness.

Research comparing different approaches offers clear answers. Employees who participated in recognition programs—with monthly, quarterly, and annual acknowledgment for work attendance—showed decreased absenteeism across four quarters, with reductions ranging from 29% to 52% compared to baseline. In contrast, employees who merely received reports about their absenteeism, or no intervention at all, showed no consistent decreases. Recognition and reward work; information alone does not (Markham, Scott, & McKee, 2002).

A comprehensive meta-analysis of 179 studies confirms these findings: work engagement correlates positively with task performance and negatively with absenteeism, with longitudinal data demonstrating that engagement predicts future attendance behavior (Neuber, Englitz, Schulte, Forthmann, & Holling, 2022). When employees feel recognized and engaged, they show up.

Employee behaviors like arriving on time or calling in sick are generally under conscious control and subject to principles of operant conditioning. Unlike reflexive salivation or pupil dilation, these voluntary behaviors can be shaped through consequences. Understanding operant conditioning principles explains why some workers arrive late while others arrive early, why some work hard while others slack off, and most importantly, how to change these behaviors effectively.

Operant Conditioning: Learning from Consequences

Operant conditioning, also known as instrumental conditioning, is learning that occurs when behavior changes based on the consequences of that behavior. It applies to behaviors under the conscious control of the animal or person (Skinner, 1938).

The distinction between classical and operant conditioning is fundamental. Classical conditioning focuses on changing unconscious, reflexive behaviors that are elicited by stimuli—like salivation to food, pupil dilation to light, or fear to loud noises. The organism doesn’t choose whether to respond; the stimulus automatically triggers the response. Operant conditioning applies to conscious, voluntary behaviors that are emitted by the organism—like driving under the speed limit because you’ve received too many speeding tickets, studying because good grades lead to rewards, or exercising because it makes you feel better. The organism chooses when and whether to perform the behavior based on learned consequences.

While some behaviors are reflexive and involuntary, most human behaviors don’t occur rigidly in response to specific stimuli. Reflexive behaviors resist modification through reward and punishment. But many behaviors are voluntary—when a stimulus is present, we can choose which response is most appropriate. This doesn’t mean behavior is unpredictable. Rather, it means behavior is determined by learned relationships between actions and outcomes.

Edward Thorndike: Trial-and-Error Learning

The study of operant conditioning began at the turn of the 20th century with Edward Thorndike’s (1874–1949) work on animal intelligence. Thorndike placed hungry cats inside puzzle boxes—wooden crates with escape mechanisms like latches, loops of string, or levers. Food sat visible outside the box. To escape and reach the food, cats had to perform specific actions like pulling a loop or pressing a lever.

Thorndike observed that animals seemed to learn through trial and error, producing gradual changes in behavior. Initially, cats tried many behaviors—clawing at bars, reaching through gaps, meowing—before accidentally triggering the escape mechanism. On subsequent trials, unsuccessful behaviors decreased while successful behaviors increased. The cats needed time to try various responses before figuring out which one worked (Thorndike, 1898).

Thorndike’s Laws of Learning

Based on his animal learning research, Thorndike (1911) proposed three laws. The law of effect stated that behaviors followed by reinforcing consequences (a “satisfying state of affairs”) are more likely to be repeated in the future. This became the cornerstone of operant conditioning theory.

The law of recency stated that the most recently demonstrated behavior is most likely to be repeated. According to this law, even if an animal had been reinforced for lever pressing, if another behavior (like jumping) occurred most recently, jumping would be more likely on the next trial.

The law of exercise proposed that when all else is equal, associations are strengthened through repetition. It comprised two sub-laws: the law of use, which stated that more frequently exercised associations become stronger, and the law of disuse, which stated that infrequently used associations weaken.

Thorndike focused on associations between behavior and desirable consequences. He viewed learning as forming stimulus-response connections, describing it mechanistically: “The association leading to the successful act becomes stamped in, while futile associations become stamped out.”
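
To make the stamping-in idea concrete, below is a minimal Python simulation sketch of a puzzle-box cat. It is illustrative only, not Thorndike’s own model: the behavior names, starting strengths, and learning parameters are assumptions chosen for demonstration. The successful response is strengthened each time it occurs, while futile responses are weakened.

import random

# Illustrative puzzle-box simulation (hypothetical behaviors and parameters).
behaviors = {"claw_bars": 1.0, "reach_gap": 1.0, "meow": 1.0, "pull_loop": 1.0}
SUCCESS = "pull_loop"            # the one response that trips the latch
STAMP_IN, STAMP_OUT = 0.5, 0.9   # strengthen success; weaken futile responses

def run_trial():
    """Emit responses in proportion to their strength until the cat escapes."""
    attempts = 0
    while True:
        attempts += 1
        names = list(behaviors)
        response = random.choices(names, weights=[behaviors[n] for n in names])[0]
        if response == SUCCESS:
            behaviors[response] += STAMP_IN   # the successful act is "stamped in"
            return attempts
        behaviors[response] *= STAMP_OUT      # a futile act is "stamped out"

for trial in range(1, 16):
    print(f"Trial {trial:2d}: escaped after {run_trial()} responses")

Run repeatedly, the number of responses before escape tends to fall gradually rather than all at once, mirroring the gradual trial-and-error curves Thorndike reported.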

Thorndike’s 1929 Revisions

In 1929, Thorndike made a dramatic announcement: “I was wrong.” He discarded the law of exercise after research showed practice alone wasn’t enough to strengthen associations. Mere repetition without consequences didn’t produce learning. Additionally, passage of time alone didn’t weaken associations—disuse didn’t cause forgetting as he had originally thought.

He also revised the law of effect. While reinforcing consequences strengthened associations, he concluded that punishment did not symmetrically weaken them. This asymmetry between reinforcement and punishment influenced later research and sparked ongoing debates about punishment’s effectiveness.

Edwin Guthrie: One-Trial Learning

Edwin Guthrie (1886–1959) challenged Thorndike’s gradual learning view. Observing cats in simple puzzle boxes that required only moving a pole, Guthrie noticed two patterns: After a few trials, individual cats showed little variability—they settled on a single method. But different cats used radically different methods. One cat moved the pole with its rump, another by biting it, a third by pawing it.

One-trial learning was Guthrie’s contention that the association between a pattern of stimuli and a response develops at full strength after just one pairing. “A stimulus pattern gains its full associative strength on the occasion of its first pairing with a response.” This contradicted the widely accepted belief in gradual learning through repetition (Guthrie, 1930).

If learning happens in one trial, why does practice improve performance? Guthrie distinguished between movements, acts, and skills. Movements are specific responses to specific stimulus configurations. Movements gain full associative strength after one exposure—this is one-trial learning. Acts are responses to varying stimulus configurations, consisting of many movements. Learning an act involves learning specific responses under various conditions. Skills consist of many acts. Learning golf requires many acts (putting, driving, playing from sand traps), each requiring many movements. Practice improves skills not because individual movements strengthen gradually, but because performers learn which movements work under different circumstances.

The stop-action principle draws a parallel between a camera freezing motion and a reinforcer’s function: reinforcement “photographs” whatever the organism is doing at that instant, so the specific bodily position and muscle movements occurring at the moment of reinforcement have a higher probability of occurring on the next trial. This offers a version of the law of effect—reinforcement reduces behavioral variability, settling on the behavior that immediately preceded reinforcement (Guthrie & Horton, 1946).

B.F. Skinner: Radical Behaviorism and the Three-Part Contingency

Like Thorndike, B.F. Skinner (1904–1990) noticed that behavior consequences affect whether behaviors recur. But Skinner developed a more comprehensive framework for understanding operant conditioning.

Radical behaviorism was Skinner’s position that behavior is solely influenced by an organism’s experience with the consequences of that behavior. Skinner rejected mentalistic explanations, focusing exclusively on observable relationships between environmental events and behavior (Skinner, 1938).

Skinner recognized that no behavior occurs in a vacuum. Behaviors are influenced by environmental events before and after they occur. This insight led to his most important contribution: the three-part contingency, which is the general model of operant conditioning. A discriminative stimulus (SD) cues an organism to respond with a specific behavior (R). The response is followed by a reinforcing stimulus (SR). The SD, R, and SR are the three parts of the behavioral contingency that produces operant conditioning (Skinner, 1938).

Skinner’s research examined relationships among these three parts by observing how changes to environmental cues and consequences influenced an animal’s response rate. He constructed operant conditioning chambers—often called Skinner boxes—that allowed precise control over preceding events (cues) and over the timing and frequency of reinforcement or punishment. These chambers typically contained levers or keys that animals could press, food dispensers, lights, and recording equipment that tracked responses automatically.

The Operant Conditioning Model: SD → R → SR

The operant conditioning model consists of three elements in sequence. SD (Discriminative Stimulus) is a stimulus signaling that a particular behavior will lead to a particular consequence. It acts as a cue that specific behavior should be demonstrated. R (Response) is the voluntary behavior the organism performs—the operant behavior being learned. SR (Reinforcing Stimulus) is the consequence following the response. This consequence determines whether the behavior will increase or decrease in the future.

The SD serves as the preceding event, R as the behavioral event, and SR as the consequence in the three-part contingency relationship.

Animal Laboratory Examples

In animal studies, the SD might be a light flash signaling an animal to press a lever. Pressing the lever (R) then results in food pellets (SR), reinforcing lever pressing. The challenge for the animal is learning that pressing the lever produces reinforcement only when the light is illuminated—not at other times.
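
As a concrete sketch of how such a discrimination might be acquired, the following Python toy simulation tracks a separate tendency to press under light-on and light-off conditions and strengthens pressing only when it is followed by food. The learning rate and starting probabilities are illustrative assumptions, not values from any study.

import random

# Illustrative discrimination-learning sketch (hypothetical parameters).
press_prob = {"light_on": 0.5, "light_off": 0.5}  # initial response tendencies
LEARNING_RATE = 0.1

def moment(light_on):
    """One moment in the chamber: pressing pays off only under the SD."""
    condition = "light_on" if light_on else "light_off"
    if random.random() < press_prob[condition]:   # the animal emits a press (R)
        target = 1.0 if light_on else 0.0         # food (SR) follows only under the SD
        press_prob[condition] += LEARNING_RATE * (target - press_prob[condition])

for _ in range(5000):
    moment(light_on=random.random() < 0.5)

print(f"P(press | light on):  {press_prob['light_on']:.2f}")   # climbs toward 1
print(f"P(press | light off): {press_prob['light_off']:.2f}")  # falls toward 0

The pressing tendency rises when the light is on and falls when it is off: the animal has learned that the light, not the lever alone, signals when pressing pays.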

A goldfish sees a pink target (SD), begins “driving” using appropriate movements (R), and receives fish food (SR). Yes, researchers have trained goldfish to drive small cars by reinforcing steering behaviors when target colors appear!

Real-World Examples

Driving: When the traffic light turns green (SD), you step on the gas and move through the intersection (R). What’s the reinforcer? Moving through the intersection and making progress toward your destination is reinforcing (SR). Additionally, you avoid honking cars and screaming drivers—negative consequences you escape by responding appropriately.

Saying Thank You: A guest gives a child a gift. The parent asks, “What do you say when someone gives you a gift?” This provides the cue (SD) to say “Thank you” (R). Both guest and parent smile, and the guest responds, “You are very welcome! How polite you are!” (SR). Social approval reinforces polite behavior.

Alarm Clocks: Having trouble getting up? Set an alarm clock with loud, annoying buzzing and place it across the room. The alarm (SD) signals you to jump out of bed and hit the button (R). Stopping the aversive noise (SR) reinforces quick, reliable jumping-out-of-bed behavior.

Complex Contingencies

We learn complex contingencies involving multiple discriminative stimuli and consequences. As any three-year-old can tell you: When Dad’s not in the room (SD1), jumping on the bed (R) leads to fun (SR1). When Dad’s in the room (SD2), jumping on the bed (R) leads to spanking (SR2).

The same behavior produces different consequences depending on the discriminative stimulus present. Children quickly learn these conditional relationships, demonstrating sophisticated discrimination learning.
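
Framed as data, such a conditional relationship is simply a mapping from a discriminative stimulus and a response to a consequence. The short Python sketch below encodes the bed-jumping example; the labels are illustrative.

# The same response yields different consequences under different SDs.
contingencies = {
    ("dad_absent", "jump_on_bed"): "fun",        # SD1 + R -> SR1
    ("dad_present", "jump_on_bed"): "spanking",  # SD2 + R -> SR2
}

def consequence(sd, response):
    """Look up the SR that follows response R given discriminative stimulus SD."""
    return contingencies.get((sd, response), "nothing notable")

print(consequence("dad_absent", "jump_on_bed"))   # -> fun
print(consequence("dad_present", "jump_on_bed"))  # -> spanking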

Looking Forward

We’ve established the foundations of operant conditioning, distinguishing it from classical conditioning and exploring contributions of three pioneering theorists. Thorndike showed that behaviors followed by satisfying consequences are repeated (law of effect). Guthrie proposed one-trial learning, arguing that individual movements gain full strength immediately while practice improves skills by building repertoires of movements for varying situations. Skinner synthesized these ideas into the three-part contingency (SD → R → SR), providing a general framework for understanding how consequences shape voluntary behavior.

But we’ve only scratched the surface. The three-part contingency can produce four different consequences, each affecting behavior differently. In Part 2, we’ll examine these four consequences: positive reinforcement (adding something pleasant), negative reinforcement (removing something unpleasant), positive punishment (adding something unpleasant), and negative punishment (removing something pleasant). Each consequence produces specific, predictable effects on behavior. Understanding these four types of consequences is essential for applying operant conditioning principles effectively, whether training animals, educating children, managing employees, or changing our own behaviors.

License

Psychology of Learning TxWes Copyright © by Jay Brown. All Rights Reserved.