06-3: Reinforcement Schedules
Psychology of Learning
Module 06: Operant Conditioning 1
Part 3: Reinforcement Schedules
Looking Back
In Parts 1 and 2, we established operant conditioning fundamentals. We distinguished voluntary behaviors shaped by consequences from reflexive behaviors elicited by stimuli, explored Thorndike’s, Guthrie’s, and Skinner’s contributions, and mastered the three-part contingency (SD → R → SR). We examined four consequences: positive reinforcement (add pleasant), negative reinforcement (remove unpleasant), positive punishment (add unpleasant), and negative punishment (remove pleasant). We saw how organisms generalize across similar stimuli while discriminating between stimuli predicting different outcomes, and explored shaping—building complex behaviors through successive approximations. But we’ve assumed reinforcement follows every response. Must we reinforce every occurrence? Can intermittent reinforcement work? How do different patterns of reinforcement delivery affect behavior? These questions led researchers to discover that when and how often reinforcement occurs dramatically influences response rates, persistence, and resistance to extinction. Understanding reinforcement schedules unlocks powerful insights about why some behaviors persist stubbornly while others quickly extinguish.
Reinforcement Schedules: When Does Reinforcement Occur?
Continuous reinforcement is the delivery of reinforcement every time a desired behavior is demonstrated. Every response produces a consequence. This creates steady rates of behavior during learning but requires that reinforcement be constantly present for behavior to continue (Ferster and Skinner, 1957).
Early researchers were curious whether changing when reinforcement was delivered would change how animals would respond. This led to the discovery of intermittent reinforcement.
Intermittent reinforcement, also called partial reinforcement, is the converse of continuous reinforcement—a schedule in which reinforcement does not occur each time the behavior is demonstrated. Only some responses are reinforced (Ferster and Skinner, 1957).
Intermittent reinforcement schedules are classified along two dimensions: the basis for reinforcement (ratio vs. interval) and the predictability of reinforcement (fixed vs. variable). This creates four basic schedules, each producing distinctive response patterns.
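The four decision rules are simple enough to state as code. Below is a minimal illustrative sketch in Python (the function names, parameter values, and time units are ours, chosen only for illustration; the variable-ratio rule is approximated by a constant per-response probability). Each function answers the question a schedule asks of every response: does this response earn reinforcement? The sections that follow unpack the characteristic behavior each rule produces.

```python
import random

def fixed_ratio(response_count, ratio=5):
    """FR: reinforce every `ratio`-th response (FR5 -> every 5th response)."""
    return response_count % ratio == 0

def variable_ratio(ratio_mean=10):
    """VR (approximated as a random ratio): each response has a
    1-in-`ratio_mean` chance of reinforcement, so the requirement
    averages ratio_mean but any single response might pay off."""
    return random.random() < 1.0 / ratio_mean

def fixed_interval(seconds_since_last_reinforcer, interval=60.0):
    """FI: only the first response made after `interval` seconds is reinforced;
    the caller resets the clock whenever reinforcement is delivered."""
    return seconds_since_last_reinforcer >= interval

def variable_interval(seconds_since_last_reinforcer, current_interval):
    """VI: like FI, except `current_interval` is redrawn around a known mean
    after every reinforcer, so the required wait is unpredictable."""
    return seconds_since_last_reinforcer >= current_interval

# Example: 20 responses on an FR5 schedule are reinforced on responses 5, 10, 15, 20.
for response in range(1, 21):
    if fixed_ratio(response, ratio=5):
        print(f"Response {response}: reinforced")

# Drawing the next VI interval around a 60-second mean (one common choice).
next_interval = random.expovariate(1.0 / 60.0)
```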
Fixed-Ratio (FR) Schedule: Reinforcement After a Set Number of Responses
A fixed-ratio (FR) schedule provides reinforcement after a set number of behaviors are demonstrated. For example, in an FR5 schedule, every fifth response is reinforced. FR schedules produce high, steady rates of responding with brief pauses after reinforcement (Ferster and Skinner, 1957).
The post-reinforcement pause is the brief cessation of responding that occurs immediately after reinforcement. The size of the pause is related to the number of behaviors required for reinforcement—larger ratios produce longer pauses (Ferster and Skinner, 1957).
Example: Raking Leaves for Pay. Suppose it’s autumn, you have many trees in your yard, and your parents want you to rake leaves before it snows. They offer to pay you $5 for each full bag of leaves as soon as you bring it to the garage. If your behavior is typical of people on FR schedules, you’ll work quickly to fill a bag, collect your $5, then take a short break before starting the next bag. When you’re close to having a bag full, you’ll work even more quickly in anticipation of the next $5 payment. This is classic FR responding: high rate, brief pause after reinforcement, acceleration as reinforcement approaches.
Piece-work pay systems exemplify FR schedules—workers are paid based on how many units they produce. Factory workers paid per widget, farm workers paid per basket of fruit, and freelance writers paid per article all work under FR schedules. These schedules produce high productivity but can lead to burnout due to the relentless work-pause-work-pause pattern.
Fixed-Interval (FI) Schedule: Reinforcement After a Set Time Period
A fixed-interval (FI) schedule provides reinforcement for the first response after a specific time period has passed. After reinforcement, the time interval restarts. FI schedules result in responding that is relatively low immediately after reinforcement, increasing in a burst as the time interval nears completion. This produces a distinctive scalloped response pattern (Ferster and Skinner, 1957).
Example: Studying for Exams. Many students demonstrate FI responding when they study. Students often study very little between exams—the time immediately after an exam shows minimal studying. The closer the date for an upcoming exam, the more studying occurs. The rate of responding increases dramatically as the time interval comes to a close, producing frantic cramming right before exams. This scalloped pattern—low responding after reinforcement, acceleration as the next reinforcement opportunity approaches—characterizes FI schedules.
Checking email or social media can follow FI patterns if you know updates arrive at specific times. Repeatedly checking immediately after the known update time would be pointless, so checking decreases right after updates and increases as the next scheduled update approaches. Baking follows FI schedules—repeatedly opening the oven door right after putting in cookies is pointless, but checking increases as the timer nears completion.
Variable-Ratio (VR) Schedule: Reinforcement After an Average Number of Responses
A variable-ratio (VR) schedule provides reinforcement after an average number of desired behaviors have been demonstrated. In a VR10 schedule, reinforcement occurs on average every 10 times the behavior is demonstrated, but the actual number varies unpredictably—sometimes after 3 responses, sometimes after 15, averaging 10. Because VR schedules are unpredictable, they result in high and consistent frequency of responses with no post-reinforcement pauses (Ferster and Skinner, 1957).
Example: Slot Machines. Slot machines in casinos pay out on variable-ratio schedules such that, on average, one of every 100 coins fed into the machine results in a jackpot of some size. You may sit at a machine for hours and spend all your money, losing everything. The person next to you inserts coins just three times and earns the jackpot. This unpredictability—you never know which response will be reinforced—produces persistent, steady responding. Gamblers show no post-reinforcement pauses; winning produces brief celebration followed immediately by continued play. The next win could be just one more pull away!
VR schedules are incredibly powerful for maintaining behavior. They produce the highest rates of responding and greatest resistance to extinction of all schedules. Examples include: sales commissions (you never know which customer will buy), fishing (never know which cast will catch a fish), job hunting (never know which application will get an interview), and asking people out on dates (never know who will say yes). Each produces persistent behavior because the next response might be the one that pays off.
Variable-Interval (VI) Schedule: Reinforcement After an Average Time Period
A variable-interval (VI) schedule provides reinforcement for the first response after an unpredictable time interval has passed. Though the length of any given interval is unknown, the average interval is known. VI schedules produce moderate, steady responding with no post-reinforcement pauses (Ferster and Skinner, 1957).
Example: Surprise Supervisor Visits. Your company may require a supervisor to visit your office 26 times per year. You might guess the supervisor will visit roughly every two weeks. But your supervisor may be sneaky and prefer to “pop in” at the satellite office unannounced. The supervisor has been known to pop in two days in a row and to wait a month or more between visits. Because of this, you and your colleagues are never sure when the supervisor will appear. For job security’s sake, you and your colleagues stay consistently busy doing company work. This steady, moderate-rate responding without pauses characterizes VI schedules.
Checking email or social media typically follows VI schedules—you don’t know exactly when new messages will arrive, but they come periodically. This produces frequent, steady checking behavior. Pop quizzes in classes follow VI schedules, encouraging consistent studying rather than the scalloped pattern of studying only before announced exams. Waiting for phone calls, checking the mailbox, and monitoring for weather updates all follow VI patterns, producing steady, persistent behavior.
Other Reinforcement Schedules
A concurrent schedule occurs when an animal is exposed to two or more different schedules on different keys or levers simultaneously. Concurrent schedules can be used to study preferences—organisms allocate responses to alternatives in proportion to the reinforcement rates those alternatives provide (Herrnstein, 1961).
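Herrnstein’s finding is usually summarized as the matching law. In its simplest two-alternative form, the proportion of behavior allocated to an alternative equals the proportion of reinforcement obtained from it:

B1 / (B1 + B2) = R1 / (R1 + R2)

where B1 and B2 are the rates of responding on the two keys and R1 and R2 are the rates of reinforcement each key delivers. A pigeon earning three-quarters of its food on the left key, for example, will tend to make roughly three-quarters of its pecks there.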
A chained schedule occurs when two or more simple schedules are placed in a fixed sequence and each is signaled by different discriminative stimuli. Perhaps a VR10 signaled by a red light (SD1) must be completed before a VI10 signaled by a green light (SD2) becomes available (Ferster and Skinner, 1957).
A progressive-ratio schedule is like an FR schedule, but the ratio requirement increases after each reinforcement—starting as FR1, then FR2, then FR3, and so on. Progressive-ratio schedules are used to measure the strength of motivation or addiction in animal studies—how far will an animal progress (how much will they sacrifice) to receive a reinforcer or drug? (Hodos, 1961).
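To see how quickly the work requirement escalates, consider the arithmetic version described above, in which the ratio grows by one after every reinforcer. The total responses needed to earn the k-th reinforcer are 1 + 2 + ... + k = k(k + 1)/2: the 10th reinforcer costs 55 cumulative responses, and the 50th costs 1,275. (Many laboratory procedures escalate the ratio even more steeply, for example geometrically.) The ratio at which responding finally stops, often called the breakpoint, serves as the quantitative index of how hard the animal will work for the reinforcer.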
Technically, extinction is also a reinforcement schedule—one where reinforcement never occurs regardless of responding. This brings us to extinction processes.
Extinction: Eliminating Learned Behaviors
Getting a behavior to occur is only part of learning; researchers interested in operant conditioning are also interested in how behaviors can be extinguished, or made to cease occurring.
Extinction in operant conditioning is the elimination of a behavior by removing the reinforcement that maintains it. Over time, absence of reinforcement should result in discontinuation of the behavior. This is the basic premise for much of applied behavior analysis in education and therapy (Skinner, 1938).
However, not all behaviors extinguish equally easily; ease of extinction varies according to the reinforcement schedule in effect when the behavior was learned. Behaviors learned on VR or VI schedules show particular resistance to extinction.
For an animal or person to stop demonstrating a behavior, they must learn that the behavior no longer leads to reinforcement. Because variable schedules (VR and VI) reinforce infrequently and irregularly, it takes many trials to determine that the behavior no longer leads to reinforcement. How do you know the slot machine is broken versus just on a long losing streak? This ambiguity makes extinction from variable schedules particularly difficult.
Gambling behavior, characterized by VR reinforcement schedules, is notoriously difficult to extinguish. Gamblers may lose repeatedly yet continue playing because each loss could simply be part of the variable pattern—the big win could be coming soon. This resistance to extinction makes gambling addiction so persistent and problematic.
The Extinction Burst: Things Get Worse Before They Get Better
The extinction burst is the temporary increase in the frequency of a behavior that is being extinguished. Before behavior decreases, it often increases dramatically—a last-ditch effort to obtain reinforcement (Skinner, 1938).
If you’ve been in a class with a disruptive student, you may have noticed that initially, when the teacher convinces the class to ignore the behavior, the attention-seeking student simply clowns around even more. Thus, the behavior suddenly increases in frequency. This increase is the extinction burst. The wise teacher encourages the class to continue ignoring the behavior during this period, as it is normally very brief. Eventually, the student’s disruptive behavior should be eliminated if reinforcement is consistently withheld.
Recent research has developed new theoretical frameworks for understanding extinction bursts. Fisher and colleagues (2023) proposed that extinction bursts can be explained using principles from the matching law: during baseline, individuals allocate time between the target response and reinforcer consumption; at the start of extinction, individuals temporarily allocate more time to the target behavior because it still has value from its reinforcement history while reinforcer consumption is no longer available to compete with it. This temporally weighted matching law helps explain why extinction bursts are transient—the target behavior decreases shortly after the burst because its value decreases with continued exposure to extinction.
Extinction bursts explain why behavior management initially seems to make problems worse. A child throws tantrums to obtain candy. Parents decide to stop giving in. The child’s first tantrum under the new regime will likely be the worst ever—louder, longer, more intense. If parents give in during this extinction burst, they inadvertently reinforce an even more extreme tantrum, making future behavior worse. But if they persist through the burst, tantrums will decrease. Understanding extinction bursts helps people persist through the temporary worsening that precedes improvement.
Spontaneous Recovery in Operant Conditioning
Spontaneous recovery is the recovery of a behavior after it has been extinguished. Even after behavior has been completely eliminated, it may reappear, particularly when cues associated with the behavior are present (Skinner, 1938).
Just as in classical conditioning, extinguished operant behaviors can spontaneously recover. A child’s tantrum behavior might be successfully extinguished through consistent non-reinforcement. Days or weeks later, in a situation previously associated with tantrums, the behavior may suddenly reappear. This doesn’t mean extinction failed—it means extinction is context-dependent and that extinguished responses remain available in the behavioral repertoire.
Spontaneous recovery explains why behavior change requires patience and consistency. Single successful extinction episodes don’t guarantee permanent change. Behavior management must persist across multiple contexts and time periods to truly eliminate unwanted behavior.
Superstitious Behavior: Accidental Conditioning
Thorndike believed learning was based on developing associations between behaviors and their consequences. However, we may also associate a behavior with an outcome when the two occur together only randomly—when there’s no actual cause-effect relationship between the behavior and outcome. When our behavior changes based on that random association, we’ve developed a superstition.
Superstitious behavior occurs when an organism develops a response based on an accidental temporal relationship between a behavior and a reinforcer. The behavior wasn’t actually responsible for producing the reinforcer, but because they occurred together, the organism acts as if the behavior caused the outcome (Skinner, 1948).
Skinner’s Superstitious Pigeons
Skinner (1948) created superstitious behaviors in pigeons by placing them in Skinner boxes set to deliver food every 15 seconds. The birds were not reinforced for any specific behavior—food arrived regardless of what they were doing. Yet the birds began demonstrating odd behaviors such as turning in circles or pecking aimlessly. Each pigeon repeated behaviors it happened to be demonstrating naturally just before reinforcement occurred. Although reinforcement wasn’t contingent on particular behaviors, birds developed associations between random behaviors and food presentation.
One pigeon turned counterclockwise, another thrust its head into a corner, a third developed a head-tossing motion. Each had been accidentally “caught” doing these behaviors when food arrived. The accidental temporal pairing was sufficient to produce conditioning, even though no causal relationship existed. The pigeons acted superstitiously, as if their behaviors produced the food.
Superstitious Behavior in Humans
Malinowski (1954) was one of the first researchers to report that superstitious behavior among humans is related to the degree of perceived uncertainty or unreliability of an event. In the absence of scientific explanations, superstitious explanations are the best you’ve got! When outcomes are unpredictable or uncontrollable, people grasp for patterns—even illusory ones—that provide a sense of control.
The fact that many professional athletes engage in superstitious behaviors tends to bear this out. Gmelch (1971) reported that superstitious behaviors among baseball players were more common in relation to hitting, which has a low success rate (even excellent hitters fail 70% of the time), than in relation to fielding, which enjoys a much higher success rate. When outcomes are uncertain and stakes are high, superstitious behaviors flourish.
Common superstitions include wearing “lucky” clothing during exams, following specific pre-performance routines, avoiding black cats or walking under ladders, and countless other behaviors maintained by occasional coincidental pairings with positive outcomes. Students may wear the same shirt to every exam after wearing it during one successful test. Athletes develop elaborate pre-game rituals. While these behaviors have no causal relationship to outcomes, the occasional pairing with success (which would have occurred anyway) maintains the superstitious behavior on a variable-ratio schedule!
Looking Forward
We’ve explored how reinforcement schedules dramatically influence behavior. Fixed-ratio schedules (FR) produce high rates with post-reinforcement pauses. Fixed-interval schedules (FI) produce scalloped patterns with acceleration before reinforcement. Variable-ratio schedules (VR) produce the highest, steadiest rates with extreme resistance to extinction. Variable-interval schedules (VI) produce moderate, steady rates. We examined extinction—removing reinforcement to eliminate behavior—including extinction bursts (temporary increases before decreases) and spontaneous recovery (reappearance of extinguished behavior). We explored superstitious behavior—accidental conditioning when responses coincidentally precede reinforcers. These principles form the foundation of operant conditioning. But operant conditioning doesn’t exist in isolation. We began this module discussing employee absenteeism, saw exam studying follow FI schedules, and watched superstitions arise through accidental pairings. Classical and operant conditioning interact constantly in real life—emotional responses condition classically while voluntary behaviors condition operantly. Understanding both forms and their interactions provides a complete picture of learning. We’ve now completed our systematic exploration of operant conditioning’s basic principles. These foundations enable sophisticated applications in education, therapy, workplace management, animal training, and self-improvement—topics that build upon the principles we’ve mastered.