09-1: Choice & the Matching Law
Psychology of Learning
Module 09: Decision-Making 1
Part 1: Choice & the Matching Law
Looking Back
Throughout this course, we’ve examined how organisms learn associations & behaviors through classical & operant conditioning. But a critical question remains: How do organisms choose between multiple available options? In natural environments, animals rarely face single-response situations. A foraging bird chooses between different food patches. Humans constantly make choices—what to eat, which route to drive, how to spend time. Understanding choice behavior requires examining how organisms allocate responses among alternatives.
Introduction: The Ubiquity of Choice
Humans, living in the insulated world of the 21st century, don’t normally make individual decisions which by themselves lead to life or death. Though your decisions may have cumulative effects that eventually lead to life or death (choosing to eat healthy, choosing to exercise, choosing to drive safely), individually they are rarely so dramatic. Animals living in a world of constant struggle for survival often do make choices that have immediate effects on their lives. They need to be more careful than we do. Natural selection clearly favors organisms which can make effective choices—choices which lead to survival & reproduction. When chased by a predator, choosing to run toward open ground rather than dense cover may determine immediate survival. Understanding choice becomes essential for understanding adaptation.
Judgment is a type of decision-making in which one estimates the likelihood of an event. Examples include estimating the probability of rain tomorrow, judging how likely it is that a defendant is guilty, or assessing the chances of success on an exam (Tversky & Kahneman, 1974).
Choice is a form of decision-making which involves selecting among alternatives. A restaurant patron chooses between menu items, a student selects which classes to take, an investor picks stocks to purchase. Choice differs from judgment in requiring selection rather than estimation (Herrnstein, 1961).
This module focuses on choice—how organisms distribute responses among available alternatives. We’ll examine laboratory procedures for measuring choice, discover the Matching Law describing how choices are allocated, & explore how this fundamental principle applies across species & situations.
Measuring Choice in Animals
If a Skinner box is designed with two or more response options, then the animal’s choices & preferences can be measured. A pigeon can be presented with two keys, each providing reinforcement on different schedules. By recording which key the pigeon pecks more frequently, researchers quantify preference. This simple arrangement allows systematic investigation of factors influencing choice.
The Impulsivity Paradigm: Immediate Versus Delayed Reinforcement
When hungry pigeons are exposed to the following paradigm, they almost exclusively choose the Red key—that is, they are almost totally impulsive:
Procedure:
1. House light turns on, signaling the start of the trial.
2. Both keys (Red & Green) light simultaneously.
3. Pigeon must peck one key five times (FR5 requirement).
4. After the fifth peck, both keys go dark.
5. If the Red key was chosen: 2 seconds of food access provided immediately.
6. If the Green key was chosen: 20-second delay, then 6 seconds of food access.
Pigeons overwhelmingly prefer the Red key despite receiving less total food. The Green key provides three times as much food (6 seconds versus 2 seconds), but requires waiting 20 seconds. This demonstrates impulsivity—preference for smaller-immediate rewards over larger-delayed rewards. The immediate availability of food from the Red key outweighs the larger quantity available from the Green key. Delay dramatically reduces the value of reinforcement, a phenomenon we’ll explore more deeply in Module 10 when examining delay discounting.
This paradigm reveals a fundamental aspect of choice: temporal proximity matters enormously. Organisms aren’t simply calculating which option provides more reinforcement in absolute terms. They’re weighing reinforcement amount against delay, & delay exerts powerful influence. Understanding how delay affects value becomes crucial for explaining many real-world choices—why people choose immediate gratification over long-term benefits, why diets fail, why saving money is difficult.
Measuring Choice in Humans
The Marshmallow Test: Mischel’s Classic Study
Walter Mischel conducted a series of fascinating studies with children measuring preferences & self-control capacity. In the classic “marshmallow test,” children were offered a choice:
Immediate Option: Receive one marshmallow (or pretzel, or cookie) right now.
Delayed Option: Wait alone in a room for approximately 15 minutes, & receive two marshmallows.
The procedure measured self-control—the ability to delay gratification for a larger reward. There was considerable variability in children’s ability to wait for the preferred reward. Some children immediately chose the single marshmallow. Others waited the full 15 minutes to receive two marshmallows. Many started waiting but gave up partway through, signaling the experimenter to return & accepting the single marshmallow.
Children who had been better able to wait for a marshmallow as preschoolers later exhibited higher SAT scores, better stress management, & more successful life outcomes measured decades later (Mischel, Shoda, & Rodriguez, 1989). This suggested that self-control capacity in childhood predicts important life outcomes—academic achievement, career success, health, & relationship quality. The ability to delay gratification appears to be a critical skill for navigating the modern world where long-term planning produces benefits but requires resisting immediate temptations.
Hypothetical Choice Procedures
A procedure to measure self-control with adults could involve asking people to choose between options: “Which do you prefer, $5000 that you will receive today (Smaller-Sooner; SS) or $7000 that you will receive in 1 year (Larger-Later; LL)?”
By varying the amounts & delays systematically, researchers can determine indifference points—combinations where someone is equally likely to choose either option. These indifference points reveal how steeply a person discounts future rewards. Someone who prefers $5000 today over $7000 in one year discounts the future more steeply than someone who prefers $7000 in one year. We’ll explore the mathematical functions describing this discounting in Module 10.
Titration Procedures for Precise Measurement
A rapid adjusting procedure (also called titration) systematically adjusts the value of one option based on previous choices until an indifference point is found. This identifies the precise value where someone switches from preferring one option to preferring another (Mazur, 1987).
Example titration procedure:
Trial 1: Choose between $5000 today or $10,000 in 1 year. Person chooses $10,000 (prefers LL).
Trial 2: Delayed amount decreases. Choose between $5000 today or $9000 in 1 year. Person chooses $9000 (still prefers LL).
Trial 3: Choose between $5000 today or $8000 in 1 year. Person chooses $8000.
Trial 4: Choose between $5000 today or $7000 in 1 year. Person chooses $7000.
Trial 5: Choose between $5000 today or $6000 in 1 year. Person chooses $5000 (switches to SS).
The indifference point lies between $6000 & $7000 in this example. Additional trials could narrow this range further. Titration provides precise, individual-specific measures of how much future rewards must exceed immediate rewards to be preferred. This methodology has proven invaluable for studying individual differences in impulsivity, the effects of various interventions on self-control, & how delay discounting relates to real-world behaviors like substance abuse, gambling, & financial decision-making.
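The descending titration in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `titrate_descending`, the fixed $1000 step, & the simulated chooser (who takes the delayed amount only when it is at least $6500) are all hypothetical & are not part of Mazur's (1987) procedure.

```python
def titrate_descending(prefers_later, immediate, start_later, floor, step):
    """Lower the delayed amount until the chooser switches to the immediate option.

    Returns (lower, upper), the delayed amounts that bracket the indifference
    point, or None if no switch is observed.
    """
    later = start_later
    last_accepted = None  # most recent delayed amount the chooser accepted
    while later >= floor:
        if prefers_later(immediate, later):
            last_accepted = later
            later -= step  # make the delayed option less attractive
        else:
            if last_accepted is None:
                return None  # chooser rejected even the starting amount
            return (later, last_accepted)
    return None  # chooser never switched to the immediate option


# Hypothetical chooser: takes the delayed money only if it is at least $6500.
def simulated_chooser(immediate, later):
    return later >= 6500


bracket = titrate_descending(simulated_chooser, immediate=5000,
                             start_later=10000, floor=5000, step=1000)
print(bracket)  # (6000, 7000)
```

Rerunning the same function with a smaller step inside the returned bracket is one way to narrow the range further.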
The Matching Law as a Description of Choice Behavior
Herrnstein’s Groundbreaking Discovery
Richard Herrnstein (1961) exposed pigeons to two different variable interval (VI) schedules simultaneously, each on a separate key. If the VI schedules were the same, pigeons pecked both keys about equally. But when the schedules differed—say VI 30-second on the left key & VI 90-second on the right key—pigeons allocated their responses proportionally to the reinforcement rates.
In a VI reinforcement schedule, only one response is required to receive reinforcement, but some amount of time must pass before a response is reinforced. A VI 30-second schedule means that, on average, reinforcement becomes available every 30 seconds. The first response after the interval elapses produces reinforcement. This creates a situation where responding faster doesn’t produce more reinforcement—there’s a maximum rate of reinforcement determined by the VI value. But organisms can choose where to allocate their responses when multiple VI schedules operate simultaneously.
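A small simulation can illustrate why faster responding earns little extra reinforcement on a VI schedule. The sketch below makes several assumptions that are not from the original studies: intervals are drawn from an exponential distribution, the organism responds at perfectly regular gaps, & the next interval starts timing when the reinforcer is collected. The function name `simulate_vi` is hypothetical.

```python
import random


def simulate_vi(mean_interval_s, response_gap_s, session_s, seed=0):
    """Count reinforcers earned on a VI schedule by a steady responder."""
    rng = random.Random(seed)
    reinforcers = 0
    # Time at which the next reinforcer becomes available.
    available_at = rng.expovariate(1.0 / mean_interval_s)
    t = response_gap_s
    while t <= session_s:
        if t >= available_at:
            # First response after the interval elapses is reinforced.
            reinforcers += 1
            available_at = t + rng.expovariate(1.0 / mean_interval_s)
        t += response_gap_s
    return reinforcers


# Ten simulated hours on VI 30-second: one response per second vs. one per 5 s.
fast = simulate_vi(30, 1, 36000)
slow = simulate_vi(30, 5, 36000)
print(fast, slow)
```

Under these assumptions, responding five times as often should raise the reinforcer count only slightly, because the schedule's timer, not the response rate, limits how often reinforcement is available.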
Herrnstein discovered a remarkably simple relationship: The percentage of responses allocated to each alternative matched the percentage of reinforcement obtained from that alternative. If 75% of reinforcements came from the left key, pigeons made approximately 75% of their responses on the left key. If reinforcement was equally distributed (50/50), responses were equally distributed. This relationship held across a wide range of VI values & different reinforcement ratios.
The Matching Law states that in a two-choice situation, the percentage of responses made on each alternative will match the percentage of reinforcers received from each alternative. Expressed mathematically: R₁/(R₁+R₂) = r₁/(r₁+r₂), where R represents responses & r represents reinforcers (Herrnstein, 1961).
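As a worked example, the equation can be applied to the VI 30-second & VI 90-second schedules mentioned earlier. The sketch assumes the pigeon obtains reinforcers at roughly the programmed rates, which is an idealization; the function names are illustrative only.

```python
def vi_rate_per_hour(mean_interval_s):
    """Programmed reinforcers per hour on a VI schedule."""
    return 3600 / mean_interval_s


def matching_proportion(r1, r2):
    """Proportion of responses on alternative 1 predicted by strict matching."""
    return r1 / (r1 + r2)


left = vi_rate_per_hour(30)   # 120 reinforcers per hour
right = vi_rate_per_hour(90)  # 40 reinforcers per hour
predicted_left = matching_proportion(left, right)
print(predicted_left)  # 0.75
```

Strict matching therefore predicts about 75% of pecks on the left key, which agrees with the 75% figure given above.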
A graph of the percent of responses pigeons made on Key A as a function of the percent of reinforcements received on Key A shows close matching: the data points fall along a diagonal line from (0,0) to (100,100). When Key A provided 0% of reinforcements, 0% of responses occurred on Key A. When Key A provided 25% of reinforcements, approximately 25% of responses occurred on Key A. This linear relationship held across the full range.
The Matching Law Applies to Humans Too
The Matching Law isn’t limited to pigeons or laboratory settings. Humans show matching in diverse contexts. When people can choose between two slot machines programmed with different reinforcement rates, they allocate responses to match reinforcement received. When teachers distribute attention between two students, the students’ on-task behavior matches the attention received. When given choices between activities providing different rates of reinforcement, people allocate time to match outcomes. The Matching Law appears to describe a fundamental principle of choice behavior operating across species & situations.
Deviations from Perfect Matching
While the Matching Law provides an elegant description of choice, perfect matching doesn’t always occur. Two systematic deviations have been identified:
Undermatching is a deviation from matching in which animals show a weaker preference for the better alternative in a two-choice procedure than matching predicts. Responses are more equally distributed than reinforcements (Baum, 1974).
Example: If one key provides 80% of reinforcements & the other provides 20%, perfect matching predicts 80% of responses on the first key. Undermatching might produce 65% of responses on the first key, which is still a preference, but weaker than matching predicts. The organism distributes responses more evenly than reinforcement rates would suggest. Undermatching often occurs when switching between alternatives costs little (for example, when there is no changeover delay) or when discrimination between alternatives is difficult.
Overmatching is a deviation from matching in which animals show a stronger preference for the better alternative than matching predicts. Responses are more extremely distributed than reinforcements (Baum, 1974).
Example: If one key provides 60% of reinforcements, overmatching might produce 75% of responses on that key. The organism shows an exaggerated preference for the better alternative. Overmatching can occur when switching between alternatives is costly (for example, when a long changeover delay or travel requirement separates them), so the organism stays longer on the richer schedule.
The Generalized Matching Law
To account for systematic deviations from perfect matching, Baum (1974) proposed the Generalized Matching Law: R₁/R₂ = b(r₁/r₂)ᵃ, where R₁ & R₂ are the responses on alternatives 1 & 2, r₁ & r₂ are the reinforcers obtained from alternatives 1 & 2, b is the bias parameter, & a is the sensitivity parameter.
Bias (parameter b) represents a consistent preference for one alternative independent of reinforcement rates. If b > 1, there is bias toward Alternative 1; if b < 1, bias toward Alternative 2; if b = 1, no bias exists (Baum, 1974).
Bias might arise from physical characteristics of alternatives (one key is easier to reach), perceptual factors (one key is brighter), or previous experience (one alternative was previously richer). Bias shifts preferences consistently toward one option regardless of current reinforcement rates.
Sensitivity (parameter a) represents how responsive the organism is to differences in reinforcement rates. If a = 1, perfect matching occurs; if a < 1, undermatching occurs (less sensitivity to reinforcement differences); if a > 1, overmatching occurs (greater sensitivity to reinforcement differences) (Baum, 1974).
Sensitivity reflects how well the organism tracks & responds to reinforcement rate differences. High sensitivity means even small differences in reinforcement rates produce large differences in response allocation. Low sensitivity means the organism distributes responses relatively evenly despite substantial reinforcement rate differences.
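To make the parameters concrete, the sketch below works backward from the undermatching & overmatching examples given earlier (80% of reinforcers with 65% of responses; 60% of reinforcers with 75% of responses) to the sensitivity value each implies. It assumes no bias (b = 1), which the examples do not state, & the function names are hypothetical.

```python
import math


def generalized_matching_proportion(r1, r2, a=1.0, b=1.0):
    """Proportion of responses on alternative 1 under R1/R2 = b * (r1/r2)**a."""
    ratio = b * (r1 / r2) ** a
    return ratio / (1 + ratio)


def implied_sensitivity(response_prop, reinforcer_prop):
    """Solve R1/R2 = (r1/r2)**a for a, assuming no bias (b = 1)."""
    response_ratio = response_prop / (1 - response_prop)
    reinforcer_ratio = reinforcer_prop / (1 - reinforcer_prop)
    return math.log(response_ratio) / math.log(reinforcer_ratio)


a_under = implied_sensitivity(0.65, 0.80)  # undermatching example
a_over = implied_sensitivity(0.75, 0.60)   # overmatching example
print(round(a_under, 2))  # 0.45
print(round(a_over, 2))   # 2.71
```

With a = 1 & b = 1, `generalized_matching_proportion(80, 20)` returns the strict-matching prediction of 0.8; values of a below 1 pull the prediction toward 50%, & values above 1 push it toward exclusive preference.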
The Generalized Matching Law provides a comprehensive framework for describing choice behavior, accommodating both perfect matching (when b = 1 & a = 1) & systematic deviations. By estimating bias & sensitivity parameters for individual organisms, researchers can characterize choice patterns precisely & investigate factors influencing these parameters.
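One common way to estimate the two parameters is to take logarithms of both sides, giving log(R₁/R₂) = a·log(r₁/r₂) + log(b), & then fit a straight line: the slope estimates a & the intercept estimates log(b). The sketch below does this with ordinary least squares in plain Python. The data are synthetic, generated from a = 0.8 & b = 1.2 purely for illustration, & the function name is hypothetical.

```python
import math


def fit_generalized_matching(response_pairs, reinforcer_pairs):
    """Estimate (a, b) by least squares on log10 response & reinforcer ratios."""
    xs = [math.log10(r1 / r2) for r1, r2 in reinforcer_pairs]
    ys = [math.log10(R1 / R2) for R1, R2 in response_pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                 # slope = sensitivity
    log_b = mean_y - a * mean_x   # intercept = log10 of bias
    return a, 10 ** log_b


# Synthetic conditions: reinforcers obtained on alternatives 1 & 2.
reinforcers = [(10, 90), (25, 75), (50, 50), (75, 25), (90, 10)]
# Synthetic responses generated from a = 0.8, b = 1.2 (not real data).
responses = [(1000 * 1.2 * (r1 / r2) ** 0.8, 1000) for r1, r2 in reinforcers]

a_hat, b_hat = fit_generalized_matching(responses, reinforcers)
print(round(a_hat, 3), round(b_hat, 3))  # 0.8 1.2
```

Because the synthetic data contain no noise, the fit recovers the generating values; real data would scatter around the line, & the estimates would carry uncertainty.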
Looking Forward
We’ve explored how choice is measured in laboratory settings, revealing fundamental patterns in how organisms distribute responses among alternatives. The Matching Law describes a simple, elegant relationship: response allocation matches reinforcement distribution. The Generalized Matching Law extends this framework by incorporating bias & sensitivity parameters. In Part 2, we’ll examine theoretical explanations for matching, including optimization theory & momentary maximization, & explore how choice changes under conditions of certainty, risk, & uncertainty.