
07-2: Behavioral Economics & Complex Behaviors

Psychology of Learning

Module 07: Operant Conditioning 2


Looking Back

In Part 1, we explored sophisticated theories of reinforcement. Latent learning demonstrated that learning can occur without reinforcement, even though reinforcement is required for performance. The Premack Principle revealed that activities can reinforce other activities based on preference hierarchies. Response deprivation theory showed that even low-probability behaviors can become reinforcers if restricted below baseline levels. Now we examine how organisms make choices among competing reinforcers, how to build complex behavioral sequences, and what happens when learned behaviors conflict with instincts.

Behavioral Economics: Psychology Meets Economics

Behavioral economics is a field that uses principles from both behavioral psychology and economics to predict people’s choices and behaviors. It combines microeconomics (concerned with consumer behaviors) and operant conditioning (concerned with individual behaviors of organisms) (Kagel, Battalio, and Green, 1995).

Behavioral economics is necessary because psychological values (utility) don’t necessarily match mathematical values. The value of $1 to a pauper is far larger than the value of $1 to a rich man. Understanding behavior requires understanding subjective value, not just objective quantities.

Optimization Theory: Maximizing Satisfaction

Optimization theory is a theory of choice behavior stating that people tend to make decisions that maximize their satisfaction. Expected utility theory (also called maximum utility theory) maintains that, when facing uncertainty, people behave, or should behave, as if they were maximizing the expectation of some utility function of the possible outcomes (Von Neumann and Morgenstern, 1944).

Imagine deciding to spend a weekend having a “movie marathon.” You set up a list in Netflix and choose 10 movies. Since you enjoy both comedies and dramas, you need to choose how many of each you’ll watch. How do you decide?

The Law of Diminishing Marginal Value

Like anything, your movie viewing is subject to the law of diminishing marginal value. The first drama you watch gives you 5 units of utility. The second drama gives you an additional 4.5 units of utility. The third gives 4 units, the fourth 3.5 units, continuing to decline. If you watched 10 dramas, the final one would add only 0.5 units of utility, and the total package of 10 dramas would yield 27.5 units of utility.

Apparently you prefer dramas, because the first comedy you watch gives you only 3 units of utility. Each additional comedy gives less and less additional utility: 2.5 for the second, 2 for the third, and so on. By the sixth comedy the marginal gain is down to 0.5 units, and further comedies add essentially nothing, so the total package of 10 comedies would yield only about 10.5 units of utility.

But here’s the key insight: if you create a package of dramas and comedies, their combined utility can exceed what you’d receive from watching just one type of movie. The optimal package, 7 dramas and 3 comedies, yields 32 units of utility, exceeding the 27.5 units from 10 dramas alone. Variety itself has value!
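These totals can be checked mechanically. Here is a minimal sketch in Python, assuming each extra movie of a type adds 0.5 fewer units than the last (the schedule of marginal values described above, with dramas starting at 5 units and comedies at 3) and that marginal utility never drops below zero:

```python
def marginal_utility(start, n):
    """Utility added by the n-th movie of a type (1-indexed).
    Each movie adds 0.5 fewer units than the last, never below zero."""
    return max(start - 0.5 * (n - 1), 0.0)

def package_utility(n_dramas, n_comedies):
    """Total utility of a mixed package (dramas start at 5, comedies at 3)."""
    dramas = sum(marginal_utility(5.0, i) for i in range(1, n_dramas + 1))
    comedies = sum(marginal_utility(3.0, i) for i in range(1, n_comedies + 1))
    return dramas + comedies

# Try every way to split 10 movies between the two genres:
best = max(range(11), key=lambda d: package_utility(d, 10 - d))
print(best, 10 - best, package_utility(best, 10 - best))  # 7 3 32.0
print(package_utility(10, 0))                             # 27.5
```

Note that the optimum also falls out of a simple greedy rule (always pick whichever genre offers the higher marginal utility next), which is one reason organisms can approximate optimal choice without any explicit calculation.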

This example seems far-fetched: who has the time or ability to calculate all that? Yet researchers have repeatedly found that animals and humans do in fact approach optimality in their choices. We may not consciously calculate utilities, but our choices approximate optimal solutions.

Optimization and Behavioral Ecology

From an evolutionary perspective, any animal that doesn’t optimize is less likely to survive and reproduce. Therefore, evolution should favor optimization. Two elegant examples demonstrate this:

Bluegill and Prey Selection: Bluegill fish face choices about which prey to pursue. If there are many prey, an animal should bypass small prey and spend time catching larger prey—the effort-to-reward ratio favors selectivity. If prey is scarce, an animal should pursue any prey available—being picky means going hungry. Research confirmed these predictions precisely: If prey density is high, bluegill eat only the largest prey. If prey density is medium, they eat only medium or large prey. If prey density is low, they eat any prey they find. Their behavior tracks optimal foraging predictions (Werner and Hall, 1974).

Male Dung Flies and Patty Selection: In another study, researchers calculated the optimal time a male dung fly should wait by a fresh cow patty (Parker, 1970). Female flies prefer fresh patties, but it takes time for males to find new patties, so there is a tradeoff between the time lost searching for a new patty and the benefit a fresher patty might provide. Male dung fly behavior was near optimal: they left patties at almost exactly the predicted time. Natural selection fine-tuned their decision-making.

Elasticity and Inelasticity of Demand

Elastic demand is demand for a product that exhibits large changes as the price increases or decreases. If an FR10 schedule increases to an FR100, animals will decrease responding if demand for the reinforcer is elastic. Demand is elastic when close substitutes for a reinforcer are available (Hursh, 1980).

Rats show exactly these changes if alternate food sources are or are not made available. When free food is accessible elsewhere, rats dramatically reduce lever pressing as the ratio requirement increases—food demand is elastic because substitutes exist.

Inelastic demand is demand for a product that shows relatively little change as the price increases or decreases. If an FR10 increases to an FR100, animals whose demand for the reinforcer is inelastic will keep obtaining it at nearly the same rate, which means responding far more to do so. Demand is inelastic when no close substitutes for the reinforcer are available (Hursh, 1980).

When no alternate food sources exist, rats maintain responding even as ratio requirements increase dramatically. They’ll work harder to obtain the only food available. This mirrors human behavior—demand for necessities without substitutes (insulin for diabetics, gasoline when no public transit exists) remains relatively stable despite price increases.
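Demand curves like these can be written down quantitatively. Below is a sketch using the exponential demand equation later proposed by Hursh and Silberberg (2008); all parameter values here are invented purely for illustration. The parameter alpha governs elasticity: higher alpha means consumption collapses faster as price rises.

```python
import math

def consumption(price, Q0=100.0, k=2.0, alpha=0.001):
    """Exponential demand (Hursh & Silberberg, 2008):
    log10(Q) = log10(Q0) + k * (exp(-alpha * Q0 * price) - 1).
    Q0 is consumption at zero price; alpha governs elasticity."""
    return 10 ** (math.log10(Q0) + k * (math.exp(-alpha * Q0 * price) - 1))

# "Price" is the ratio requirement: FR10 -> 10, FR100 -> 100.
for price in (10, 100):
    elastic = consumption(price, alpha=0.001)      # substitutes available
    inelastic = consumption(price, alpha=0.00001)  # no substitutes
    print(f"FR{price}: elastic={elastic:.1f}  inelastic={inelastic:.1f}")
```

In this parameterization, raising the price from FR10 to FR100 cuts elastic consumption by over 80 percent, while inelastic consumption falls by only about a third, mirroring the rats’ behavior described above.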

Complex Behaviors: Chaining

Shaping can train an animal to execute a single behavior it doesn’t regularly exhibit. In shaping, the trainer identifies the desired target behavior and reinforces successive approximations toward that goal. But many behaviors require executing a series of actions in a specific order.

Chaining is an extension of behavioral shaping: the reinforcement of successive elements of a chain of behaviors. Where shaping focuses on reinforcing a single target behavior, chaining focuses on reinforcing a series of behaviors that must be executed in a specific order.

Successful tooth brushing requires an individual to hold the toothbrush, apply an appropriate amount of toothpaste, brush all surfaces of the teeth, spit, rinse, and wipe the mouth. Spitting out the toothpaste before brushing all surfaces would compromise the effectiveness of the entire process, and wiping the mouth before brushing rather than afterward would defeat the purpose of that step. Tooth brushing is thus a chain of behaviors that must be executed in a specific order.

Forward Chaining: Learning from Beginning to End

Forward chaining is a technique for conditioning the acquisition of a series of behaviors that must be demonstrated in a particular sequence. Behaviors are learned in sequence starting with the first behavior in the series. Reinforcement follows the successful demonstration of the first behavior, then the successful demonstration of the first and second behavior, and so on (Skinner, 1938).

When teaching a long chain of behaviors, the burden on the learner grows as each step is added: the longer the chain, the more work (mental or physical) is required before reinforcement. Additionally, interference can occur when previously learned steps interfere with learning the next step in the process.

Bed Making Example: Consider teaching a child to make the bed with these steps: 1) Remove the pillows. 2) Pull up the top sheet. 3) Tuck in the edges of the top sheet. 4) Pull up the bed cover/blanket/comforter. 5) Smooth out any wrinkles. 6) Place the pillows neatly at the head of the bed.

Using forward chaining, a parent could reinforce a child for removing the pillows consistently each morning. Then, the parent would reinforce the child for removing the pillows and pulling up the top sheet; reinforcement would occur only after both behaviors were executed in the proper sequence. After the child mastered the first two steps, the parent would reinforce only after the child removed the pillows, pulled up the top sheet, and tucked in the edges. Behaviors would be added in this way until all steps were learned.

Backward Chaining: Learning from End to Beginning

Backward chaining is a technique for conditioning the acquisition of a series of behaviors that must be demonstrated in a particular sequence, with behaviors learned starting from the last behavior in the series. Reinforcement follows the successful demonstration of the last behavior, then the demonstration of the next-to-last and last behaviors, and so on. This technique is believed to make long sequences easier to learn because the newest behavior is always the first one the learner performs, leading directly into already-mastered steps that end in reinforcement (Skinner, 1938).

Building on the bed making example, a parent could condition a child using backward chaining. The parent could complete the first five steps and require the child to complete the last step (placing pillows). Upon successful completion, the caregiver would reinforce the child. This routine would continue until the child reliably completed the last step.

Then, the caregiver would complete only the first four steps and require the child to complete steps five and six (smoothing wrinkles, placing pillows) in the proper order before receiving reinforcement. Steps would continue to be added to the beginning of the child’s portion of the series until the child could execute all steps properly and in the desired sequence.

Forward chaining is the more intuitive way to teach, but when learning step 4 of a chain, the learner must hold the new step in mind while completing the first three, which some argue is too cognitively demanding. Backward chaining is often better for complex skills because the learner always performs the “new” behavior first, while it is fresh in memory and has not been obscured by performing other steps. However, some skills, like diving, cannot be taught backward.
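The two schedules can be sketched in code. This is a toy illustration of which steps the learner performs at each successive training stage of the bed-making example (the function names are ours, not standard terminology):

```python
# The bed-making chain from the example above:
STEPS = ["remove pillows", "pull up top sheet", "tuck in edges",
         "pull up cover", "smooth wrinkles", "place pillows"]

def forward_chaining_trials(steps):
    """Stage k: the learner performs steps 1 through k+1 in order,
    and is reinforced only after completing all of them."""
    return [steps[: k + 1] for k in range(len(steps))]

def backward_chaining_trials(steps):
    """Stage k: the trainer completes the earlier steps; the learner
    performs only the final k+1 steps, then is reinforced."""
    return [steps[len(steps) - 1 - k :] for k in range(len(steps))]

# Second training stage under each method:
print(forward_chaining_trials(STEPS)[1])   # ['remove pillows', 'pull up top sheet']
print(backward_chaining_trials(STEPS)[1])  # ['smooth wrinkles', 'place pillows']
```

Notice that under backward chaining every stage ends with the already-mastered final steps, so every trial ends at the point of reinforcement.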

Instinctive Drift: When Learning Conflicts with Instinct

Instinctive drift is a term used to refer to animals’ tendencies to engage in instinctual behaviors despite conditioning to learn incompatible behaviors. Learned behaviors drift toward instinctual patterns (Breland and Breland, 1961).

Keller and Marian Breland, former students of Skinner, founded a business training animals for commercials and entertainment. They encountered a puzzling phenomenon. They trained raccoons and pigs to pick up tokens and deposit them in piggy banks to receive food reinforcement. Initially, animals performed well. But over time, behavior deteriorated.

Raccoons began rubbing tokens together, dipping them in the bank repeatedly, refusing to let go—despite no food reinforcement for these behaviors. Pigs began rooting tokens along the ground, tossing them in the air, rooting them again. These behaviors delayed or prevented reinforcement, yet animals persisted.

A fixed action pattern is an instinctual series of behaviors that are demonstrated in their entirety in response to some environmental stimulus. These patterns are innate, species-specific, and triggered by specific releaser stimuli (Lorenz, 1937).

Normally, the sight of food triggers fixed action patterns in animals. But because tokens had become associated with food through classical conditioning, the sight of tokens now triggered fixed action patterns in raccoons and pigs. Raccoons’ “washing” behavior and pigs’ rooting behavior are innate food-handling patterns. Instinctive drift demonstrates that learning has biological constraints—operant conditioning cannot override powerful instinctive behaviors. This has important implications for behavior therapy: habitual behaviors may function almost like learned instincts, resisting change through conditioning alone.

Avoidance Learning: Escaping Fear Itself

Avoidance learning is the demonstration of learned avoidance behaviors in response to people, environments, or situations that induce fear or anxiety. Organisms learn to avoid stimuli that predict aversive outcomes (Solomon and Wynne, 1953).

Classic experiments demonstrated avoidance learning: Dogs stood in a compartment. A light signaled onset of an electric shock. Dogs could jump over a barrier into another compartment to avoid being shocked. Negative reinforcement (shock termination) increased probability dogs would jump and avoid shock in the future. After this behavior was established, experimenters shut off the electricity. The behavior continued! If this is avoidance learning, what is being avoided if the shock is gone?

Mowrer’s two-factor theory is an explanation of avoidance learning based on the idea that individuals or animals first learn to fear a previously neutral stimulus through classical conditioning. Then, the individual or animal escapes the stimulus to avoid feelings of fear. The avoidance behaviors are maintained through negative reinforcement—removal of fear (Mowrer, 1947).

The “two factors” are classical conditioning and operant conditioning working together:

Factor 1 (Classical Conditioning): Light (CS) is paired with shock (US), producing fear (CR). The neutral light becomes a fear-eliciting stimulus.

Factor 2 (Operant Conditioning): Jumping over the barrier (R) terminates the light (removes CS), which removes fear (negative reinforcement). This strengthens the jumping response.

Even when shock is removed, the light still elicits fear (classical conditioning persists), and jumping still removes fear (negative reinforcement continues). The behavior maintains despite shock never occurring again. This is negative reinforcement—removing the aversive stimulus (fear), not avoiding shock itself.
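The persistence puzzle can be made concrete with a toy simulation. This is a sketch under invented assumptions (simple asymptotic update rules with arbitrary learning rates), not a model from the literature:

```python
# Toy numerical sketch of Mowrer's two factors; update rules and
# learning rates are invented for illustration only.

fear = 0.0           # conditioned fear elicited by the light (0 to 1)
jump_strength = 0.0  # operant strength of the jumping response (0 to 1)

# Factor 1 (classical conditioning): light-shock pairings build fear.
for _ in range(20):
    fear += 0.2 * (1.0 - fear)  # asymptotic acquisition

# Factor 2 (operant conditioning): jumping terminates the light.
# The shock is OFF in this phase, yet the response keeps strengthening,
# because the reinforcer is fear reduction, not shock avoidance.
for _ in range(20):
    relief = fear                                   # fear removed by escape
    jump_strength += 0.1 * relief * (1.0 - jump_strength)
    fear -= 0.01 * fear  # escaping quickly leaves little chance to extinguish

print(round(fear, 2), round(jump_strength, 2))  # both remain high
```

The key feature is in the second loop: because each jump removes the conditioned stimulus almost immediately, the animal gets almost no exposure to the light-without-shock pairing that extinction would require, so fear (and with it the avoidance response) persists.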

Clinical Example: Agoraphobia

Agoraphobia and panic disorder often coexist. If someone has a panic attack on a bus, they come to associate buses with extreme anxiety: this is classical conditioning. The person then avoids riding buses, thereby avoiding the anxiety and being negatively reinforced: this is operant conditioning. After panic attacks at the store, at the park, in the car, and so on, the person soon avoids nearly all public places, and agoraphobia develops. Avoidance behaviors are maintained by fear reduction, making them extremely resistant to extinction. The person never discovers that the feared outcome (a panic attack) might not occur, because they never test the situation.

Looking Forward

We’ve explored behavioral economics, showing that organisms optimize choices to maximize utility and adjust behavior based on elasticity of demand. We examined complex behavior acquisition through forward chaining and backward chaining, discovered instinctive drift—learned behaviors drifting toward instinctual patterns—and explored avoidance learning through Mowrer’s two-factor theory. In Part 3, we’ll examine factors influencing the effectiveness of consequences—satiation, immediacy, contingency, and cost-benefit analysis—along with factors affecting punishment effectiveness and behavioral decelerators used in behavior therapy.

License

Psychology of Learning TxWes Copyright © by Jay Brown. All Rights Reserved.