"

02-3: Experimental Design & Statistical Analysis


Looking Back

In Parts 1 and 2, we established the scientific foundations of learning research and the logic of experimental control—identifying independent and dependent variables, distinguishing extraneous variables from confounds, and appreciating how random assignment controls for both known and unknown variables. Now we’re ready to put these concepts together into complete experimental designs: How do researchers decide whether to use two groups or multiple groups? When should they use independent groups versus repeated measures? How do they analyze data from different designs?

Figure: A decision chart showing how the appropriate statistic for analyzing experimental data can be determined by answering a series of questions.

What Is an Experimental Design?

An experimental design is a general plan for selecting participants, assigning participants to experimental conditions, controlling extraneous variables, and gathering data. The design you choose depends on your research question, practical constraints, and what type of conclusions you want to draw (Keppel & Wickens, 2004).

Choosing the right experimental design isn’t arbitrary—it flows logically from your research questions and the nature of your variables. Different designs allow different conclusions and require different statistical analyses. Let’s examine the major categories of experimental designs, using Nancy’s evolving caffeine research as our continuing example.

Two-Group Designs

The simplest experimental design compares two groups: one receiving the treatment (experimental group) and one not receiving it (control group). But even this simple design comes in two flavors depending on how groups are formed.

Two Independent-Groups Design

A two independent-groups design is a research design with two groups of participants that are formed by random assignment. Each participant appears in only one group, and the groups are unrelated (Fisher, 1935).

Nancy’s first caffeine experiment used this design. She had one IV (caffeine) with two levels (caffeine vs. no caffeine). She randomly assigned mother rats to conditions, making the pup groups essentially randomly assigned as well. The groups were independent because she used none of the methods for creating related groups: no repeated measures (pups weren’t tested in both conditions), no matched pairs (she didn’t match pups before assignment), and no natural pairs (all pups from a litter were in the same group).

Nancy analyzed her data using a t-test for independent samples—an inferential statistic used to evaluate the difference between two means from randomly assigned groups. This test determines whether the difference between groups is larger than would be expected by chance (Fisher, 1925).

The t-test compares the difference between group means to the variability within groups. A large difference between groups combined with small variability within groups produces a large t-value and statistical significance. Significance means the observed difference is unlikely to result from chance alone, supporting a causal relationship between the IV and DV.
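To make the computation concrete, here is a minimal sketch of this test in Python using SciPy. The scores are hypothetical trials-to-criterion values invented purely for illustration—they are not Nancy's data.

```python
# Minimal sketch: independent-samples t-test with SciPy.
# Scores are hypothetical trials-to-criterion for each group of pups.
from scipy import stats

caffeine_group = [16, 18, 15, 17, 19, 16, 18, 17]
control_group  = [15, 17, 16, 18, 16, 17, 15, 18]

t_stat, p_value = stats.ttest_ind(caffeine_group, control_group)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # p < .05 -> significant difference
```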

Two Related-Groups Design

A two related-groups design is a research design with two groups of participants that are formed using repeated measures, matched pairs, or natural pairs. Participants in the two groups are related in some systematic way (Keppel & Wickens, 2004).

Although Nancy used independent groups, she could have used a related groups design. For example, she might have tested each rat twice—once with caffeine and once without (in counterbalanced order). This repeated measures approach eliminates individual differences as a source of variability because each rat serves as its own control.

Data from two related groups are analyzed with a t-test for related-samples (also called paired-samples t-test or dependent t-test)—an inferential statistic that accounts for the correlation between groups. Because related groups are more similar than independent groups, this test has greater statistical power (Fisher, 1925).
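A sketch of the related-samples version follows, assuming (hypothetically) that each rat was tested once under each condition in counterbalanced order. SciPy's `ttest_rel` pairs the scores by subject, which is what gives the test its extra power.

```python
# Minimal sketch: related-samples (paired) t-test with SciPy.
# The same position in both lists is the same rat tested twice;
# the values are hypothetical.
from scipy import stats

with_caffeine    = [17, 19, 15, 18, 16, 20]
without_caffeine = [15, 18, 14, 16, 15, 18]

t_stat, p_value = stats.ttest_rel(with_caffeine, without_caffeine)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```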

Nancy’s First Experiment: Results & Interpretation

Suppose Nancy runs her experiment, analyzes the data with a t-test for independent samples, and finds no effect of caffeine on how quickly rats learn to bar press. She's surprised because background research showed caffeine increases arousal and task performance. Reviewing the literature, she notices researchers have used varying caffeine concentrations. Perhaps her concentration (.017%) was too low to affect learning?

This illustrates an important limitation of two-group designs: they can only tell you whether two specific conditions differ. They can’t reveal dose-response relationships or compare multiple treatment variations. When Nancy suspects her caffeine dose might be wrong, a two-group design is insufficient. She needs a multiple-group design.

Multiple-Group Designs

When research questions involve comparing more than two conditions, multiple-group designs are necessary. These designs can reveal dose-response relationships, compare several treatment variations, or include multiple control conditions.

Multiple Independent-Groups Design

A multiple independent-groups design is a research design with more than two groups of participants that were formed by random assignment. This design allows comparison of three or more treatment levels or conditions (Keppel & Wickens, 2004).

For her second experiment, Nancy decides to use varying caffeine concentrations. Because she found no difference in the first experiment, she sees no need for a control group. Instead, she'll compare three caffeine levels: .017% (her original concentration), .034% (twice as much), and .05% (roughly three times as much). This multiple independent-groups design can reveal whether effects depend on dosage.

One-Way ANOVA for Independent-Groups

One-way ANOVA (Analysis of Variance) for independent groups is a statistical test used to analyze data from an experimental design with one IV that has three or more groups formed by random assignment. ANOVA determines whether significant differences exist anywhere among the group means (Fisher, 1925).

ANOVA compares variance between groups to variance within groups. If the IV has an effect, variance between groups will be larger than variance within groups, producing a significant F-ratio. The term “one-way” indicates one IV; if there were two IVs, we’d use “two-way ANOVA.”
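As a sketch, SciPy's `f_oneway` computes this F-ratio directly from the raw group scores. The data below are hypothetical trials-to-criterion values for the three caffeine levels.

```python
# Minimal sketch: one-way ANOVA for three independent groups with SciPy.
# Values are hypothetical trials-to-criterion at each caffeine level.
from scipy import stats

low    = [12, 13, 11, 14, 12]   # .017% caffeine
medium = [13, 12, 14, 13, 12]   # .034% caffeine
high   = [17, 18, 16, 19, 17]   # .05% caffeine

f_stat, p_value = stats.f_oneway(low, medium, high)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # significant -> run post hoc tests
```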

An important limitation: ANOVA tells you that differences exist somewhere among the means but doesn’t specify which groups differ from which. With three groups, a significant ANOVA could mean Group 1 differs from Group 2, or Group 1 differs from Group 3, or Group 2 differs from Group 3, or any combination. This is where post hoc tests become necessary.

Post Hoc Tests

A post hoc test (Latin for “after this”) is a statistical comparison made between group means after finding a significant ANOVA. Post hoc tests determine which specific groups differ from each other (Tukey, 1949).

Tukey's HSD (Honestly Significant Difference) is a commonly used post hoc test that controls for multiple comparisons. It compares every possible pair of means while maintaining the appropriate error rate (Tukey, 1949).
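A sketch of the follow-up, using `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later) on the same hypothetical groups as in the ANOVA example:

```python
# Minimal sketch: Tukey's HSD after a significant one-way ANOVA.
# Reuses the hypothetical groups from the ANOVA example above.
from scipy import stats

low    = [12, 13, 11, 14, 12]
medium = [13, 12, 14, 13, 12]
high   = [17, 18, 16, 19, 17]

result = stats.tukey_hsd(low, medium, high)
print(result)  # prints each pairwise comparison with its p-value and CI
```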

Nancy conducts her second experiment and finds a significant ANOVA. Post hoc tests reveal that the .05% group learned to bar press significantly more slowly than the other two groups. Now she has evidence that high caffeine doses may inhibit learning in rats.

Factorial Designs: Multiple Independent Variables

So far, all designs we’ve discussed have one IV. But many research questions require manipulating multiple IVs simultaneously. Does caffeine affect learning? Does this effect change over time? Do different learning tasks show different patterns? These questions require factorial designs.

A factorial design is an experimental design with more than one IV. “Factorial” simply means the design has multiple factors (IVs). Factorial designs are immensely powerful because they can reveal interactions between IVs—situations where the effect of one IV depends on the level of another IV (Fisher, 1925).

Nancy’s Third Experiment: Adding a Second IV

Nancy now has evidence that high caffeine doses inhibit learning. But reviewing the literature, she finds caffeine has differing effects on locomotor activity over time—early increases followed by later decreases. Perhaps her bar pressing results reflected activity levels as much as learning? She decides to measure learning over time.

Nancy returns to a simple design with just two caffeine groups (control and experimental), but uses the higher .05% dose that showed effects. She adds a second IV: trials, with learning measured repeatedly over time. She also measures performance on two learning tasks—bar pressing and maze running—which she will analyze separately. This design addresses whether caffeine effects change as learning progresses.

Identifying Factorial Designs

How do we identify and describe factorial designs? By the number of IVs and the number of levels of each IV. Nancy’s design has two IVs: caffeine (two levels: control vs. experimental) and trials (eight levels: Trials 1-8). This is a 2 × 8 factorial design (read “two by eight”). The first number indicates the levels of the first IV, the second number the levels of the second IV.

A design with three IVs would be described with three numbers (e.g., 2 × 3 × 4). The numbers multiply to give total conditions: Nancy’s 2 × 8 design has 16 conditions (2 caffeine levels × 8 trials).

Between-Subjects, Within-Subjects, & Mixed Designs

Factorial designs can be classified by how participants are assigned to levels of each IV:

Between-subjects factors use independent groups—different participants for each level. Nancy’s caffeine IV is between-subjects because different rats receive caffeine versus no caffeine.

Within-subjects factors use related groups—the same participants experience all levels. Nancy’s trials IV is within-subjects because each rat is measured across all eight trials (repeated measures).

A factorial mixed-groups design has at least one between-subjects IV and at least one within-subjects IV. Nancy’s design is mixed: caffeine is between-subjects, trials is within-subjects (Keppel & Wickens, 2004).

Factorial ANOVA

Factorial ANOVA for mixed groups is an inferential statistical test used to analyze data from a factorial mixed-groups design. It tests main effects (effects of each IV separately) and interactions (whether IVs combine in non-additive ways) (Fisher, 1925).

A main effect is the overall effect of one IV, averaged across levels of other IVs. Nancy’s analysis will test for a main effect of caffeine (do caffeine and control groups differ overall?) and a main effect of trials (does performance change across trials?).

An interaction occurs when the effect of one IV depends on the level of another IV. In Nancy’s study, an interaction would mean caffeine effects differ across trials—perhaps caffeine initially enhances learning but later impairs it. Interactions are often the most interesting findings because they reveal complex relationships between variables.
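As one possible sketch of the analysis (the choice of the pingouin package and the simulated data are assumptions for illustration, not part of Nancy's study), a 2 × 8 mixed factorial ANOVA can be run like this:

```python
# Minimal sketch: 2 x 8 factorial ANOVA for mixed groups using pingouin
# (pip install pingouin). 'group' is between-subjects, 'trial' is
# within-subjects, and the data are simulated for illustration.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(42)
rows = []
for subject in range(16):                        # 8 rats per group
    group = "caffeine" if subject < 8 else "control"
    for trial in range(1, 9):                    # each rat measured 8 times
        presses = 2 * trial + rng.normal(0, 1)   # performance improves over trials
        if group == "caffeine" and trial >= 3:   # simulated interaction effect
            presses -= 3
        rows.append({"subject": subject, "group": group,
                     "trial": trial, "presses": presses})
df = pd.DataFrame(rows)

# Reports the main effect of group, main effect of trial, and the interaction.
print(pg.mixed_anova(data=df, dv="presses", within="trial",
                     subject="subject", between="group"))
```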

Nancy’s Third Experiment: Results

Nancy runs the factorial experiment and analyzes bar pressing and maze running separately with factorial ANOVAs for mixed groups. The results reveal complex patterns:

Bar pressing: Rats made more bar press responses over eight trials—learning improved over time (main effect of trials). However, caffeine rats made fewer bar presses than control rats beginning on Trial 3 through the end (interaction between caffeine and trials). For bar pressing, caffeine inhibited learning, but this effect emerged only after initial trials.

Maze running: Caffeine rats took less time to complete the maze on Trials 1-4 (significantly so on Trial 4). Results then reversed—control rats took less time on Trials 5-8 (significantly on Trials 5, 6, and 7). This clear interaction shows caffeine initially enhanced but later impaired maze learning.

These findings illustrate the power of factorial designs. A simple two-group design would miss the time-dependent nature of caffeine effects. The factorial design revealed that caffeine’s impact changes as learning progresses—an interaction that answers Nancy’s question about whether activity changes over time affect learning.

Additional Design Considerations

Beyond the basic experimental designs we’ve covered, researchers employ several additional techniques and design variations to address specific research challenges.

Pilot Studies

A pilot study is a small-scale preliminary study conducted before the main experiment. Unlike the main experiment, which tests the research hypothesis, a pilot study tests the methodology itself. Researchers use pilot studies to identify problems with procedures, refine operational definitions, estimate effect sizes for power analysis, and train research assistants. Issues discovered during piloting—confusing instructions, equipment malfunctions, ceiling or floor effects—can be corrected before investing resources in the full study (Leon, Davis, & Kraemer, 2011).
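For instance, one common use of a pilot effect-size estimate is a power analysis to choose the main study's sample size. Here is a minimal sketch with statsmodels, assuming a hypothetical pilot estimate of Cohen's d = 0.5:

```python
# Minimal sketch: sample-size planning from a pilot effect-size estimate,
# using statsmodels' power analysis for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5   # hypothetical Cohen's d estimated from the pilot study
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.05, power=0.80)
print(f"about {n_per_group:.0f} participants per group")  # ~64 in this case
```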

Manipulation Checks

A manipulation check is a measure used to verify that the independent variable manipulation was effective. For example, if you’re studying how anxiety affects learning, you need to confirm that your anxiety induction actually made participants anxious. Without a manipulation check, null results are ambiguous—did the IV truly have no effect, or did the manipulation simply fail? Manipulation checks are especially important when IVs involve psychological states that aren’t directly observable (Sigall & Mills, 1998).

Quasi-Experimental Designs

A quasi-experiment resembles a true experiment but lacks random assignment to conditions. This occurs when researchers study pre-existing groups (e.g., comparing learning in children with and without ADHD) or when random assignment is impractical or unethical. Without random assignment, groups may differ systematically before the study begins, threatening internal validity. Quasi-experiments can suggest causal relationships but cannot establish them as definitively as true experiments (Shadish, Cook, & Campbell, 2002).

Single-Subject Designs

Single-subject designs (also called single-case or N=1 designs) intensively study individual participants rather than comparing group averages. These designs are particularly appropriate when studying rare conditions where large samples are impossible, when individual responses to treatment vary greatly, or in applied settings like clinical practice. The participant serves as their own control through repeated measurement across baseline and treatment phases. Single-subject designs have been fundamental to applied behavior analysis and remain important in learning research (Kazdin, 2011).

Measurement, Reliability, & Validity

Before concluding, we must address measurement quality. Even the best experimental design produces meaningless results if measurements are poor.

Measurement error refers to inaccuracy in measurement. Every observed score combines the true value with some error. Researchers take multiple measurements so that errors balance out, revealing the true pattern (Nunnally & Bernstein, 1994).
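One common way to formalize this idea comes from classical test theory: an observed score X is the sum of a true score T and random error E (X = T + E). Because random errors are as likely to be positive as negative, averaging across repeated measurements drives the expected error toward zero, leaving the true score.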

Reliability is the consistency of a measure—its ability to produce similar results under similar conditions across repeated measurements. Reliability is often assessed using correlations between repeated measurements (Cronbach, 1951).
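As a sketch, test-retest reliability can be estimated as the Pearson correlation between two administrations of the same measure. The scores below are hypothetical.

```python
# Minimal sketch: test-retest reliability as a Pearson correlation
# between two administrations of the same measure (hypothetical data).
from scipy import stats

time1 = [10, 14, 12, 18, 16, 11, 15]
time2 = [11, 13, 12, 17, 15, 12, 16]

r, p = stats.pearsonr(time1, time2)
print(f"test-retest reliability: r = {r:.2f}")  # closer to 1.0 = more reliable
```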

One important relationship: low reliability and high validity cannot coexist. A measure must be consistent before it can be accurate—reliability is necessary, but not sufficient, for validity.

Validity is the ability to measure the construct you intended to measure. A valid measure actually assesses what it claims to assess. All DVs have some reliability and validity, but some are better than others (Campbell & Fiske, 1959).

Generalization & Transfer from the Laboratory

Generalization is applying results from an experiment to different situations or populations. Do findings from laboratory studies with rats apply to humans? Do results from college students apply to other populations? These questions concern external validity—whether results generalize beyond the specific study (Campbell & Stanley, 1963).

Results from learning experiments using animals often do apply to humans—the results generalize. Basic principles of classical and operant conditioning show remarkable similarity across species. However, as we learned in Module 01, biological constraints mean generalization isn’t automatic. Researchers must demonstrate generalizability empirically rather than assuming it.

There’s often tension between internal and external validity. Highly controlled laboratory studies maximize internal validity (confidence in causal conclusions) but may sacrifice external validity (generalizability to real-world settings). Field studies in natural settings increase external validity but often reduce control, potentially threatening internal validity. Researchers must balance these competing goals (Berkowitz & Donnerstein, 1982).

A final note on measurement: not all numbers are equal. A number can carry up to four properties—identity, magnitude, equal intervals, and a true zero—and how many it carries determines its scale of measurement (nominal, ordinal, interval, or ratio), which in turn constrains the statistics that can meaningfully be applied (Stevens, 1946).

Looking Forward

We’ve completed our introduction to research methods in learning—understanding the scientific foundations, research equipment, experimental variables, control techniques, and the major experimental designs. In Module 03, we’ll apply these research methods to studying our first specific type of learning: unlearned adaptive behaviors including reflexes, habituation, and sensitization, which provide the foundation for understanding more complex learning processes.


License

Psychology of Learning TxWes Copyright © by Jay Brown. All Rights Reserved.