
2026. 7. 2. · 00:22
McGurk Effect: When the Eyes Change What the Ears Hear
The McGurk effect shows why speech perception is not purely auditory: when the sound and mouth movements conflict, the brain may hear a third syllable. This article traces the landmark 1976 finding, STS evidence, and why audiovisual speech integration matters.
Most of us think speech enters through the ears. The McGurk effect is the neat little trap that shows why that cannot be the whole story. Play someone an audio track of /ba/ while showing a mouth silently shaped for /ga/, and many listeners do not hear either original syllable. They report something like /da/. The brain has treated speech as a joint audiovisual event, then handed consciousness the compromise.
That is the concept for today: speech perception is multisensory. What you hear is partly constrained by what the face is doing.
What the McGurk effect is
The McGurk effect is an audiovisual speech illusion. It happens when the sound track and the visible mouth movements disagree, yet the listener experiences a single fused speech sound. The landmark paper is Harry McGurk and John MacDonald's 1976 article "Hearing lips and seeing voices," published in Nature, volume 264 (DOI 10.1038/264746a0).1
A common demonstration uses auditory /ba/ paired with visual /ga/. The surprising percept is /da/, a syllable that was not present in either channel by itself.2 That is why the effect matters. It is not just a bias in which response button people press. It changes the experienced sound.

A useful way to say it is this: the brain is not passively recording speech. It is inferring speech. When the ear and eye provide different evidence, the perceptual system tries to settle on the speech category that best explains both.
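The idea of settling on the category that best explains both channels can be sketched as a toy calculation. This is a minimal illustration, not a model from the McGurk literature: it assumes the two cues are independent, assumes a flat prior over three syllables, and uses made-up likelihood values chosen only to show how an intermediate category can win. The names `auditory_likelihood`, `visual_likelihood`, and `fuse` are hypothetical.

```python
# Toy sketch of audiovisual evidence combination.
# All numbers are illustrative assumptions, not measured data.
CATEGORIES = ("ba", "da", "ga")

# How well each syllable explains the auditory /ba/ signal (hypothetical values).
auditory_likelihood = {"ba": 0.60, "da": 0.35, "ga": 0.05}
# How well each syllable explains the visible /ga/ articulation (hypothetical values).
visual_likelihood = {"ba": 0.05, "da": 0.40, "ga": 0.55}


def fuse(auditory, visual):
    """Multiply per-category evidence and normalize.

    Assumes independent cues and a flat prior over categories.
    """
    joint = {c: auditory[c] * visual[c] for c in CATEGORIES}
    total = sum(joint.values())
    return {c: joint[c] / total for c in CATEGORIES}


posterior = fuse(auditory_likelihood, visual_likelihood)
percept = max(posterior, key=posterior.get)
print(percept, {c: round(p, 2) for c, p in posterior.items()})
```

With these assumed values, /ba/ is the best match for the ear alone and /ga/ for the eye alone, yet /da/ has the highest combined score because it is a tolerable fit to both channels at once.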
Why vision has a vote in speech
Speech sounds are fast, noisy, and often partly masked. In daily life, the face supplies information before and during the sound. Lips close before /b/ and /p/. The jaw, tongue position, and mouth opening give clues about what consonant or vowel is coming. Even if you do not consciously lip-read, your brain uses those cues.
The McGurk effect makes this hidden process visible. If vision were merely an optional add-on, mismatched lips would feel odd but would not change what syllable people hear. Instead, the visual stream can pull the auditory percept toward a different phonetic category.
This also explains why the illusion depends on conditions that resemble ordinary conversation. You need a visible speaker, a speech-like sound, and timing close enough for the brain to treat the two signals as one event. If the mouth movement is wildly out of sync, the brain is more likely to reject the combination.
The landmark study's deeper message
Before McGurk and MacDonald, speech perception was often treated as an auditory problem with visual assistance. The 1976 finding forced a stronger claim: for speech, sight and sound are integrated before the final percept is formed.1
That makes the effect different from a classroom curiosity. It gives cognitive neuroscience a clean dissociation between the physical stimulus and the perceived stimulus. The physical audio may be /ba/. The physical video may be /ga/. The percept may still be /da/. When perception differs from the input in a lawful way, researchers can ask which computation produced the law.
The word "illusion" can be misleading here. The brain is doing the same thing it usually does: combining uncertain evidence from multiple senses. The illusion appears because the experimenter has created an unnatural conflict between channels that normally agree.
Where the brain does the binding
A major candidate region is the posterior part of the superior temporal sulcus (STS). It runs along the temporal lobe and is well placed to receive both auditory speech information and visual biological-motion information such as mouth movement.
The strongest causal evidence comes from a 2010 Journal of Neuroscience study by Beauchamp, Nath, and Pasalar. The researchers first localized each participant's STS speech-integration area with fMRI. They then used single-pulse transcranial magnetic stimulation, or TMS, to temporarily disrupt activity there while participants judged McGurk stimuli.3
TMS over the STS reduced the likelihood of the McGurk percept, while stimulation of a control site did not. The disruption was most effective in a narrow window from about 100 milliseconds before auditory syllable onset to about 100 milliseconds after onset.3 That timing matters because it places the STS contribution close to the arrival of the sound itself, when sound and sight are being matched.

This does not mean the STS is the whole story. The same paper notes that neuroimaging studies have implicated a broader network, including frontal, parietal, auditory, and temporal regions, and that BOLD fMRI alone cannot prove whether a region causes the percept or merely responds to conflict.3 The better lesson is narrower and more useful: STS appears to be a necessary part of the circuit for many McGurk perceivers.
Why some people do not get the illusion
One of the best checks on any neat neuroscience story is individual variability. The McGurk effect has a lot of it. Nath and Beauchamp note that published estimates of susceptibility range from 26% to 98%, and some healthy listeners report the auditory syllable rather than the fused percept.2
Their fMRI study compared McGurk perceivers and non-perceivers. The left STS was the region that showed both a stimulus-condition effect and a susceptibility-group effect. Stronger left STS responses correlated with a higher likelihood of perceiving the illusion.2
That variability is a warning against saying, "the McGurk effect proves everyone hears with their eyes in the same way." A better statement is: the normal speech system often integrates facial and acoustic evidence, but the weighting of those cues differs across listeners and tasks.
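One way to picture listener-specific weighting is to let a single parameter control how much the visual evidence counts. The sketch below is a hypothetical illustration, not a model taken from the cited studies: the likelihood values are made up, and `visual_weight` is an assumed exponent on the visual evidence, where 0 ignores the face and 1 treats both channels equally.

```python
# Toy sketch of listener-specific cue weighting.
# All numbers are illustrative assumptions, not measured data.
CATEGORIES = ("ba", "da", "ga")

# Evidence for each syllable from auditory /ba/ and visual /ga/ (hypothetical values).
AUDITORY = {"ba": 0.60, "da": 0.35, "ga": 0.05}
VISUAL = {"ba": 0.05, "da": 0.40, "ga": 0.55}


def weighted_percept(visual_weight):
    """Return the winning category when visual evidence is raised to visual_weight.

    visual_weight = 0 ignores the face entirely; 1 weights both channels equally.
    """
    scores = {c: AUDITORY[c] * VISUAL[c] ** visual_weight for c in CATEGORIES}
    return max(scores, key=scores.get)


# Compare a listener who ignores the face, one who barely uses it,
# and one who weights both channels equally.
for w in (0.0, 0.2, 1.0):
    print(w, weighted_percept(w))
```

Under these assumptions, the two low-weight listeners report the auditory syllable /ba/, while the equal-weight listener reports the fused /da/. The stimulus is the same in every case; only the weighting differs.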

What this concept changes
The McGurk effect changes the intuitive boundary between the senses. Vision does not end at seeing, and audition does not end at hearing. For speech, the percept is built from whatever reliable evidence helps identify the speaker's intended sounds.
That has practical consequences. It explains why seeing a speaker's face helps in a noisy room. It explains why bad dubbing feels unstable. It also explains why video calls, masks, and hearing loss can change speech comprehension in ways that are not purely acoustic.
For cognitive neuroscience, the effect is a compact demonstration of a broader principle: perception is not a stack of isolated modules. The brain often solves a single problem by letting multiple sensory systems constrain the answer.
Open debates
The main debate is no longer whether vision can influence speech perception. It can. The harder questions are about mechanism.
First, how early does the integration happen? The TMS timing result points to a fast STS contribution, but fMRI and behavior alone cannot fully separate early perceptual fusion from later decision processes.3
Second, what exactly is being integrated? The system may combine low-level acoustic and visual features, learned speech categories, motor predictions about articulation, or all of these at different moments. The answer may vary by syllable, language, attention, and hearing ability.
Third, why do people differ so much? The susceptibility range suggests that there is no single fixed McGurk profile.2 Some listeners lean heavily on visual speech; others trust the acoustic channel more. The interesting question is what changes those weights.
Landmark paper
Harry McGurk and John MacDonald's 1976 Nature paper, "Hearing lips and seeing voices," is the landmark source for this concept. It established the illusion that now carries McGurk's name and made audiovisual speech integration impossible to ignore.1
Course connection
This concept fits MIT 9.13's hearing and speech module. Lecture 15, "Hearing and Speech," frames human hearing around speech and music and asks how the functional organization of auditory skills is worked out in the brain.4 The McGurk effect is a clean bridge from that auditory module to a larger cognitive-neuroscience lesson: speech is heard by a brain that is also watching.
