You can practice for years and barely improve. Or you can practice for weeks and transform how you speak. The difference isn’t talent, and it isn’t time. It’s feedback.
In the 1970s, psychologist Richard Schmidt was studying how people learn physical skills. He had participants throw beanbags at targets they couldn’t see. One group got feedback after every throw. Another got feedback after every fifth throw. A third got no feedback at all.
The results weren’t subtle. The group that got feedback after every throw outperformed everyone else by a significant margin. The no-feedback group showed almost no learning at all, no matter how much they practiced.
This finding has been replicated across dozens of motor skills, from golf swings to surgical procedures. The principle is consistent: without feedback, practice doesn’t make perfect. Practice makes permanent. Whatever you’re doing, right or wrong, gets ingrained.
But what does this have to do with pronunciation?
Speaking involves coordinating over 100 muscles in precise timing sequences. Your tongue position, lip rounding, jaw height, vocal cord tension, and airflow all have to work together within milliseconds. Neuroscientist Frank Guenther’s DIVA model of speech production shows that we learn these coordinations through a kind of internal trial-and-error, where the brain compares what we intended to say with what we actually produced.
During babbling, infants use auditory feedback to tune these neural mappings. They hear themselves, compare the output to their target, and adjust. Adults learning new sounds need to do something similar, but there’s a problem: by the time you’re an adult, your internal monitoring is calibrated for your native language’s phoneme categories. You literally can’t hear some of the errors you’re making.
This is why external feedback matters so much. Your ears are lying to you, and you need something outside yourself to tell you the truth.
But not all feedback is created equal.
Motor learning research distinguishes between two kinds of feedback:
Knowledge of Results (KR) tells you whether you achieved the goal. Did you pronounce the word correctly? Yes or no.
Knowledge of Performance (KP) tells you how you moved while trying. Your tongue was too far forward. You didn’t round your lips enough. The vowel was too short.
Research comparing these approaches found that combining KR with specific KP outperformed either alone. Just telling someone “that was wrong” isn’t enough. Telling them why it was wrong and what specifically to adjust is what accelerates learning.
For pronunciation, this means phoneme-level feedback beats binary “correct/incorrect” judgments. Knowing that you nailed the consonant but missed the vowel gives you something to work with.
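To make the KR/KP distinction concrete, here is a minimal Python sketch. It assumes the target word and the learner's attempt are already available as lists of phoneme symbols (the ARPAbet-style labels and the `phoneme_feedback` function are hypothetical, chosen for illustration, and the speech recognition step that would produce them is out of scope). It is not a description of how any particular app works, just one way to turn a binary verdict into something specific.

```python
from difflib import SequenceMatcher


def phoneme_feedback(target, produced):
    """Compare two phoneme lists.

    Returns (kr, kp):
      kr -- Knowledge of Results: True only if every phoneme matched.
      kp -- Knowledge of Performance: one note per mismatched stretch.
    """
    kp = []
    # autojunk=False keeps the alignment predictable on short sequences.
    matcher = SequenceMatcher(a=target, b=produced, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # this stretch was pronounced as expected
        expected = " ".join(target[i1:i2]) or "(nothing)"
        heard = " ".join(produced[j1:j2]) or "(nothing)"
        kp.append(f"expected {expected}, heard {heard}")
    kr = not kp
    return kr, kp


# Hypothetical example: "think" pronounced as "sink".
kr, kp = phoneme_feedback(["TH", "IH", "NG", "K"], ["S", "IH", "NG", "K"])
print(kr)  # False
print(kp)  # ['expected TH, heard S']
```

The boolean alone is the “that was wrong” signal. The list is what tells you the vowel and the final consonants were fine and only the first sound needs work.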
This raises an obvious question: if feedback is so important, why doesn’t everyday conversation improve our pronunciation?
Here’s the uncomfortable truth: you can have thousands of conversations in English and never improve your pronunciation. Conversation provides feedback, but it’s the wrong kind.
Conversation feedback is delayed, often by seconds, and sometimes it never arrives at all. It’s binary at best: you were either understood or you weren’t. It’s inconsistent, depending on your conversation partner’s tolerance and attention. And there’s social pressure to keep moving, not to stop and repeat a word five times.
Most importantly, conversation feedback is focused on meaning, not form. If someone understands you, you got the feedback that matters for communication. But you got zero feedback on whether your “th” sounded like “th” or “s” or “f.” As long as comprehension happened, nobody tells you.
This is why immersion alone doesn’t fix pronunciation. You can live in an English-speaking country for decades and still have the same accent you arrived with. The input is there, but the corrective feedback isn’t.
So more feedback is better, right? Not exactly.
Interestingly, research also shows that constant feedback isn’t optimal for long-term learning. Salmoni, Schmidt, and Walter found that learners who received feedback on every single attempt performed well during practice but poorly on tests without feedback. They had become dependent on the external signal.
The sweet spot is frequent feedback during initial learning, gradually reduced as the skill becomes automatic. You don’t want to need external feedback forever. You want to develop internal monitoring, the ability to feel when your production matches the target without anyone telling you.
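One way to picture “frequent at first, gradually reduced” is a faded schedule, sketched below in Python. The specific numbers (`fade_every=10`, `max_interval=5`) and the function names are assumptions for illustration only. The research described here does not prescribe these values, and a real schedule would likely adapt to how the learner is doing.

```python
def feedback_interval(attempt_index, fade_every=10, max_interval=5):
    """Attempts between feedback events at this point in practice.

    Starts at 1 (feedback on every attempt) and grows by one every
    `fade_every` attempts, capped at `max_interval`.
    """
    return min(1 + attempt_index // fade_every, max_interval)


def faded_schedule(n_attempts, fade_every=10, max_interval=5):
    """Return a list of booleans, True where feedback is shown."""
    schedule = []
    since_last = 0  # attempts since feedback was last shown
    for i in range(n_attempts):
        since_last += 1
        if since_last >= feedback_interval(i, fade_every, max_interval):
            schedule.append(True)
            since_last = 0
        else:
            schedule.append(False)
    return schedule


# Feedback is dense early on and sparse later.
schedule = faded_schedule(60)
print(sum(schedule[:10]), "of the first 10 attempts get feedback")
print(sum(schedule[-10:]), "of the last 10 attempts get feedback")
```

The attempts without feedback are where the learner has to rely on their own sense of whether the sound was right, which is the internal monitoring the schedule is meant to build.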
This happens through repeated cycles of attempt, feedback, and adjustment. Over time, you internalize the correct motor pattern. You start to feel when the “th” is right, even before you hear yourself say it. The external feedback trains your internal monitor, then becomes unnecessary.
Anders Ericsson spent his career studying expert performers. Chess players, surgeons, musicians. He found that what separates effective practice from ineffective practice isn’t time spent. It’s how that time is structured.
Deliberate practice requires specific goals: not “get better at pronunciation” but “master the ‘th’ sound in these five words.” It requires immediate feedback, and it means targeting your weaknesses instead of reviewing what you already know. You also have to actually pay attention, not half-listen while you think about dinner.
Most pronunciation “practice” fails on all four counts. You speak without specific targets, get no feedback, repeat randomly, and half-attend while thinking about something else. That’s not practice. That’s just talking.
If you want to improve, you need feedback that is immediate, specific, and consistent. You need to know within a second or two whether you got it right. You need to know which part was off. And you need the same error to get the same feedback every time, not praise one day and silence the next.
Conversation doesn’t provide this. Most teachers can’t provide this for every utterance. But technology can.
The gap between feedback-rich practice and feedback-poor practice is enormous. All those hours of “practice” without feedback weren’t practice at all. They were just repetition, and repetition without correction is how bad habits become permanent.
This is why we built SpeechLoop. Not because the world needs another language app, but because I spent years practicing wrong. I’d have conversations, feel good about myself, and never know that my “th” still sounded like “s” to native speakers. What I needed was something that would tell me immediately, on every attempt, whether I’d hit the target or missed. That’s what SpeechLoop does: phoneme-level feedback in under two seconds, every time you speak. The principle Schmidt was describing in the 70s, applied to the thing that actually matters.
Schmidt, R. A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82(4), 225–260.
Salmoni, A. W., Schmidt, R. A., & Walter, C. B. (1984). Knowledge of results and motor learning: A review and critical reappraisal. Psychological Bulletin, 95(3), 355–386.
Guenther, F. H. (2006). Cortical interactions underlying the production of speech sounds. Journal of Communication Disorders, 39(5), 350–365.
Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production. Journal of Neurolinguistics, 25(5), 408–422.