Real Sound

The issue of authenticity often comes up when discussing audio sources for listening lessons — teachers generally feel (rightly) that  the audio and video they use in the classroom should be “natural” and “authentic,” meaning that it should reflect language that learners will encounter in authentic situations outside the classroom.  However, it’s important to reflect on the terms “real” and “authentic,” and what they actually mean in today’s media landscape.

Take the case of ambient sound at sports events: the thump of a gymnast landing on the beam or the thunder of horses’ hooves at the racetrack. What could be more real than that? Hearing those sounds helps us see and understand the dynamics of the sporting event.

But not so fast. In the documentary “The Sound of Sport,” Peregrine Andrews shows how the sounds in sporting events are enhanced or even created, from putting microphones on gymnasts’ balance beams, archery targets, or the athletes themselves, to adding stock sound clips from a completely different source, all in the name of creating more “authenticity” for the viewer watching on TV at home. The problem is that, whether from our experience of real life or from the acoustic training we get from watching movies where sound effects are routinely added, if we see an arrow hit a target, we also expect to hear it. As On the Media host Brooke Gladstone remarks, “We prefer to hear what we expect to hear.”

In reality, what we hear at home can be quite different from what we would hear if we were at the event. In a blog post, Andrews writes:

If we’re actually at an event, what we hear will probably be very different from what the audience at home hears. At the event we might hear little more than the crowds around us, whereas the TV audience will be delivered a manufactured soundtrack created from many elements, just as it is in a drama or a film.

On the Media: What You Hear When You Watch The Olympics
Blog post: Peregrine Andrews on the Sound of Sport: What is real?

Implications for L2 Listening

Just because something is “authentic” doesn’t mean that it is unaltered or that it completely reflects reality. The truth is that a majority of the audio we hear now is at least somewhat scripted, rehearsed, edited, or produced. Why? Because otherwise we would find it boring or difficult to listen to. So just be aware that when you hear something “real,” it may not be as real as you think.

Out of the mouths of babes

The prosodic features of English (intonation and rhythm) are the “road signs” for both comprehension and comprehensibility. Learners trained in prosodic features have better listening comprehension, and a speaker’s ability to use prosody is key to listeners being able to understand that speaker (Gilbert, 2008).

What are these babies saying?
It is striking how easy it is to imagine the content even when there are no words. It’s also obvious that they are “speaking” English.

Implications for L2 Listening

Discussion of rhythm and intonation is sometimes relegated to pronunciation classes, but recognition and understanding of prosodic features in English are key to comprehension. Different languages have differing prosodic features, and so important features in English may not be immediately obvious or salient to a learner. Explicitly teaching prosodic features can help learners become better listeners.


Gilbert, J. B. (2008). Teaching pronunciation: Using the Prosody Pyramid. Cambridge, UK; New York: Cambridge University Press.

The importance of schema

A schema is an abstract, generalized mental representation of our experience that helps us understand new experiences. Schemata include content schema, which is knowledge about a topic, and discourse schema, which is knowledge of how a type of discourse works. In listening, schema helps us interpret and assign meaning to the speech we hear. For example, at a birthday party, if you hear someone say “ha…,” your birthday party schema will help you guess that the speaker is beginning to say “happy birthday.”

There is a great illustration of schema from Bransford and Johnson:

The procedure is actually quite simple. First you arrange things into different groups. Of course, one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities that is the next step, otherwise you are pretty well set. It is important not to overdo things. That is, it is better to do too few things at once than too many. In the short run this may not seem important but complications can easily arise. A mistake can be expensive as well. At first the whole procedure will seem complicated. Soon, however, it will become just another facet of life. It is difficult to foresee any end to the necessity for this task in the immediate future, but then one never can tell. After the procedure is completed one arranges the materials into different groups again. Then they can be put into their appropriate places. Eventually they will be used once more and the whole cycle will then have to be repeated. However, that is part of life. (p. 722)
Not sure what this passage is about?  Click here for a hint that will activate your schema.

Now look at this…
I love this ad as an example of the importance of schema in listening comprehension. Notice how your interpretation of the story changes at the end of the ad: KAYAK.COM – GRANDFATHER AD

Implications for L2 Listening

From the illustrations above, you can see how a listener’s background knowledge contributes to comprehension.  Schema helps us predict content, interpret word meaning, and make sense of discourse. Pre-listening activities — such as looking at pictures, reviewing known information about a topic, brainstorming vocabulary — can help learners “activate” their schema and be better prepared to understand what they hear.

Bransford, J.D., & Johnson, M.K. (1972). Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 11, 717-726.

Human voice recognition and language ability

A study published in Science (Perrachione, Del Tufo, & Gabrieli, 2011) looked at the ability of people with and without dyslexia to recognize voices speaking in the participants’ native language (English) versus a language they didn’t know (Mandarin Chinese). The researchers were attempting to test whether dyslexia is caused in part by difficulty with phonological processing (the ability to understand the individual sounds of language), which would then interfere with reading. To test this, they had people listen to five speakers and later try to match the voices with the faces.

Their results showed that English speakers with dyslexia had a more difficult time identifying voices of English speakers (about 50% accuracy compared to about 70% for non-dyslexics).  However, both dyslexics and non-dyslexics performed about the same identifying the voices of Chinese speakers (~50%).  This means that non-dyslexics had an advantage over dyslexics in identifying voices in their native language, but both groups performed about the same in the unfamiliar language.   (The authors note that non-dyslexic native Chinese speakers displayed the same advantage in identifying the voice of someone speaking their native language.)

Implications for L2 Listening

The authors point out the link between voice recognition ability and “the ability to compute the differences between incidental phonetics of a specific vocalization [e.g. the way a word is actually pronounced] and the abstract phonological representation of the words that vocalization contains [i.e. how the sound of the word is stored in the brain].” In other words, the ability to recognize a voice is connected to the ability to recognize the variations in how individuals pronounce specific sounds. To understand this, think of some obvious variations in how people speak: one person may pronounce vowel sounds with more of a nasal sound, while another person may lisp slightly. In fact, each speaker pronounces speech sounds in a slightly different way, and while we may not consciously recognize this difference, it is what allows us to recognize the voices of different people.

The study showed that listeners who are unfamiliar with a language are not as good at identifying voices because their “abstract phonological representation” of the sounds in the language is not well developed. Dyslexic subjects also had a harder time identifying voices because their “phonological representations are compromised.” In both cases, the subjects could not recognize the small differences that help us identify an individual voice.

This is important to keep in mind for language learning because when learning a new language, it takes time to develop the “abstract phonological representation of words” in the learner’s mind. This means that learners may not hear a word correctly if it doesn’t match the representation they have in their mind, not just on the word level of knowing how different words sound, but also on the phonological level of hearing the right sounds. Many pronunciation problems are actually perception problems: someone doesn’t hear a phoneme correctly, perhaps because that sound doesn’t exist in the learner’s native language, or because it is an allophone of another sound.

Perrachione, T. K., Del Tufo, S. N., & Gabrieli, J. D. E. (2011). Human voice recognition depends on language ability. Science, 333(6042), 595. doi:10.1126/science.1207327