Suggested Readings
1 Maley, A., & Tomlinson, B. (Eds.). (2017). Authenticity in materials development for language learning. Newcastle upon Tyne, England: Cambridge Scholars Publishing.
2 Mishan, F. (2005). Designing authenticity into language learning materials. Bristol, England: Intellect.
3 Rilling, S., & Dantas‐Whitney, M. (Eds.). (2009). Authenticity in the language classroom and beyond: Adult learners. Alexandria, VA: TESOL.
4 Wagner, E., & Ockey, G. J. (2018). An overview of the use of authentic, real‐world spoken texts on L2 listening tests. In G. J. Ockey & E. Wagner (Eds.), Assessing L2 listening: Moving towards authenticity (pp. 13–28). Amsterdam, Netherlands: John Benjamins.
Note
1 Based in part on S. McKay (2012). Authenticity in the language teaching curriculum. In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics. John Wiley & Sons Inc., with permission.
Automatic Speech Recognition
JOHN LEVIS AND RUSLAN SUVOROV
Definition
Automatic speech recognition (ASR) is an independent, machine‐based process of decoding and transcribing speech. A typical ASR system receives acoustic input from a speaker through a microphone; analyzes it using some pattern, model, or algorithm; and produces an output, usually in the form of text (Lai, Karat, & Yankelovich, 2008).
It is important to distinguish speech recognition from speech understanding, the latter being the process of determining the meaning of an utterance rather than its transcription. Speech recognition is also different from voice (or speaker) recognition: Whereas speech recognition refers to the ability of a machine to recognize the words and phrases that are spoken (i.e., what is being said), speaker (or voice) recognition involves the ability of a machine to recognize the person who is speaking.
Historical Overview
Pioneering work on ASR dates to the early 1950s. The first ASR system, developed at Bell Telephone Laboratories by Davis, Biddulph, and Balashek (1952), could recognize isolated digits from 0 to 9 for a single speaker. In 1956, Olson and Belar created a phonetic typewriter that could recognize 10 discrete syllables. It was also speaker dependent and required extensive training.
These early ASR systems used template‐based recognition, a form of pattern matching that compared the speaker's input with prestored acoustic templates or patterns. Pattern matching operates well at the word level for recognition of phonetically distinct items in small vocabularies but is less effective for larger vocabulary recognition. Another limitation of pattern matching is its inability to match and align input speech signals with prestored acoustic models of different lengths. As a result, these ASR systems performed poorly: their acoustic approaches recognized only basic units of speech clearly enunciated by a single speaker (Rabiner & Juang, 1993).
An early attempt to construct speaker‐independent recognizers by Forgie and Forgie (1959) was also the first to use a computer. Later, researchers experimented with time‐normalization techniques (such as dynamic time warping, or DTW) to minimize differences in speech rates of different talkers and to reliably detect speech starts and ends (e.g., Martin, Nelson, & Zadell, 1964; Vintsyuk, 1968). Reddy (1966) attempted to develop a system capable of recognizing continuous speech by dynamically tracking phonemes.
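The time‐normalization idea can be illustrated with a minimal sketch of dynamic time warping applied to template matching. This is an illustration under stated assumptions, not a reconstruction of any of the historical systems: the function names `dtw_distance` and `recognize`, the one‐dimensional "feature" values, and the two templates are all hypothetical, and real recognizers compare sequences of multidimensional acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Return the cumulative cost of the best DTW alignment of sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # frame-to-frame distance
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b: another frame of a maps to b[j-1]
                cost[i][j - 1],      # stretch a: another frame of b maps to a[i-1]
                cost[i - 1][j - 1],  # advance both sequences together
            )
    return cost[n][m]


def recognize(signal, templates):
    """Return the label of the stored template closest to the signal under DTW."""
    return min(templates, key=lambda label: dtw_distance(signal, templates[label]))


# Hypothetical prestored templates (made-up values, one number per frame).
templates = {
    "one": [1.0, 3.0, 4.0, 3.0, 1.0],
    "two": [4.0, 2.0, 1.0, 2.0, 4.0],
}

# The same contour as "one", spoken at half the rate (each frame repeated).
slow_one = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 1.0, 1.0]

print(recognize(slow_one, templates))  # prints "one"
```

Because the alignment may map several input frames onto one template frame, the slower utterance matches its template with zero cost even though the two sequences differ in length, which is the limitation of fixed‐length pattern matching that time normalization was meant to address.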
Figure 1 A simple four‐state Markov model with transition probabilities
The 1970s were marked by several milestones: a focus on the recognition of continuous speech, the development of large vocabulary speech recognizers, and experiments to create truly speaker‐independent systems. During this period, the first commercial ASR system, called VIP‐100, appeared and won a US National Award. This success prompted the Advanced Research Projects Agency (ARPA) of the US Department of Defense to fund the Speech Understanding Research (SUR) project from 1971 to 1976 (Markowitz, 1996). The goal of SUR was to create a system capable of understanding the connected speech of several speakers from a 1,000‐word vocabulary in a low‐noise environment with an error rate of less than 10%. Of six systems, the most viable were Hearsay II, HWIM (Hear What I Mean), and Harpy, the only system that completely achieved SUR's goal (Rodman, 1999). These systems had a profound impact on ASR research and development by demonstrating the benefits of data‐driven statistical models over template‐based approaches and by helping move ASR research toward statistical modeling methods such as hidden Markov modeling (HMM). Unlike pattern matching, HMM is based on complex statistical and probabilistic analyses (Peinado & Segura, 2006). In simple terms, HMMs represent language units (e.g., phonemes or words) as a sequence of states with transition probabilities between states (see Figure 1).
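A model of the kind shown in Figure 1 can be written down as a table of transition probabilities. The sketch below assumes a left‐to‐right topology in which each state either repeats or advances to the next one; the state names, the probability values, and the helper `path_probability` are hypothetical and do not reproduce the values in the figure.

```python
# Hypothetical four-state, left-to-right Markov model.
# TRANSITIONS[s][t] is the probability of moving from state s to state t;
# any transition that is not listed has probability 0.
TRANSITIONS = {
    "s1": {"s1": 0.6, "s2": 0.4},
    "s2": {"s2": 0.7, "s3": 0.3},
    "s3": {"s3": 0.5, "s4": 0.5},
    "s4": {"s4": 1.0},
}


def path_probability(path, transitions=TRANSITIONS):
    """Multiply the transition probabilities along a sequence of states."""
    prob = 1.0
    for current, following in zip(path, path[1:]):
        prob *= transitions[current].get(following, 0.0)
    return prob


# Outgoing probabilities of every state must sum to 1.
for state, outgoing in TRANSITIONS.items():
    assert abs(sum(outgoing.values()) - 1.0) < 1e-9, state

# A path that lingers in s1 for one extra step before moving through the model.
print(path_probability(["s1", "s1", "s2", "s3", "s4"]))

# Skipping a state is not allowed in this topology, so the probability is 0.
print(path_probability(["s1", "s3"]))
```

The self‐loops are what allow the same unit to be spoken quickly or slowly: a longer realization simply stays in a state for more steps. In a hidden Markov model the states are not observed directly; each state additionally carries a probability distribution over the acoustic observations it can produce.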
The main strength of an HMM is that it can describe the probability of states and represent their order and variability through algorithms such as Baum‐Welch, which estimates model parameters from training data, and Viterbi, which finds the most likely sequence of states for a given input. In other words, HMMs can adequately analyze both the temporal and spectral variations of speech signals, and can recognize and efficiently decode continuous speech input. However, HMMs require extensive training and huge computational power for model‐parameter storage and likelihood evaluation (Burileanu, 2008).
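As an illustration of decoding, the following is a minimal sketch of the Viterbi algorithm for a toy HMM with discrete observations. The two states, the observation symbols "low" and "high", and all probability values are invented for the example; practical recognizers work with continuous acoustic features and usually with log probabilities to avoid numerical underflow on long inputs.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, probability) of the most likely state sequence."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s] = predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state, left-to-right model (all values are made up).
states = ["a", "b"]
start_p = {"a": 1.0, "b": 0.0}
trans_p = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.0, "b": 1.0}}
emit_p = {"a": {"low": 0.9, "high": 0.1}, "b": {"low": 0.2, "high": 0.8}}

path, prob = viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['a', 'a', 'b', 'b']
print(prob)
```

At each time step the algorithm keeps only the single best path into each state, so the work grows linearly with the length of the input rather than with the number of possible state sequences. Baum‐Welch training, which is not shown here, would estimate `trans_p` and `emit_p` from example data instead of setting them by hand.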
Although