Other important factors of oral communication known to affect listening comprehension include prosody (Lynch, 1998), phonology (Henricksen, 1984), and hesitations (Freedle & Kostin, 1999). Brindley and Slatyer (2002) also identify length, syntax, vocabulary, discourse, and redundancy of the input as important variables.
Types of interaction and relationships among speakers are also important factors to take into account when designing listening assessment inputs. Monologues, dialogues, and discussions among a group of people are all types of interaction that one would be likely to encounter in real‐world listening tasks. Individuals might also expect to listen to input with various levels of formality, depending on the relationship between the speaker and the listener.
Tasks for Assessing Listening
Decisions about the characteristics of the desired listening assessment tasks should be based on the purposes of the test, the test takers' personal characteristics, and the construct that the test is designed to measure (Bachman & Palmer, 2010). Buck (2001) provided the following guidelines concerning listening tasks, which may be applicable to most listening test contexts: (a) listening test input should include typical realistic spoken language, commonly used grammatical knowledge, and some long texts; (b) some questions should require understanding of inferred meaning (as well as global understanding and comprehension of specific details), and all questions should assess linguistic knowledge rather than general cognitive abilities; and (c) test takers should have similar background knowledge of the content to be comprehended. Tasks should also assess the message conveyed by the input, rather than the exact vocabulary or grammar used to transmit it, and should include various types of interaction and levels of formality.
In practice, listening assessment tasks require learners to listen to input and then provide evidence of comprehension by responding to questions about the information conveyed in the input. The most common types of comprehension questions are selected response items, including multiple‐choice, true/false, and matching. For these item types, test takers are required to select the most appropriate answer from options which are provided. These options may be based on words, phrases, objects, pictures, or other realia. Selected response items are popular, in part, because they can be scored quickly and objectively. An important design decision for selected response items is whether or not to provide test takers with the questions and possible responses prior to the input. Previewing items has been shown to favor more proficient test takers (Wu, 1998), and certain item types are affected differentially by the inclusion of item stems, answer options, or both (Koyama, Sun, & Ockey, 2016).
Constructed response item types, which require test takers to create their own response to a comprehension question, are also commonly used and have become increasingly popular. These item types require short or long answers, and include summaries and completion of organizational charts, graphs, or figures. One item type that has received increasing attention is the integrated listen–speak item, in which test takers listen to an oral input and then summarize or discuss the content of what they have heard (Ockey & Wagner, 2018). Constructed response item types have been shown to be more difficult for test takers than selected response item types (In'nami & Koizumi, 2009) and may therefore be more appropriate for more proficient learners. Some test developers and users have avoided constructed response item types because scoring can be less reliable and require more resources. Recent developments in computer technology, however, have made scoring productive item types increasingly more reliable and practical (Carr, 2014), which may lead to their increased use.
Another listening task used in tests today is sentence repetition, which requires test takers to orally repeat what they hear, or the analogous written task of dictation, which requires test takers to write what they hear. As with constructed response items, computer technology has made the scoring of sentence repetition and dictation objective and practical. Translation tasks, which require test takers to translate what they hear in the target language into their first language, are also popular for assessing listening, especially when everyone who is assessed shares the same first language.
Limitations of Listening Assessment Tasks
Validly assessing second language listening comprehension presents a number of challenges. The process of listening comprehension is not completely understood, and no current methods allow direct observation of the listener's brain to determine what has been comprehended. Instead, the listener must indicate what has been understood. The medium of this indication, along with other factors, has potential for diminishing the validity of listening assessments.
The majority of listening tasks require test takers to select responses from given choices or to use speaking, reading, or writing skills to demonstrate comprehension of the input. For instance, most forms of multiple‐choice, true/false, matching, short‐answer and long‐answer items require test takers to read the questions and make a selection or provide a written response, while other tasks—such as sentence repetition—require oral responses. The need for learners to use other language skills when their listening is assessed can lead to scores that are not representative of their listening abilities in isolation, such as when watching a movie.
Strategies and other abilities not generally defined as part of a listening comprehension construct may also lead test takers to achieve scores that are not representative of their listening abilities. For instance, some learners may be able to eliminate wrong answer options or even select the correct answer by using test‐taking strategies, such as selecting the longest answer to increase their chances of getting an answer to a multiple‐choice item correct. Another problem with sentence repetition and dictation tasks is that people with well‐developed sound recognition skills may be able to repeat the sounds they hear or write letters that correspond to the words they hear without comprehending the information.
Scores on listening assessments are compromised in various ways depending on the tasks that test developers choose to use. Therefore, listening assessment developers and users should take into account the abilities of the test takers and limitations of the tasks used to best ensure that the test provides a valid indication of learners' listening abilities.
Computers in Assessing Listening
Developments in computer technology expand the potential for different types of listening input and ways of determining comprehension of input. For instance, technology allows acoustic signals to be easily accompanied by various types of input such as visual stimuli, which can make tasks more realistic.
Computers also increase the potential for using test items that require short constructed responses. Developers create model answers and identify the key words and phrases in them; these key words and phrases, along with acceptable synonyms, can then be used to create a scoring algorithm. Responses to items that contain part or all of the targeted information can be given partial or full credit (Carr, 2014).
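To illustrate the general idea, the following is a minimal, hypothetical sketch of such a keyword-based scoring algorithm, not the specific system described by Carr (2014). The key phrases, synonyms, and sample responses are invented for illustration: each key phrase from a model answer is paired with acceptable synonyms, and a response receives credit in proportion to the key ideas it contains.

```python
# Hypothetical key phrases from a model answer, each mapped to a set of
# acceptable synonyms (all content here is invented for illustration).
KEY_PHRASES = {
    "rising temperatures": {"warming", "higher temperatures"},
    "sea levels": {"ocean levels"},
    "coastal flooding": {"flooding on the coast"},
}

def score_response(response: str, key_phrases=KEY_PHRASES) -> float:
    """Return the proportion of key phrases matched, from 0.0 to 1.0."""
    text = response.lower()
    matched = 0
    for phrase, synonyms in key_phrases.items():
        # Credit the phrase if it, or any accepted synonym, appears.
        if phrase in text or any(s in text for s in synonyms):
            matched += 1
    return matched / len(key_phrases)

# A response covering two of the three key ideas earns partial credit;
# one covering all three earns full credit.
partial = score_response("Warming causes sea levels to rise.")
full = score_response(
    "Rising temperatures raise sea levels and cause coastal flooding."
)
```

An operational system would of course need more sophisticated matching (for example, handling inflected forms, word order, and spelling variation), but the partial-credit logic would follow this same pattern.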
Computer technology may, however, have a negative effect on the assessment of listening. The tendency to use technology because it is available and attractive may lead to assessments that do not validly measure listening comprehension (Ockey, 2009). For instance, including attractively colored interfaces or interesting sounds, which are not part of the message to be comprehended, may be distracting to some test takers. Such distractions could lead to invalid scores for affected individuals. Moreover, computer scoring systems cannot make human judgments that may be important when assessing language (Condon, 2006). For instance, in a summary