4.3 Measurement of Intelligibility
Intelligibility is challenging to measure, in part because of its complexity. As noted above, a considerable body of research has demonstrated that many variables may influence intelligibility. Operationally, there are two main approaches to measuring intelligibility: objective measures and subjective measures.
4.3.1 Objective Measures of Intelligibility
Objective measures of intelligibility can be obtained and scored in numerous ways, but the primary commonality is that a specific speech sample is directly analyzed to yield a quantitative score. Objective measures often involve transcription of speech, using traditional orthography, broad phonetic transcription, narrow phonetic transcription, or forced‐choice recognition of target items. These approaches typically yield a percentage of items identified correctly relative to the targets that the speaker intended to produce (Tikofsky & Tikofsky, 1964; Yorkston & Beukelman, 1978, 1980). Objective measures have been considered the “gold standard” for measuring intelligibility clinically because quantification is straightforward: units are either correct or incorrect. However, scoring items as correct or incorrect requires that the targets the speaker intended to produce be known in advance. For this reason, elicited words and sentences are typically used for measuring intelligibility via transcription or forced‐choice recognition approaches. Standard clinical tools such as the Sentence Intelligibility Test (SIT; Yorkston, Beukelman, & Tice, 1996), the Assessment of Intelligibility of Dysarthric Speech (Yorkston, Beukelman, & Traynor, 1984), and the Test of Children’s Speech (TOCS; Hodge & Daniels, 2007) employ this type of transcription method. Of particular note, listeners are typically unfamiliar with the speaker and the stimulus material and are thus “naïve,” potentially representing an unfamiliar listener that a speaker may encounter in daily life. However, several aspects of intelligibility measurement are less ecologically representative. For example, speakers are recorded in a quiet environment, and listening tasks also usually take place in a quiet environment, which may be unlike many everyday speaking situations.
Indeed, background noise has been shown to further reduce intelligibility for listeners transcribing dysarthric speech (Yoho & Borrie, 2018). In addition, the language produced by speakers in elicitation or recitation tasks is not spontaneously generated, so any interactions between language formulation, language production, and speech motor variables (for better or worse) may not be reflected in elicited intelligibility measures. Finally, the speaker and listener do not have the opportunity to interact, making the intelligibility task contrived and without a real communicative purpose.
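To make the item-by-item scoring described above concrete, the percent-correct computation can be sketched as follows. This is a minimal illustration, not a standardized protocol: the function name, the sample words, and the strict position-by-position matching are all assumptions here (published tools such as the SIT define their own scoring rules).

```python
def percent_words_correct(target_words, transcribed_words):
    """Score a listener's orthographic transcription against known targets.

    A target word counts as correct only if the listener wrote the same
    word in the same position, mirroring item-by-item scoring of elicited
    words and sentences. Real scoring protocols may align words more
    leniently; this positional match is a simplifying assumption.
    """
    if not target_words:
        raise ValueError("target word list is empty")
    correct = sum(
        1
        for target, heard in zip(target_words, transcribed_words)
        if target.lower() == heard.lower()
    )
    return 100.0 * correct / len(target_words)

# Hypothetical elicited sentence vs. a naive listener's transcription.
targets = "the boy is riding a red bicycle".split()
heard = "the boy is hiding a red bicycle".split()
print(round(percent_words_correct(targets, heard), 1))  # 85.7
```

Because the denominator is the number of intended targets, omissions by the listener lower the score just as substitutions do.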
There is a body of evidence showing that adult speakers without communication disorders entrain their speech production behaviors to one another during conversation, essentially becoming more acoustically and perceptually similar (Borrie, Barrett, Willi, & Berisha, 2019; Giles & Powesland, 1975; Pardo, 2006). These interdependent adjustments to speech production behavior occur at a seemingly unconscious level during spoken dialog and are considered to reduce the computational load of spoken language processing and improve the effectiveness and efficiency with which information is exchanged. While this is a new area of investigation for individuals with intelligibility impairment, work has begun to examine entrainment of speech behaviors in the conversations that occur between individuals with dysarthria and adults without communication disorders. Preliminary evidence suggests that, while substantially reduced relative to the entrainment that occurs between two adults without communication disorders, entrainment of some speech behaviors may transpire in conversations with this clinical population (Borrie, Barrett, Liss, & Berisha, 2020; Borrie, Lubold, & Pon‐Barry, 2015). While the link between entrainment and traditional measures of intelligibility has not yet received attention, entrainment of speech behavior, even in conversations with individuals with dysarthria, has been linked with objective measures of improved communicative efficiency.
The quantification of intelligibility of spontaneous speech, whether in conversational or narrative discourse, is clearly desirable because of its ecological validity. However, it is difficult, if not impossible, to score orthographic transcriptions of spontaneous speech when the number and nature of lexical targets are not definitively known. Nonetheless, the literature on children’s speech and language has reported intelligibility of spontaneous speech using speech/language samples that have been transcribed for language sample analysis. For example, the number of complete and intelligible utterances divided by the total number of utterances in a transcript has been reported as a percentage of intelligible utterances (Binger, Ragsdale, & Bustos, 2016; Rice et al., 2010; Yoder, Woynaroski, & Camarata, 2016). Similarly, Flipsen (2006) proposed methods for quantifying intelligibility of conversational speech using speech samples that had been narrowly transcribed by expert transcribers for the evaluation of phonetic development. Although estimates of intelligibility of spontaneous speech obtained from language sample analysis or from phonetic analysis are ecologically valid, the measurement procedures have critical limitations that may inflate estimates of intelligibility. First, typically only one individual transcribes a child’s speech, and this person is an expert in child speech and/or language and is thus not representative of an everyday communication partner that a child might encounter. Second, transcribers are usually allowed to play back recorded speech samples multiple times—a convenience unavailable in real‐life listening situations. Third, a communication partner (clinician or parent) typically interacts with the child during the speech and language sample, providing considerable contextual information and sometimes even glossing the child’s utterances, which aids the transcriber in making sense of the speech signal.
Finally, in a spontaneous speech sample, the speaker’s intended message is not definitively known a priori because it is spontaneously generated; the content of the speech/language sample is accepted as accurate as long as the transcriber assigned words (possibly the wrong words) to the spoken message. For these reasons, intelligibility measures obtained from speech and language samples may provide an inflated estimate of intelligibility.
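The utterance-level metric reported in the language-sample literature (complete and intelligible utterances divided by total utterances) can be sketched as follows. The coding labels and data layout here are hypothetical; actual language sample analysis conventions differ across tools and labs.

```python
def percent_intelligible_utterances(coded_utterances):
    """Complete-and-intelligible utterances / total utterances * 100.

    Each utterance carries a transcriber-assigned code; "C&I" is a
    hypothetical label for complete-and-intelligible utterances.
    """
    total = len(coded_utterances)
    if total == 0:
        raise ValueError("no utterances in sample")
    intelligible = sum(1 for code, _ in coded_utterances if code == "C&I")
    return 100.0 * intelligible / total

# Hypothetical excerpt from a coded language-sample transcript.
utterances = [
    ("C&I", "want more juice"),
    ("unintelligible", "xxx xxx"),
    ("C&I", "doggie go outside"),
    ("incomplete", "and then the"),
]
print(percent_intelligible_utterances(utterances))  # 50.0
```

Note that this metric depends entirely on the transcriber's judgment of each utterance, which is why the limitations above (expert transcribers, replay, contextual glossing) can inflate it.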
One alternative that has been used in the literature is a hybrid approach, combining elements of language sample analysis, the procedures described by Flipsen (2006), and transcription intelligibility of elicited utterances. Hodge and Gotzke (2014) measured intelligibility of spontaneously generated speech by having experts create a master transcript against which unfamiliar listener responses could be scored. They then employed unfamiliar listeners who completed orthographic transcription tasks like those described for elicited words and sentences, above, to yield a percentage intelligibility score (Hodge & Gotzke, 2014). Findings indicate that intelligibility of elicited sentences from the TOCS did not differ from intelligibility of spontaneous speech samples. This convergence of findings may be related to similarities in methods, including use of unfamiliar listeners and use of a listening task that was constrained and decontextualized. Although this type of hybrid approach may not be fully reflective of the rich context available in dynamic interaction between speaker and listener, results provide important construct validity for the use of measures such as the TOCS for understanding intelligibility in children.
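The scoring step of this hybrid approach might be sketched as follows, assuming word-level comparison of a naive listener's transcription against an expert master transcript. The function name and the alignment method (a longest-common-subsequence-style match via Python's standard difflib) are assumptions for illustration; the actual Hodge and Gotzke (2014) scoring procedure may differ.

```python
import difflib

def percent_intelligibility_vs_master(master, listener):
    """Score a naive listener's transcription of spontaneous speech
    against an expert "master" transcript.

    Alignment preserves word order but tolerates insertions and
    omissions, so one missed word does not derail downstream matches.
    """
    master_words = master.lower().split()
    listener_words = listener.lower().split()
    matcher = difflib.SequenceMatcher(a=master_words, b=listener_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(master_words)

# Hypothetical master transcript vs. a naive listener's attempt.
master = "then the dog ran into the big yellow house"
listener = "then the dog ran in the yellow house"
print(round(percent_intelligibility_vs_master(master, listener), 1))
```

As with elicited-sentence scoring, the denominator comes from the reference (here, the master transcript), so the score reflects how much of the intended message the listener recovered.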
4.3.2 Subjective Measures of Intelligibility
The second main approach to measurement of speech intelligibility involves subjective measures. Subjective measures of intelligibility generally require listeners to quantify their perception of a speaker’s intelligibility by assigning a number to, or scaling, what they heard (Weismer & Laures, 2002). Direct magnitude estimation (DME) procedures have been used frequently for the study of contributors to intelligibility in dysarthria. DME procedures require listeners to scale the intelligibility of speech relative to a modulus or exemplar. In contrast, use of Likert ratings, or equal appearing interval scales, requires listeners to assign numbers based on perceived similarity