Moberly, Lowenstein, and Nittrouer (2016) tested 30 postlingually deafened adult CI recipients and 20 NH controls on a variety of tasks, including “perceptual sensitivity” (labeling “cop” vs. “cob” and “sa” vs. “sha” on the basis of durational or spectral cues), “perceptual attention” (discriminating quasi‐syllabic sinusoidal glides on the basis of duration or “spectral” cues), and word recognition (CID [Central Institute for the Deaf]‐22 word lists; Hirsh et al., 1952). The two groups had similar perceptual sensitivity and attention for duration cues, but the CI group was less sensitive to spectral cues. Word recognition in the clinical group varied between 20% and 96% correct, with a mean accuracy of 66.5%, whereas the task posed little perceptual challenge to the NH group, whose mean accuracy was 97.1%. These word recognition scores were predicted by spectral cue sensitivity and attention, suggesting that speech perception deficits at the phonetic, that is, the sub‐segmental, level affect those at the word level.
In a gated word recognition study, Patro and Mendel (2018) found that CI users needed on average around 35% more speech information to recognize words than NH controls, and that NH participants listening to vocoded speech needed approximately 25% more information. That both of these degraded‐signal groups performed relatively poorly suggests that CI users’ disadvantage is, at least in part, due to spectrotemporal signal degradation caused by the electrical–neuronal perceptual bottleneck, and not merely to extra‐auditory factors such as demographic group characteristics. When contextual information was provided by embedding the target words in either semantically relevant (e.g., “Paul took a bath in the TUB”) or semantically neutral (e.g., “Smith knows about the TUB”) sentences, words were recognized more easily (cf. Holt, Yuen, & Demuth, 2017). Moreover, CI users benefited more from this top‐down information than the controls did. This shows that signal degradation affects word recognition and that CI users often rely more on contextual cues.
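The vocoded‐speech condition mentioned above is typically produced with a channel (noise) vocoder. The sketch below is a minimal illustration of the general technique, not the processing used by Patro and Mendel (2018); the channel count, filter design, and envelope cutoff are illustrative assumptions.

```python
# Minimal noise-vocoder sketch (illustrative parameters): split speech into a
# few frequency bands, extract each band's slow amplitude envelope, and use the
# envelopes to modulate band-limited noise -- a common way to simulate for NH
# listeners the spectral degradation a CI imposes. Assumes fs is high enough
# for the top band edge (e.g., 16 kHz).
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(sig, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, sig)

def envelope(sig, fs, cutoff=160.0, order=2):
    # Half-wave rectify, then low-pass filter to keep the amplitude envelope.
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, np.maximum(sig, 0.0))

def noise_vocode(speech, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    # Logarithmically spaced band edges, roughly mimicking cochlear spacing.
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        env = envelope(band, fs)
        carrier = bandpass(np.random.randn(len(speech)), lo, hi, fs)
        out += env * carrier
    return out / np.max(np.abs(out))  # normalize to avoid clipping
```

Reducing n_channels increases the spectral degradation; CI simulations in the literature commonly use on the order of 4–16 channels.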
A number of conclusions can be drawn from the studies summarized above. Firstly, CI users can attain a high level of word recognition. Secondly, there is a great amount of individual variation in these recognition abilities, for which a wide range of factors may be responsible. Thirdly, there is evidence that lower‐level problems stemming from signal degradation are partly responsible for higher‐level problems in speech perception. Finally, top‐down information supports speech perception, allowing CI listeners to partially compensate for the spectral degradation that use of their implant necessarily entails.
3.3.5 Prosody Perception
Another informational layer of speech whose perception is affected by spectral degradation is prosody. The functions of prosody in language are threefold. Firstly, prosody signals the meaning or the morphological and syntactic structure of linguistic elements at several levels, such as words, sentences, and larger units of discourse; this is commonly referred to as linguistic prosody. For instance, it distinguishes certain segmentally identical words, such as REcord vs. reCORD (lexical stress), marks word grouping (phrasing), such as blue bottle vs. bluebottle, and distinguishes given from new information (topic vs. focus), as in My COLleague was supposed to do this as opposed to My colleague was supposed to do THIS, where capitals indicate sentential accents. Secondly, prosody reflects the emotional state of the speaker or their attitude toward their utterance; this attribute of speech is termed emotional prosody. For example, any utterance may, in principle, be pronounced in a sad, happy, angry, or fearful way, or with any other emotion. Attitudes such as surprise, irony, and sarcasm may also be employed to signal a speaker’s stance with regard to the truthfulness of the utterance. Finally, indexical prosody is suprasegmental information about the identity of the speaker, such as age, health, and provenance (Lehiste, 1970; Rietveld & van Heuven, 2016). Prosody is mainly conveyed by means of variation in intensity (stress), fundamental frequency (F0, whose variation is also referred to as intonation), duration (of any linguistic unit, as well as of pauses in speech), and voice quality, for example, harshness, strain, and creakiness (Rietveld & van Heuven, 2016). The current discussion will focus on the ability of CI listeners to perceive linguistic and emotional prosody.
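To make the acoustic correlates just listed concrete, the sketch below shows one way to extract F0, intensity, and utterance duration from a recording. It is an illustrative example using the librosa library, not a procedure from the studies discussed here; the file name and analysis settings are hypothetical.

```python
# Illustrative extraction of the three main acoustic correlates of prosody:
# F0 (intonation), intensity, and duration.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)  # hypothetical file

# F0 contour, via probabilistic YIN pitch tracking; unvoiced frames are NaN.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)

# Intensity proxy: frame-wise RMS energy, expressed in dB.
rms = librosa.feature.rms(y=y)[0]
intensity_db = 20 * np.log10(rms + 1e-10)

# Duration of the whole utterance; durations of smaller units (syllables,
# pauses) would additionally require segmentation or forced alignment.
duration_s = len(y) / sr

print(f"median F0: {np.nanmedian(f0):.1f} Hz, "
      f"mean intensity: {intensity_db.mean():.1f} dB, "
      f"duration: {duration_s:.2f} s")
```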
An important investigation of linguistic prosody was performed by Meister et al. (2015). They presented CI users and NH controls with incremental manipulations of the F0, intensity, and duration cues for word stress, and also measured participants’ just‐noticeable‐difference discrimination thresholds for these phonetic dimensions. The clinical group’s performance was compromised by the F0 and intensity cue manipulations, but not by manipulation of the duration cue, suggesting that they relied more on duration than on the other cues. A similar pattern was observed in the discrimination thresholds reported in the study, which, relative to NH listeners, were least elevated for duration (51 ms for CI vs. 40 ms for NH), more elevated for intensity (3.9 dB vs. 1.8 dB), and most elevated for F0 (5.8 semitones vs. 1.5 semitones) (cf. Kalathottukaren, Purdy, & Ballard, 2015; See, Driscoll, Gfeller, Kliethermes, & Oleson, 2013). O’Halpin (2009) found that school‐aged children with CIs were outperformed by their NH peers on phrase/compound‐word discrimination (blue bottle vs. bluebottle) and on identification of two‐way (It’s a BLUE book vs. It’s a blue BOOK) and three‐way sentence accent positions (The BOY is painting a boat vs. The boy is PAINTING a boat vs. The boy is painting a BOAT). Furthermore, when tested with manipulated nonsense disyllables, the CI children had larger discrimination thresholds for F0 and relatively smaller discrimination thresholds for intensity and duration. For each cue, these discrimination thresholds correlated with the linguistic‐prosody scores, which indicates that prosody perception in CI children may be supported by psychophysical capabilities. The relationship between psychophysical and speech‐perceptual performance may not be as straightforward for postlingually deafened adults: Morris, Magnusson, Faulkner, Jönsson, and Juul (2013) entered discrimination thresholds for F0, intensity, duration, and vowel quality (operationalized as the first formant) into a logistic regression analysis with prosody identification as the dependent variable. The prosodic tasks were vowel length, word stress, and phrase/compound‐word identification, performed both in quiet and in a 10 dB SNR noise background. Only the discrimination threshold for intensity emerged as a significant predictor, indicating that adult CI recipients who can better exploit intensity changes are also better at these types of linguistic‐prosodic tasks.
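To put the F0 thresholds above in perspective, the distance in semitones between two frequencies f1 and f2 is

$$ n = 12 \log_2\!\left(\frac{f_2}{f_1}\right), \qquad \text{equivalently} \qquad f_2 = f_1 \cdot 2^{\,n/12}. $$

Assuming an illustrative reference F0 of 200 Hz (not a value from the study), the NH threshold of 1.5 semitones corresponds to detecting a change from 200 Hz to about 200 × 2^(1.5/12) ≈ 218 Hz, whereas the CI threshold of 5.8 semitones corresponds to a change to about 200 × 2^(5.8/12) ≈ 280 Hz, that is, a just‐detectable shift over four times as large.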
In a recent review of emotional prosody processing, Jiam, Caldwell, Deroche, Chatterjee, and Limb (2017) concluded that CI users have considerable difficulties perceiving and producing emotional prosody. For example, on a four‐way emotion identification test using nonsense words, Gilbers et al. (2015) found that CI users scored around 45% correct, NH listeners using vocoders around 70%, and NH