It has been demonstrated that problems with emotional prosody perception have repercussions for CI recipients’ social development, and this problem is, of course, more pronounced in prelingually deafened CI recipients who have developed their receptive communication exclusively through their implants. Wiefferink, Rieffe, Ketelaar, De Raeve, and Frijns (2013) found that Dutch CI children between 2.5 and 5 years old scored lower than NH peers on facial and situational emotion understanding and on general expressive and receptive language development. The language test scores correlated with scores on emotion tests that required verbal processing, suggesting that linguistic development can predict emotional development to some extent. Mancini et al. (2016) confirmed the connection between linguistic and emotional development for a group of 72 children aged between 4 and 11 years. However, 79% of their cohort showed no deviant understanding of emotion, possibly because their participants were older and a larger percentage of them used oral language exclusively.
In summary, in their daily communication with the device, CI listeners may miss out on many if not most aspects of prosody. In prelingually deafened CI children this seems to be due to their more basic problems with pitch and spectral perception, although this causality is less clear for postlingually deafened CI recipients and for the perception of emotional prosody by all CI recipients. Emotional prosody perception deficits in CI children have been shown to have consequences for more general socio‐emotional development.
3.3.6 Production of Segments
It is a plausible assumption that proper speech perception is required for proper speech production. For instance, people with congenital (Osberger & McGarr, 1982) and acquired (Waldstein, 1990) hearing loss tend to produce abnormal speech. The mechanism behind this link has been described in the Directions Into Velocity of Articulators (DIVA) model (Guenther, 2006). In short, during preparation for articulation, an abstract acoustic representation of the target is projected. This projection is compared with the acoustics of the actual production. Deviations can inform corrections in production online and can also update the representations. The acoustic representations are learned on the basis of speech input accumulated during the speaker’s life. It is therefore to be expected that aberrant input, as in the case of individuals with hearing loss including those who have received a CI, results in aberrant representations and therefore abnormal speech output.
This expectation is corroborated by older as well as more recent work. For instance, in the area of segmental phonetics, CI recipients, in comparison to NH speakers, have been found to have inaccurate consonant (Sundarrajan, Tobey, Nicholas, & Geers, 2019) and vowel (Menard et al., 2007; Verhoeven, Hide, De Maeyer, Gillis, & Gillis, 2016) production; however, see also Baudonck, Van Lierde, Dhooge, and Corthals (2011). Sundarrajan et al. (2019) measured Percentage of Consonants Correct, a metric from Shriberg, Austin, Lewis, McSweeny, and Wilson (1997), and consonant diversity in the spontaneous speech of 129 CI children of around 4 years of age. At the age of 3.5 years, the implanted children’s inventory consisted of 14 consonants and that of the controls of 19; 1 year later, the inventories comprised 17 and 19 consonants for the CI and NH groups, respectively. The CI recipients’ consonant accuracy improved from 71% at 3.5 years to 83% 1 year later, whereas it was 94% and 95% for the controls at the same ages. Verhoeven et al. (2016) measured the vowel space and dispersion (overlap) in imitations of monosyllables containing the 12 monophthongs of Belgian Dutch, by children with conventional HA, children with CIs, and NH children. The HA and CI groups had a more centralized vowel space (more so for the second formant, F2, than for the first formant, F1), and also more overlap between vowel categories, than the control participants. These studies show that the diversity of segmental production in CI users on average is reduced relative to the norm.
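To make the two consonant measures mentioned above concrete, the following is a minimal sketch of how a percentage of consonants correct and a consonant inventory could be computed from a transcript. It assumes that target and produced consonants have already been aligned one to one, with an omission coded as `None`; the function names and the toy data are hypothetical, and the sketch does not reproduce the detailed scoring conventions of Shriberg et al. (1997) or the procedure of Sundarrajan et al. (2019).

```python
def percentage_consonants_correct(pairs):
    """Return the share (in %) of target consonants produced correctly.

    `pairs` is a list of (target, produced) consonant symbols that are
    assumed to be aligned already; `produced` is None for an omission.
    """
    if not pairs:
        raise ValueError("no consonants were scored")
    correct = sum(1 for target, produced in pairs if produced == target)
    return 100.0 * correct / len(pairs)


def consonant_inventory(pairs):
    """Return the set of distinct consonants the speaker actually produced."""
    return {produced for _, produced in pairs if produced is not None}


# Hypothetical toy sample: /s/ is substituted by /t/ and /r/ is omitted.
sample = [("k", "k"), ("t", "t"), ("s", "t"), ("r", None), ("m", "m")]

pcc = percentage_consonants_correct(sample)   # 3 of 5 targets correct
inventory = consonant_inventory(sample)       # distinct consonants produced

print(f"PCC: {pcc:.0f}%, inventory size: {len(inventory)}")
```

In this simplified form, accuracy and diversity are separate quantities: a child could produce a small inventory accurately, or a larger inventory with many substitutions.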
3.3.7 Production of Prosody
Research on prosody production has shown that CI users, among other issues, have a higher F0 than controls (Poissant, Peters, & Robb, 2006; Szyfter et al., 1996), abnormal realization and usage of intonation (Chin, Bergeson, & Phan, 2012; Holt et al., 2017; Lee & Sim, 2020), a low speech rate (Evans & Deliyski, 2007; Uchanski & Geers, 2003), and a deviant voice quality (Baudonck et al., 2011; Monini, Banci, Barbara, Argiro, & Filipo, 1997; Ubrig et al., 2011). As an example of a study on linguistic prosody, in Lee and Sim (2020), final contours of imitated declarative and interrogative sentences were measured acoustically and evaluated for appropriateness by naïve raters. Children with CIs were found to distinguish the sentence types less clearly. Duration of implant usage was a significant predictor of these scores, showing that over time, CI users improve their prosodic control. In a comparable study (Peng, Tomblin, & Turner, 2008), such sentence‐type ratings were 74% correct when pronounced by CI users, and they achieved an appropriateness rating of 3.1 on a 5‐point scale, whereas with NH speakers the sentence type rating was 94% with an appropriateness of 4.5.
Research on emotional prosody production is sparser. A review by Jiam et al. (2017) points to studies on production of happy and sad vocal emotions by children and adults showing a lower accuracy in imitations (Nakata, Trehub, & Kanda, 2012) as well as in read sentences (Chatterjee et al., 2016). Comparable results have been reported by Bergeson and Chin (2008) and Wang, Trehub, Volkova, and van Lieshout (2013). Chatterjee et al. (2019), the preliminary results of which were presented in Chatterjee et al. (2016), longitudinally measured F0 mean and variability and mean intensity, duration and spectral centroid of happy and sad renderings of simple utterances in prelingually deafened children with CIs, postlingually deafened adults with CIs, and NH peers for both of those groups. The children with CIs showed the least amount of acoustic distinction between the emotions, mainly for F0‐related measures. The group of adult CI recipients was better able to contrast the emotions, which suggests, as the authors argue, that the acoustic input that they received prior to implantation supports normal vocal emotion expression. Moreover, the fact that they maintained the contrast after years of degraded (i.e., electric) hearing indicates that the sound representations that are developed and stored prior to implantation are very robust.
From the above discussion, a number of points are evident. Firstly, CI users have difficulties with both the perception and the production of linguistic and emotional prosody, especially when it comes to producing pitch‐related cues. Secondly, it must be reiterated that, despite these difficulties, there is in general a large degree of individual variation in outcomes among CI users, with many showing speech production abilities that are commensurate with those of their NH peers. Finally, production skills depend, besides many other factors, upon demographic factors such as age at implantation, where older ages have a negative effect, and duration of device usage, where longer time periods have a positive effect.
The deviant speech of individuals with hearing loss shows that in a general sense speech production quality depends on auditory feedback. A separate issue is to what extent the levels of perception and production abilities are related to each other. Several studies show such a connection (Li, Lin, Yang, Chen, & Wu, 2018; Lyxell et al., 2009; O’Halpin, 2009; Peng, 2005),