When a sound wave reaches the ear, it travels down the auditory canal to the eardrum, setting it in motion. This vibration is transmitted across the 2 mm gap to the oval window by the lever system formed by the auditory ossicles. This mechanical system is thought to act as an impedance‐matching device. The characteristic impedance, ρc, of air is approximately one two‐thousandth of that of the cochlear fluid, while the area of the tympanic membrane is 20 to 30 times larger than that of the oval window. Some argue that the relevant area is not that of the oval window but that of the footplate of the stapes; the tympanic membrane is about 20 times greater in area than the footplate. However, not all of the tympanic membrane vibrates, because it is firmly attached at its periphery. The ratio of the moving part of the tympanic membrane to the footplate area is about 14 to 1.
In addition, the pivot of the ossicle system may be assumed to lie closer to the oval window than to the eardrum, providing a mechanical advantage of two to three. The net result is that low‐pressure, high particle velocity amplitude air waves arriving at the eardrum are converted into high‐pressure, low particle velocity amplitude fluid waves in the cochlea, approximately matching the air and fluid impedances. We may recall from electrical theory that impedances must be matched in order to obtain maximum power transfer.
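The impedance transformation described above can be checked with a rough order‐of‐magnitude calculation. The sketch below uses only the approximate ratios quoted in the text (14 : 1 area ratio, a lever advantage taken here as 2.5, and the ~2000 : 1 impedance mismatch); the numbers are illustrative, not precise anatomical values.

```python
# Rough check of the middle-ear impedance transformation.
# All numbers are the approximate ratios quoted in the text.

area_ratio = 14.0    # moving part of tympanic membrane : stapes footplate
lever_ratio = 2.5    # assumed ossicular lever advantage (text: two to three)

# An ideal mechanical transformer multiplies pressure by the product of
# the two ratios, and therefore multiplies impedance by its square.
pressure_gain = area_ratio * lever_ratio
impedance_gain = pressure_gain ** 2

print(f"pressure gain:  {pressure_gain:.0f}x")    # about 35x
print(f"impedance gain: {impedance_gain:.0f}x")   # about 1225x
```

The resulting impedance gain of roughly 1000–2000 (depending on the lever ratio assumed) is of the same order as the ~2000 : 1 mismatch between the characteristic impedances of the cochlear fluid and air, which is why the middle ear is regarded as an impedance‐matching device.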
There is a “safety” device built into the middle ear mechanism. Two muscles, the tensor tympani and the stapedius, are attached to the malleus and stapes, respectively. If continuous intense sounds are experienced, the muscles contract and rotate the ossicles so that the force passed on to the oval window does not increase in proportion to the sound pressure.
This effect is called the acoustic reflex, and many experiments indicate that it attenuates low‐frequency sound levels by up to about 20 dB [9]. However, these muscles are rapidly fatigued by continuous narrow‐band intense noise. In addition, the muscles are relatively slow to contract, making the reflex ineffective in the presence of impulse or impact sounds. There is some evidence that the muscles also contract when we speak, preventing us from hearing so much of our own speech.
Swallowing opens the Eustachian tube, which equalizes the pressure across the eardrum. This explains why our ears “pop” in airplanes when we ascend and the atmospheric pressure changes. If we have a cold, we may experience some pain when the airplane lands again: mucus blocking the Eustachian tube can prevent us from equalizing the pressure by swallowing.

Movement of the footplate of the stapes, which is connected to the oval window, causes pressure waves in the fluid of the upper gallery of the cochlea (Figures 4.3 and 4.4). The fluid in the lower gallery is separated from that in the upper gallery by the cochlear duct, which contains the organ of Corti. This organ has about 35 000 sensitive hair cells distributed along its length, connected in a complicated way to about 18 000 nerve fibers that combine into the auditory nerve running to the brain. The pressure waves cause the basilar membrane to deflect, and a shearing motion occurs between the basilar and tectorial membranes. The hair cells sense this shearing motion, and if the stimulus is great enough, the neuron to which each hair cell is attached sends an impulse along its nerve fiber to the brain cortex [8]. Each neuron takes about 1/1000th of a second to recharge, so individual neurons are limited to “firing” no more than 1000 times per second. A triggering level must be reached before a neuron “fires,” giving the neurons an all‐or‐nothing response. The brain must interpret the neural impulses to give us the sensation of hearing and, as we can imagine, the way in which this is done is not well understood.
4.2.3 Theories of Hearing
Pythagoras, in the sixth century BCE, was perhaps the first to recognize that sound is an airborne vibration [10]. Hippocrates, in the fourth century BCE, recognized that the air vibrations are picked up by the eardrum but thought that the vibrations were transmitted directly to the brain by the bones. In 175 CE, Galen of Pergamum, a Greek physician, realized that it was nerves that transmitted the sound sensations to the brain. However, Galen and most other early scientists and philosophers mistakenly proposed that somewhere deep in the head was a sealed pocket of implanted air which was the “seat” of hearing. This view was widely held until 1760, when Domenico Cotugno showed that the inner ear (cochlea) is completely filled with fluid [10].
In 1543, Andreas Vesalius published his treatise on anatomy giving a description of the middle ear and in 1561 Gabriello Fallopio described the cochlea itself.
In 1605 Gaspard Bauhin put forward a resonance theory of hearing in which different air cavities were excited by different frequencies; however, he knew little of the construction of the inner ear. Du Verney, in 1683, developed a more advanced theory by postulating that different parts of the ridge of bone that spirals up the inside of the cochlea resonated at different frequencies, depending upon their width. Du Verney's theory held until 1851, when Alfonso Corti, using a microscope, discovered that the thousands of hair cells on the basilar membrane were attached to the ridge of bone in the cochlea.
A few years later, Hermann von Helmholtz used Corti's findings to suggest a new theory of hearing. In Helmholtz's theory, as it became refined, different parts of the basilar membrane resonate at different frequencies. Later workers showed that Helmholtz was not exactly right (the basilar membrane is not under tension). However, in 1928 Georg von Békésy showed that waves do travel along the basilar membrane and that different sections of the membrane respond more than others to sound of a given frequency. The region of maximum response is frequency‐dependent: as Helmholtz had predicted, von Békésy found that high‐frequency sound is detected nearer to the oval window and low‐frequency sound nearer to the apex (Figures 4.3 and 4.4).
4.3 Subjective Response
So far we have traced the sound signal down the ear canal to the eardrum, through the auditory ossicles and the oval window to the cochlear fluid, to the basilar membrane and the hair cells, and finally to the neural impulses sent to the brain. How does the brain interpret these signals? Our study now enters the realm of psychology. While the physicist or engineer talks about sound pressure level and frequency, the psychologist talks about loudness and pitch, respectively. The study of the human auditory response to sound is known as psychoacoustics. In Section 4.3 we shall discuss the relationships between some of the engineering descriptions of sound and the psychological or subjective descriptions of psychoacoustics.
4.3.1 Hearing Envelope
Figure 4.5 presents the auditory field for an average, normal young person who has not suffered any hearing loss or damage. The lower curve represents the hearing threshold, that is, the quietest audible sound at each frequency. The upper curve represents the discomfort threshold, that is, the sound pressure level at each frequency at which there is a sensation of discomfort and even pain in the ears. Speech lies mainly in the frequency range of about 250–6000 Hz, at sound pressure levels between about 30 and 80 dB at 1–2 m (depending upon frequency). Of course, the sound pressure level of speech can approach 90 dB at about 0.2–0.3 m from someone who is shouting loudly. The sound of vowels falls mostly in the low‐frequency range from about 250 to 1000 Hz, while the sound of consonants lies mainly in the higher frequency range of about 1000–6000 Hz. Music spans a somewhat greater frequency range and a greater dynamic range than speech. (The dynamic range is the difference in level between the lowest and highest sound pressure levels experienced.)
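The decibel figures above correspond to very large ratios of sound pressure. As a minimal illustration of how a dynamic range expressed in dB maps to a pressure ratio, the sketch below applies the standard relation ΔL = 20 log₁₀(p₂/p₁) to the 30–80 dB speech range quoted in the text:

```python
# Convert a sound pressure level difference (dB) to a pressure ratio.
# Uses the standard definition: delta_L = 20 * log10(p2 / p1).

def pressure_ratio(delta_db: float) -> float:
    """Ratio of sound pressures corresponding to a level difference in dB."""
    return 10 ** (delta_db / 20)

# Speech spans roughly 30-80 dB at 1-2 m (see text), a 50 dB dynamic range:
speech_range_db = 80 - 30
print(f"{pressure_ratio(speech_range_db):.0f}")  # about 316: a ~316x pressure ratio
```

A 50 dB dynamic range thus corresponds to sound pressures differing by a factor of roughly 316, which suggests why a logarithmic (dB) scale is the natural engineering description of such signals.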