with

$$\delta_j \equiv -\frac{\partial(\mathbf{e}^T\mathbf{e})}{\partial s_j},$$

leading to the weight update

$$\Delta w_{ij} = \eta\,\delta_j\,y_i.$$
Parameters $\delta$ are derived recursively starting from the output layer:

$$\delta_j = -\frac{\partial(\mathbf{e}^T\mathbf{e})}{\partial s_j} = -\frac{\partial(\mathbf{e}^T\mathbf{e})}{\partial y_j}\,f'(s_j), \qquad (3.9)$$

where $f'$ is the derivative of the sigmoid function of $s$. We have also used, for the output layer, $e_j = d_j - y_j$. With this, at the output layer each neuron has an explicit desired response $d_j$, so we can write

$$\frac{\partial(\mathbf{e}^T\mathbf{e})}{\partial y_j} = \frac{\partial}{\partial y_j}\sum_m (d_m - y_m)^2 = -2(d_j - y_j). \qquad (3.10)$$

Substituting into Eq. (3.9) yields $\delta_j = 2(d_j - y_j)\,f'(s_j)$.
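As a numerical sanity check of the output-layer result, the sketch below evaluates $\delta = 2(d-y)f'(s)$ for a single sigmoid neuron and compares it against a finite-difference derivative of the squared error $(d - f(s))^2$; the values of $s$ and $d$ are illustrative, and the factor 2 from differentiating the squared error is kept explicit.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical single output neuron: net input s, desired response d.
s, d = 0.5, 1.0
y = sigmoid(s)

# Output-layer delta: delta = 2*(d - y)*f'(s), with f'(s) = y*(1 - y)
# for the sigmoid.
delta = 2.0 * (d - y) * y * (1.0 - y)

# delta should equal minus the derivative of (d - f(s))^2 with respect
# to s, estimated here by a central finite difference.
h = 1e-6
num = ((d - sigmoid(s + h))**2 - (d - sigmoid(s - h))**2) / (2 * h)
assert abs(delta + num) < 1e-6
```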
To calculate the $\delta$'s of the hidden layers, we note that $\mathbf{e}^T\mathbf{e}$ is influenced by $s_j$ only indirectly, through all node values in the next layer. Referring to the upper part of Figure 3.3, we again employ the chain rule

$$\delta_j = -\frac{\partial(\mathbf{e}^T\mathbf{e})}{\partial s_j} = -\sum_k \frac{\partial(\mathbf{e}^T\mathbf{e})}{\partial s_k}\,\frac{\partial s_k}{\partial s_j} = \sum_k \delta_k\,\frac{\partial s_k}{\partial s_j}, \qquad (3.11)$$

with

$$\frac{\partial s_k}{\partial s_j} = \frac{\partial s_k}{\partial y_j}\,\frac{\partial y_j}{\partial s_j} = w_{jk}\,f'(s_j). \qquad (3.12)$$

Recalling that $f'(s_j) = f(s_j)\,[1 - f(s_j)] = y_j(1 - y_j)$, we get

$$\delta_j = y_j(1 - y_j)\sum_k \delta_k\,w_{jk}. \qquad (3.13)$$
In summary, we have

$$\delta_j = \begin{cases} 2(d_j - y_j)\,y_j(1 - y_j), & j \text{ in the output layer}, \\ y_j(1 - y_j)\displaystyle\sum_k \delta_k\,w_{jk}, & j \text{ in a hidden layer}. \end{cases} \qquad (3.14)$$
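The full recursion can be sketched for a small fully connected network with one hidden layer. The sizes, random weights, and learning rate below are illustrative, and the factor 2 from the squared error $\mathbf{e}^T\mathbf{e}$ is kept explicit to match the derivation; the delta-based update on one weight is checked against a finite-difference gradient.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # input vector (hypothetical sizes)
d = np.array([1.0, 0.0])      # desired output
W1 = rng.normal(size=(3, 4))  # input -> hidden weights w_ij
W2 = rng.normal(size=(4, 2))  # hidden -> output weights w_jk

# Forward pass: y = f(s) at each layer.
y1 = sigmoid(x @ W1)
y2 = sigmoid(y1 @ W2)

# Backward pass:
# output layer: delta_k = 2*(d_k - y_k)*y_k*(1 - y_k)
delta2 = 2.0 * (d - y2) * y2 * (1.0 - y2)
# hidden layer: delta_j = y_j*(1 - y_j) * sum_k delta_k * w_jk
delta1 = y1 * (1.0 - y1) * (W2 @ delta2)

# Weight updates Delta w_ij = eta * delta_j * y_i.
eta = 0.1
dW2 = eta * np.outer(y1, delta2)
dW1 = eta * np.outer(x, delta1)

# Check: the delta-based update equals -eta times the gradient of
# e^T e, estimated by a central finite difference on one weight.
def error(W1_, W2_):
    e = d - sigmoid(sigmoid(x @ W1_) @ W2_)
    return e @ e

h = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += h
W1m[0, 0] -= h
num_grad = (error(W1p, W2) - error(W1m, W2)) / (2 * h)
assert abs(dW1[0, 0] + eta * num_grad) < 1e-6
```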
For the bias weight we note that the corresponding input is the constant $y_i = 1$, so its update uses the $\delta_j$ of Eq. (3.13) with this value. The above processing is illustrated in Figure 3.4, indicating the symmetry between the forward propagation of neuron activation values and the backward propagation of $\delta$ terms.
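The bias convention (a weight whose input is fixed at 1) can be made concrete by augmenting the input vector with a constant 1, as sketched below; all numeric values are illustrative.

```python
import numpy as np

y_in = np.array([0.2, -0.4])   # hypothetical inputs to a neuron
w = np.array([0.5, 0.3])       # regular weights
b = 0.1                        # bias weight, input fixed at 1

# Treating the bias as an extra weight with constant input 1 gives the
# same net input s:
y_aug = np.append(y_in, 1.0)
w_aug = np.append(w, b)
s = y_aug @ w_aug
assert np.isclose(s, y_in @ w + b)

# The update rule Delta w_ij = eta * delta_j * y_i then covers the bias
# automatically, with y_i = 1:
eta, delta = 0.1, 0.25         # illustrative learning rate and delta
dw_aug = eta * delta * y_aug
assert np.isclose(dw_aug[-1], eta * delta)  # bias update = eta*delta*1
```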