Figure 3.14 General locally recurrent–globally feedforward (LRGF) architecture.
Figure 3.15 An example of an Elman recurrent neural network (RNN).
Figure 3.16 An example of a Jordan recurrent neural network (RNN).

The following equations fully describe the RNN from Figure 3.17:

$$
\begin{aligned}
y_n(k) &= \Phi\big(v_n(k)\big), \qquad n = 1, \dots, N,\\
v_n(k) &= \sum_{l=1}^{p+N+1} w_{n,l}(k)\, u_l(k),\\
\mathbf{u}(k) &= \big[s(k-1), \dots, s(k-p),\; 1,\; y_1(k-1), \dots, y_N(k-1)\big]^{T},
\end{aligned} \tag{3.65}
$$

where the (p + N + 1) × 1 dimensional vector u(k) comprises both the external and feedback inputs to a neuron, as well as the unity-valued constant bias input.
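To make (3.65) concrete, the following is a minimal NumPy sketch of one forward step of such a fully connected RNN. The function name, the choice of tanh for Φ, and the particular values of p and N are illustrative assumptions, not taken from the text.

```python
import numpy as np

def rnn_step(W, s_taps, y_prev):
    """One forward step of the fully connected RNN in (3.65).

    W      : N x (p + N + 1) weight matrix (one row per neuron)
    s_taps : p past external inputs [s(k-1), ..., s(k-p)]
    y_prev : N previous outputs [y_1(k-1), ..., y_N(k-1)]
    Returns the N current outputs y(k) and the input vector u(k).
    """
    # u(k) = [external inputs, unity bias, feedback inputs], (p + N + 1) x 1
    u = np.concatenate([s_taps, [1.0], y_prev])
    v = W @ u                  # net inputs v_n(k), one per neuron
    return np.tanh(v), u       # Phi taken as tanh (an assumption)

# Example: p = 4 external taps, N = 3 fully connected neurons
p, N = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(N, p + N + 1))
y, s_taps = np.zeros(N), np.zeros(p)
y, u = rnn_step(W, s_taps, y)  # y[0] is the output used for prediction
```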
Training: Here, we discuss training of the single fully connected RNN shown in Figure 3.17. For nonlinear time series prediction, only one output neuron of the RNN is used. Training of the RNN is based on minimizing the instantaneous squared error at the output of the first neuron, which can be expressed as
$$
E(k) = \tfrac{1}{2}\, e^2(k) = \tfrac{1}{2}\,\big[s(k) - y_1(k)\big]^2, \tag{3.66}
$$
where e(k) denotes the error at the output y₁ of the RNN and s(k) is the training signal. Hence, the correction for the l‑th weight of neuron n at the time instant k is
$$
\Delta w_{n,l}(k) = -\eta\, \frac{\partial E(k)}{\partial w_{n,l}(k)} = -\eta\, e(k)\, \frac{\partial e(k)}{\partial w_{n,l}(k)}. \tag{3.67}
$$
Figure 3.17 A fully connected recurrent neural network (RNN; Williams–Zipser network). The neurons (nodes) are depicted by circles and incorporate the operation Φ applied to the sum of their inputs.
Since the external signal vector s does not depend on the elements of W, the error gradient becomes $\partial e(k)/\partial w_{n,l}(k) = -\,\partial y_1(k)/\partial w_{n,l}(k)$. Using the chain rule gives
$$
\frac{\partial y_j(k)}{\partial w_{n,l}(k)} = \Phi'\big(v_j(k)\big)\left[\,\sum_{\alpha=1}^{N} w_{j,\alpha+p+1}(k)\,\frac{\partial y_\alpha(k-1)}{\partial w_{n,l}(k)} + \delta_{nj}\, u_l(k)\right], \tag{3.68}
$$
where the Kronecker delta $\delta_{nj} = 1$ if n = j and 0 otherwise. When the learning rate η is sufficiently small, we have $\partial y_\alpha(k-1)/\partial w_{n,l}(k) \approx \partial y_\alpha(k-1)/\partial w_{n,l}(k-1)$. By introducing the notation
$$
\pi_{n,l}^{\,j}(k) \equiv \frac{\partial y_j(k)}{\partial w_{n,l}(k)}, \tag{3.69}
$$
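The recursion in (3.68) together with the correction (3.67) is the essence of real-time recurrent learning (RTRL) for this network. As a rough illustration, one possible NumPy sketch of a single RTRL weight update is given below; the function name, the N × N × (p + N + 1) storage layout for the sensitivities π, and the tanh nonlinearity are assumptions made here for concreteness, not the book's own implementation.

```python
import numpy as np

def rtrl_update(W, Pi, u, y, s_k, eta):
    """One RTRL weight update for the network of Figure 3.17.

    W   : N x (p + N + 1) weights
    Pi  : N x N x (p + N + 1) sensitivities from the previous step,
          Pi[j, n, l] = dy_j(k-1) / dw_{n,l}, as in (3.69)
    u   : current input vector u(k) (external taps, bias, feedback)
    y   : current outputs y(k) = tanh(v(k))
    s_k : training signal s(k);  eta : learning rate
    """
    N, cols = W.shape
    p = cols - N - 1
    e = s_k - y[0]                    # e(k) = s(k) - y_1(k), Eq. (3.66)
    Wf = W[:, p + 1:]                 # feedback weights w_{j, alpha+p+1}
    dPhi = 1.0 - y**2                 # Phi'(v(k)) for Phi = tanh
    # Recursion (3.68): propagate old sensitivities through the feedback
    # weights and add the direct term delta_{nj} * u_l(k)
    Pi_new = np.einsum('ja,anl->jnl', Wf, Pi)
    Pi_new += np.einsum('jn,l->jnl', np.eye(N), u)
    Pi_new *= dPhi[:, None, None]
    # Correction (3.67): dw_{n,l}(k) = eta * e(k) * dy_1(k)/dw_{n,l}(k)
    W += eta * e * Pi_new[0]
    return W, Pi_new, e
```

In a training loop, each time step would first advance the network with a forward step such as rnn_step above and then call rtrl_update with the same u(k) and y(k), carrying the sensitivity array Pi from one step to the next.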