Figure 3.6 Finite impulse response (FIR) network unfolding.
Example
For the network shown in Figure 3.6, all connections are made by second-order (three-tap) FIRs. Although at first sight it looks as though the network has only 10 connections, there are in fact a total of 30 variable filter coefficients (not counting the five bias weights). Starting at the output, each tap delay can be interpreted as a "virtual neuron" whose input is delayed by the given number of time steps. A tap delay can be "removed" by replicating the previous layers of the network and delaying the input to the network, as shown in Figure 3.6. The procedure is then carried out backward through each layer until all delays have been removed. The final unfolded structure is depicted at the bottom of Figure 3.6.
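As an illustrative sketch (the function name, array shapes, and numerical values below are not from the text), a single second-order connection can be simulated as a three-tap FIR filter acting on a tap-delay line; ten such connections account for the 30 adjustable coefficients quoted above.

import numpy as np

# A single second-order (three-tap) FIR "synapse": its output at time k is a
# weighted sum of the current and the two previous values of its input.
def fir_synapse_output(w, x_history):
    # w: 3 filter coefficients; x_history: [x(k), x(k-1), x(k-2)]
    return float(np.dot(w, x_history))

# Ten such connections carry 10 x 3 = 30 adjustable filter coefficients,
# the count quoted for the network of Figure 3.6 (bias weights excluded).
rng = np.random.default_rng(0)
w = rng.standard_normal(3)
x_hist = rng.standard_normal(3)
print(fir_synapse_output(w, x_hist))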
3.2.3 Adaptation
For supervised learning with input sequence x(k), the difference between the desired output d(k) at time k and the actual output y(k) of the network is the error

e(k) = d(k) - y(k)    (3.17)
The total squared error over the sequence is given by
J = \sum_{k} e^{2}(k)    (3.18)
The objective of training is to determine the set of FIR filter coefficients (weights) that minimizes the cost J, subject to the constraint of the network topology. A gradient-descent approach will again be used, in which the weights are updated iteratively.
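As a minimal sketch of (3.17) and (3.18) (the helper name and the example sequences are hypothetical), the cost is simply the squared error accumulated over the training sequence:

import numpy as np

def total_squared_error(d, y):
    # Cost J of Eq. (3.18): sum over the sequence of e^2(k), with e(k) = d(k) - y(k).
    e = np.asarray(d) - np.asarray(y)   # Eq. (3.17), error at each time step
    return float(np.sum(e ** 2))        # Eq. (3.18), total squared error

# Arbitrary desired sequence and network output sequence, for illustration only.
d = [0.1, 0.4, -0.2, 0.3]
y = [0.0, 0.5, -0.1, 0.2]
print(total_squared_error(d, y))        # 0.04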
For instantaneous gradient descent, FIR filters may be updated at each time slot as
\mathbf{w}_{ij}^{l}(k+1) = \mathbf{w}_{ij}^{l}(k) - \mu \, \frac{\partial e^{2}(k)}{\partial \mathbf{w}_{ij}^{l}}    (3.19)
where \mathbf{w}_{ij}^{l} = [w_{ij}^{l}(0), w_{ij}^{l}(1), \ldots, w_{ij}^{l}(T)] is the vector of coefficients for the FIR filter connecting neuron i in layer l to neuron j in layer l+1, and \mu is the learning rate. Evaluating this instantaneous gradient exactly for an interior weight, however, requires backpropagating through the fully unfolded network of Figure 3.6, in which layers and weights are duplicated and the necessary computation and bookkeeping grow with the number of layers and taps.
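For the degenerate topology of a single linear FIR filter with no hidden layers, no unfolding is needed and (3.19) reduces to the familiar LMS algorithm; the sketch below (variable names, filter length, and data are illustrative assumptions) adapts a three-tap filter in this way.

import numpy as np

def instantaneous_update(w, x_tap, d_k, mu):
    # One step of Eq. (3.19) for a single linear FIR filter with no hidden
    # layers: here d e^2(k)/dw = -2 e(k) x(k), i.e. the LMS rule.
    y_k = np.dot(w, x_tap)              # filter output at time k
    e_k = d_k - y_k                     # error, Eq. (3.17)
    return w + 2.0 * mu * e_k * x_tap   # w(k+1) = w(k) - mu * instantaneous gradient

# Identify a hypothetical 3-tap filter from noiseless data (values illustrative).
rng = np.random.default_rng(1)
true_w = np.array([0.5, -0.3, 0.1])
w = np.zeros(3)
x = rng.standard_normal(500)
for k in range(2, len(x)):
    x_tap = x[k - 2:k + 1][::-1]        # tap-delay line [x(k), x(k-1), x(k-2)]
    d_k = float(np.dot(true_w, x_tap))  # desired output supplied by the "true" filter
    w = instantaneous_update(w, x_tap, d_k, mu=0.05)
print(np.round(w, 3))                   # converges toward [0.5, -0.3, 0.1]

For layered, nonlinear FIR networks, however, the instantaneous gradient is no longer available in such a simple closed form.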
Temporal backpropagation is an alternative approach that can be used to avoid the above problem. To discuss it, let us consider two alternative forms of the true gradient of the cost function:
\frac{\partial J}{\partial \mathbf{w}_{ij}^{l}} = \sum_{k} \frac{\partial e^{2}(k)}{\partial \mathbf{w}_{ij}^{l}} = \sum_{k} \frac{\partial J}{\partial s_{j}^{l+1}(k)} \, \frac{\partial s_{j}^{l+1}(k)}{\partial \mathbf{w}_{ij}^{l}}    (3.20)
Note that

\frac{\partial e^{2}(k)}{\partial \mathbf{w}_{ij}^{l}} \neq \frac{\partial J}{\partial s_{j}^{l+1}(k)} \, \frac{\partial s_{j}^{l+1}(k)}{\partial \mathbf{w}_{ij}^{l}}

for each individual k; only their sums over all k are equal. Based on this new expansion, each term in the sum is used to form the following stochastic algorithm:

\mathbf{w}_{ij}^{l}(k+1) = \mathbf{w}_{ij}^{l}(k) - \mu \, \frac{\partial J}{\partial s_{j}^{l+1}(k)} \, \frac{\partial s_{j}^{l+1}(k)}{\partial \mathbf{w}_{ij}^{l}}    (3.21)
For small learning rates, the total weight change accumulated over the sequence is approximately equal to the change produced by descending along the true gradient. This training algorithm is termed temporal backpropagation.
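To see this, sum the per-step changes prescribed by (3.21) over the training sequence; provided the weights change little within one pass,

\sum_{k} \Delta\mathbf{w}_{ij}^{l}(k) = -\mu \sum_{k} \frac{\partial J}{\partial s_{j}^{l+1}(k)} \, \frac{\partial s_{j}^{l+1}(k)}{\partial \mathbf{w}_{ij}^{l}} = -\mu \, \frac{\partial J}{\partial \mathbf{w}_{ij}^{l}} ,

which, by (3.20), is precisely the step taken by batch gradient descent on J.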
To complete the algorithm, recall that the summing junction is defined as

s_{j}^{l+1}(k) = \sum_{i} \mathbf{w}_{ij}^{l} \cdot \mathbf{x}_{i}^{l}(k)    (3.22)

where \mathbf{x}_{i}^{l}(k) = [x_{i}^{l}(k), x_{i}^{l}(k-1), \ldots, x_{i}^{l}(k-T)] is the tap-delay line of outputs of neuron i in layer l, so that each term \mathbf{w}_{ij}^{l} \cdot \mathbf{x}_{i}^{l}(k) is simply the output of the corresponding FIR filter at time k.
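A minimal numerical sketch of (3.22) follows (array names and values are hypothetical): each row of W holds the coefficients of one incoming FIR filter, and the matching row of X_taps holds that input's tap-delay line at time k.

import numpy as np

def summing_junction(W, X_taps):
    # Eq. (3.22): s_j(k) = sum over inputs i of w_ij . x_i(k).
    # W has shape (n_inputs, n_taps); X_taps has the same shape, one
    # tap-delay line per incoming connection.
    return float(np.sum(W * X_taps))    # sum of each filter's dot product

# One summing junction fed by two three-tap FIR connections (values illustrative).
W = np.array([[0.2, 0.1, -0.1],
              [0.5, 0.0,  0.3]])
X_taps = np.array([[1.0, 0.5, 0.25],    # [x_1(k), x_1(k-1), x_1(k-2)]
                   [0.0, 2.0, 1.0]])    # [x_2(k), x_2(k-1), x_2(k-2)]
print(summing_junction(W, X_taps))      # 0.2*1 + 0.1*0.5 - 0.1*0.25 + 0.3*1 = 0.525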