Each term within the sum corresponds to a reverse FIR filter. This is illustrated in Figure 3.7. The filter is drawn so as to emphasize the reversal of signal propagation through the FIR. Moving from the forward propagation of states to the backward propagation of error terms simply requires reversing the direction of signal flow. In this process, unit delay operators q⁻¹ are replaced with unit advances q⁺¹. The complete adaptation algorithm can be summarized as follows:
The bias weight may again be adapted through Eq. (3.33) by holding the corresponding input at a constant value of 1. Observe the similarities between these equations and those for standard backpropagation. In fact, by replacing the vectors a, w, and δ by scalars, the previous equations reduce to precisely the backpropagation algorithm for static networks. Differences in the temporal version are due to implicit time relations. To find the δ terms of a given layer, we filter the δ's from the next layer backward through the FIR (see Figure 3.7). In other words, δ's are created not only by taking weighted sums, but also by backward filtering. For each input x(k) and desired vector d(k), the forward filters are incremented one time step, producing the current output y(k) and the corresponding error e(k). Next, the backward filters are incremented one time step, advancing the δ(k) terms and allowing the filter coefficients to be updated. The process is then repeated for a new input at time k + 1.
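The relation between a forward FIR filter and its reversed counterpart can be checked numerically. The following is a minimal sketch, not taken from the text: the function names `fir_forward` and `fir_backward`, the tap values, and the zero-padding at the ends of the finite record are all assumptions made for illustration. It filters a signal forward with delays, filters a second signal backward with advances using the same coefficients, and compares the two inner products.

```python
import numpy as np

def fir_forward(w, x):
    """Causal FIR: y(k) = sum_n w[n] * x(k - n), taking x(k) = 0 for k < 0."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        for n in range(len(w)):
            if k - n >= 0:
                y[k] += w[n] * x[k - n]
    return y

def fir_backward(w, delta):
    """Reverse FIR: g(k) = sum_n w[n] * delta(k + n), taking delta = 0 past the end."""
    g = np.zeros(len(delta))
    for k in range(len(delta)):
        for n in range(len(w)):
            if k + n < len(delta):
                g[k] += w[n] * delta[k + n]
    return g

rng = np.random.default_rng(0)
w = np.array([0.5, -0.2, 0.1])      # hypothetical tap values
x = rng.standard_normal(50)         # stand-in for forward-propagated states
delta = rng.standard_normal(50)     # stand-in for error terms from the next layer

y = fir_forward(w, x)
g = fir_backward(w, delta)

# Swapping delays for advances yields the transpose of the forward filter,
# so the two inner products below agree up to rounding.
lhs = float(np.dot(y, delta))
rhs = float(np.dot(x, g))
```

Because the reversed filter acts as the transpose of the forward one, passing the δ's through it distributes each error term back onto the inputs that contributed to it, which is the role the backward filtering plays above.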
The symmetry between the forward propagation of states and the backward propagation of error terms is preserved in temporal backpropagation. The number of operations per iteration now grows only linearly with the number of layers and synapses in the network. This saving comes from the efficient recursive formulation: each coefficient enters into the calculation only once, in contrast to the redundant use of terms when standard backpropagation is applied to the unfolded network.
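The forward and backward cycle can be sketched for the simplest possible case, two linear FIR filters in cascade with no nonlinearity. This is an illustrative sketch under stated assumptions rather than the algorithm exactly as given in the equations: the names `w1`, `w2`, `tap_vector`, the target coefficients, the step size `mu`, and the white-noise input are all hypothetical. In particular, the backward filter needs δ's from future time steps, so the sketch delays the update of the first filter by the number of taps in the second filter minus one; that delay is a choice made here to keep the loop causal.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000
x = rng.standard_normal(N)                      # assumed white input

# Hypothetical target system that generates the desired response d(k).
w1_true = np.array([1.0, 0.5])
w2_true = np.array([1.0, -0.3, 0.2])
d = np.convolve(np.convolve(x, w1_true), w2_true)[:N]

w1 = np.array([0.5, 0.0])                       # first filter (to be adapted)
w2 = np.array([0.5, 0.0, 0.0])                  # second filter (to be adapted)
mu = 0.01                                       # assumed step size
T1, T2 = len(w1), len(w2)
D = T2 - 1                                      # delay that makes the backward filter causal

u = np.zeros(N)                                 # output of the first filter
e = np.zeros(N)                                 # output error

def tap_vector(sig, k, T):
    """Return [sig(k), sig(k-1), ..., sig(k-T+1)], with zeros before time 0."""
    return np.array([sig[k - n] if k - n >= 0 else 0.0 for n in range(T)])

for k in range(N):
    # Forward filters advance one time step.
    u[k] = w1 @ tap_vector(x, k, T1)
    u_taps = tap_vector(u, k, T2)
    e[k] = d[k] - w2 @ u_taps

    # Backward filter: delta_u(k-D) = sum_n w2[n] * delta_y(k-D+n),
    # where delta_y(j) = -2 e(j) is the derivative of e^2(j) w.r.t. the output.
    if k >= D:
        delta_y = -2.0 * e[k - D:k + 1]
        delta_u = w2 @ delta_y
        w1 -= mu * delta_u * tap_vector(x, k - D, T1)

    # The second filter sees the error directly at its own output.
    w2 += 2.0 * mu * e[k] * u_taps

late_mse = float(np.mean(e[-1000:] ** 2))
baseline = float(np.mean(d ** 2))               # error power if the output were zero
```

Each coefficient of `w2` is used once per step in the backward filter, so the cost per iteration is proportional to the total number of taps. Comparing `late_mse` with `baseline` gives a rough indication of whether the cascade has adapted; the individual filters are only identifiable up to a common scale factor, so their coefficients need not match the target values.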
Design Example 3.1
As an illustration of the computations involved, we consider a simple network consisting of only two segments (cascaded linear FIR filters shown in Figure 3.8). The first segment is defined as
(3.35)
For simplicity, the second segment is limited to only three taps:
Here a is the vector of filter coefficients (it should not be confused with the variable for the activation value used earlier). To adapt the filter coefficients, we evaluate the gradients ∂e²(k)/∂a and ∂e²(k)/∂b. For filter b, the desired response is available directly at the output of the filter of interest and the gradient is