In other words, to compute ∂z/∂X, we do not need to form the extremely high-dimensional matrix M explicitly. Instead, Eqs. (3.102) and (3.84) can be used to compute it efficiently. Figure 3.25 illustrates the inverse mapping m^{-1} using the convolution example from Figure 3.23.
In the right half of Figure 3.25, the 6 × 4 matrix is (∂z/∂Y)F^T. To compute the partial derivative of z with respect to one element of the input X, we need to find which elements of (∂z/∂Y)F^T are involved and add them. In the left half of Figure 3.25, the input element 5 (shown in larger font) is involved in four convolution operations, shown by the gray, light gray, dotted gray, and black boxes, respectively. These four convolution operations correspond to p = 1, 2, 3, 4. For example, when p = 2 (the light gray box), 5 is the third element in the convolution, hence q = 3 when p = 2, and we put a light gray circle in the (2, 3)-th element of the (∂z/∂Y)F^T matrix. After all four circles are placed in the matrix (∂z/∂Y)F^T, the partial derivative is the sum of the elements in these four locations of (∂z/∂Y)F^T. The set m^{-1}(i^l, j^l, d^l) contains at most HWD^l elements. Hence, Eq. (3.102) requires at most HWD^l summations to compute one element of ∂z/∂X.
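The accumulation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: a single input channel, a single kernel, stride 1, no padding, and column-first ordering of both the patches (index p) and the entries inside each patch (index q). The function names inverse_mapping and grad_input are hypothetical, and the matrix G is an arbitrary stand-in for (∂z/∂Y)F^T, not the values shown in the figure.

```python
import numpy as np

def inverse_mapping(Hl, Wl, H, W):
    """For every input position (i, j), list the (p, q) entries of the
    patch matrix that were copied from it.
    Assumptions: one channel, stride 1, no padding, column-first ordering."""
    H_out, W_out = Hl - H + 1, Wl - W + 1
    m_inv = {(i, j): [] for i in range(Hl) for j in range(Wl)}
    for j_out in range(W_out):            # patches are numbered column-first
        for i_out in range(H_out):
            p = j_out * H_out + i_out
            for dj in range(W):           # entries in a patch, column-first
                for di in range(H):
                    q = dj * H + di
                    m_inv[(i_out + di, j_out + dj)].append((p, q))
    return m_inv

def grad_input(G, Hl, Wl, H, W):
    """Sum the entries of G selected by the inverse mapping, one input
    element at a time."""
    m_inv = inverse_mapping(Hl, Wl, H, W)
    dX = np.zeros((Hl, Wl))
    for (i, j), entries in m_inv.items():
        dX[i, j] = sum(G[p, q] for p, q in entries)
    return dX

# Hypothetical 3 x 4 input with a 2 x 2 kernel: 6 patches of 4 entries each.
G = np.arange(24, dtype=float).reshape(6, 4)   # stand-in for (dz/dY) F^T
m_inv = inverse_mapping(3, 4, 2, 2)
dX = grad_input(G, 3, 4, 2, 2)

# 1-based (p, q) pairs that touch the input element in row 2, column 2.
centre = [(p + 1, q + 1) for p, q in m_inv[(1, 1)]]
```

With the ordering assumed here, the input element in row 2, column 2 maps to the 1-based pairs (1, 4), (2, 3), (3, 2), and (4, 1), which agrees with the (2, 3) entry mentioned above for p = 2. No input element is touched by more than four patches in this single-channel case.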
The pooling layer: Let
Figure 3.25 Computing ∂z/∂X. (for more details see the color figure in the bins).
Figure 3.26 Illustration of pooling layer operation. (for more details see the color figure in the bins).
Formally this can be represented as
(3.103)
where 0 ≤ i^{l+1} < H^{l+1}, 0 ≤ j^{l+1} < W^{l+1}, and 0 ≤ d < D^{l+1} = D^l.
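As a rough illustration of these index ranges, the following sketch pools each channel independently, so the output has as many channels as the input. It assumes non-overlapping H × W subregions whose stride equals the subregion size, and that H and W divide the input height and width. These assumptions, the function name pool_forward, and the reduce parameter are illustrative choices and are not taken from the text.

```python
import numpy as np

def pool_forward(x, H, W, reduce=np.max):
    """Pool an order-3 tensor x of shape (H^l, W^l, D^l) over
    non-overlapping H x W subregions (assumed stride = subregion size)."""
    Hl, Wl, Dl = x.shape
    assert Hl % H == 0 and Wl % W == 0, "H and W must divide the input size"
    H_out, W_out = Hl // H, Wl // W
    y = np.empty((H_out, W_out, Dl))
    for i_out in range(H_out):            # 0 <= i^{l+1} < H^{l+1}
        for j_out in range(W_out):        # 0 <= j^{l+1} < W^{l+1}
            for d in range(Dl):           # channel index is unchanged
                window = x[i_out * H:(i_out + 1) * H,
                           j_out * W:(j_out + 1) * W, d]
                y[i_out, j_out, d] = reduce(window)
    return y

x = np.arange(16, dtype=float).reshape(4, 4, 1)
y_max = pool_forward(x, 2, 2)                   # max pooling
y_avg = pool_forward(x, 2, 2, reduce=np.mean)   # average pooling
```

Passing a different reduction (here np.mean) switches between pooling variants without changing the indexing.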
Pooling is a local operator, and its forward computation is straightforward. For backpropagation, only max pooling will be discussed, and we can again resort to the indicator matrix. All this indicator matrix needs to encode is: for every element in y, where does it come from in x^l?
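One common way to realise this bookkeeping in code is to record, during the forward pass, which entry of each subregion attained the maximum, and then to route each output gradient back to that entry. The sketch below is a minimal illustration, assuming non-overlapping H × W subregions with stride equal to the subregion size and input dimensions divisible by H and W. The names max_pool_forward and max_pool_backward are hypothetical, and ties are broken by taking the first maximum, a choice made here for simplicity.

```python
import numpy as np

def max_pool_forward(x, H, W):
    """Return the pooled output and, for each output element, the flat
    index (di * W + dj) of the winning entry inside its subregion."""
    Hl, Wl, D = x.shape
    Ho, Wo = Hl // H, Wl // W
    # windows[io, jo, d, :] holds the H*W entries of one subregion.
    windows = (x.reshape(Ho, H, Wo, W, D)
                .transpose(0, 2, 4, 1, 3)
                .reshape(Ho, Wo, D, H * W))
    arg = windows.argmax(axis=-1)   # first maximum wins on ties
    y = windows.max(axis=-1)
    return y, arg

def max_pool_backward(dy, arg, H, W):
    """Send each output gradient to the input position it came from;
    every other input position receives zero."""
    Ho, Wo, D = dy.shape
    dx = np.zeros((Ho * H, Wo * W, D))
    io, jo, d = np.indices((Ho, Wo, D))
    di, dj = arg // W, arg % W      # offsets inside the subregion
    dx[io * H + di, jo * W + dj, d] = dy
    return dx

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 0, 2],
              [3, 2, 1, 4]], dtype=float)[:, :, None]
y, arg = max_pool_forward(x, 2, 2)
dy = np.array([[10.0, 20.0], [30.0, 40.0]])[:, :, None]
dx = max_pool_backward(dy, arg, 2, 2)
```

The array arg plays the role of a compact indicator: it stores one source location per output element instead of a full, mostly zero matrix.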
We need a triplet (i^l, j^l, d^l) to locate one element in the input x^l, and another triplet (i^{l+1}, j^{l+1}, d^{l+1}) to locate one element in y. The pooling output
where ⌊·⌋ is the floor function. If the stride is not H (W) in the vertical (horizontal) direction, the equation must be changed accordingly. Given an (i^{l+1}, j^{l+1}, d^{l+1}) triplet, there is only one (i^l, j^l, d^l) triplet that satisfies all these conditions. So, we define an indicator matrix