Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic.

Information about the work:

Author:	Savo G. Glisic
Publisher:	John Wiley & Sons Limited
Genre:	Programs
ISBN:	9781119790310


In the backpropagation (BP) procedure applied to a function l_w implemented by a feedforward neural network (FNN), y ∈ R^a is the input vector, and the row vector δ ∈ R^b is a signal that indicates how the network output must be adjusted to improve the cost function. In most applications, the cost function is e_w(y) = (t − o)² and δ = (∂e_w/∂o)(y) = 2(o − t), where o = l_w(y) and t (the target) is the vector of desired outputs corresponding to input y. In turn, δ(∂l_w/∂y)(y) is the gradient of e_w with respect to the network input and is obtained as a by-product of backpropagation. Backpropagation computes, for each neuron v, the delta value (∂e_w/∂a_v)(y) = δ(∂l_w/∂a_v)(y), where e_w is the cost function and a_v is the activation level of neuron v. Thus, δ(∂l_w/∂y)(y) is just the vector that stacks the delta values of the input neurons. Finally, […] denote the computational complexities required by the application of the forward phase (FP) and of BP on l_w, respectively. For example, if l_w is implemented by a multilayered FNN with a inputs, b hidden neurons, and c outputs, then […] holds.
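As an illustration, the forward and backward phases can be sketched for a small FNN. The following Python sketch is not taken from the source: it assumes a single hidden layer with tanh units and a linear output layer, and all names (fp, bp_input, numeric_grad, W1, b1, W2, b2) and numeric values are hypothetical. It computes the row vector δ(∂l_w/∂y)(y) by propagating δ back to the input neurons and compares the result with a central finite-difference estimate of the gradient of the scalar δ · l_w(y).

```python
import math

def fp(W1, b1, W2, b2, y):
    """Forward phase: return the hidden outputs h and the output o = l_w(y)."""
    h = [math.tanh(sum(w * v for w, v in zip(row, y)) + b)
         for row, b in zip(W1, b1)]
    o = [sum(w * v for w, v in zip(row, h)) + b
         for row, b in zip(W2, b2)]
    return h, o

def bp_input(W1, W2, h, delta):
    """Backward phase: return the row vector delta * (dl_w/dy)."""
    # Delta values of the hidden neurons; tanh'(a) = 1 - tanh(a)^2 = 1 - h^2.
    d_hidden = [(1.0 - h[j] ** 2) * sum(delta[k] * W2[k][j] for k in range(len(W2)))
                for j in range(len(h))]
    # Delta values of the input neurons, stacked into one vector.
    return [sum(d_hidden[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(W1[0]))]

def numeric_grad(W1, b1, W2, b2, y, delta, eps=1e-6):
    """Central finite differences of the scalar delta . l_w(y) with respect to y."""
    grad = []
    for i in range(len(y)):
        y_plus, y_minus = list(y), list(y)
        y_plus[i] += eps
        y_minus[i] -= eps
        _, o_plus = fp(W1, b1, W2, b2, y_plus)
        _, o_minus = fp(W1, b1, W2, b2, y_minus)
        grad.append(sum(d * (p - m) for d, p, m in zip(delta, o_plus, o_minus)) / (2 * eps))
    return grad

# Illustrative values only: 2 inputs, 3 hidden neurons, 1 output.
W1 = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.4, 0.3]]
b2 = [0.05]
y = [0.2, -0.5]
delta = [1.0]

h, o = fp(W1, b1, W2, b2, y)
grad_bp = bp_input(W1, W2, h, delta)
grad_fd = numeric_grad(W1, b1, W2, b2, y, delta)
print(grad_bp)  # backpropagated delta values of the input neurons
print(grad_fd)  # finite-difference estimate, should agree closely
```

In this sketch every weight enters exactly one multiplication in fp and one in bp_input, so both phases cost a number of operations proportional to the number of weights.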

Complexity of Instructions

1 Instructions z(t + 1) = z(t) · A + b, o = G_w(x, l_N), and x(t + 1) = F_w(x(t), l): Since A is a matrix having at most s²|E| nonnull elements, the multiplication of z(t) by A and, as a consequence, the instruction z(t + 1) = z(t) · A + b cost O(s²|E|) floating-point operations. The state x(t + 1) and the output vector o are calculated by applying the local transition function and the local output function to each node n. Thus, in positional GNNs and in nonlinear GNNs, where f_w, h_w, and g_w are directly implemented by FNNs, x(t + 1) and o are computed by running the forward phase of backpropagation once for each node or edge (see Table 5.2). On the other hand, in linear GNNs, x_n(t) is calculated in two steps: first, the matrices A_n of Eq. (5.82) and the vectors b_n of Eq. (5.83) are evaluated; then, x(t) is computed. The former phase, the cost of which is […], is executed once for each epoch, whereas the latter phase, the cost of which is O(s²|E|), is executed at every step of the cycle in the function FORWARD.

2 Instruction A = (∂F_w/∂x)(x, l): This instruction requires the computation of the Jacobian of F_w. Note that A = {A_{n,u}} is a block matrix where the block A_{n,u} measures the effect of node u on node n if there is an arc (n, u) from u to n, and is null otherwise. In the linear model, the matrices A_{n,u} correspond to those displayed in Eq. (5.82) and are already used to calculate x(t) in the forward phase. Thus, this instruction has no additional cost in the backward phase in linear GNNs. In nonlinear GNNs, A_{n,u} = (∂h_w/∂x_u)(l_n, l_{(n,u)}, x_u, l_u) is computed by appropriately exploiting the backpropagation procedure. Let q_i ∈ R^s be the vector whose components are all zero except for the i‐th one, which equals one, that is, q_1 = [1, 0, …, 0], q_2 = [0, 1, 0, …, 0], and so on. Note that BP, when it is applied to l_w with δ = q_i, yields q_i(∂l_w/∂y)(y), that is, the i‐th row of the Jacobian (∂l_w/∂y)(y). Thus, A_{n,u} can be computed by applying BP on all the q_i, that is,

A_{n,u} = [BP₂(q_1); …; BP₂(q_s)]   (5.84)

where BP₂ indicates that we are considering only the first component of the output of BP. A similar reasoning can also be used with positional GNNs. The complexity of these procedures is easily derived and is displayed in the fourth row of Table 5.2.
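The construction can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the book's implementation: it replaces the transition network by a single tanh layer that maps x_u to a new state (the labels are omitted), and the names forward, bp_wrt_input, and jacobian_by_bp are hypothetical. With δ treated as a row vector that multiplies the Jacobian from the left, each call with a one-hot q_i yields one row of the block; under the transposed convention the same result would be read as a column.

```python
import math

def forward(W, c, x):
    """Forward phase of a single tanh layer: tanh(W x + c)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(W, c)]

def bp_wrt_input(W, out, delta):
    """Backward phase: return the row vector delta * (d out / d x)."""
    # Scale delta by the derivative of tanh, which is 1 - out^2.
    scaled = [d * (1.0 - o ** 2) for d, o in zip(delta, out)]
    return [sum(scaled[i] * W[i][j] for i in range(len(W)))
            for j in range(len(W[0]))]

def jacobian_by_bp(W, c, x):
    """Build the Jacobian block row by row from one-hot vectors q_i."""
    out = forward(W, c, x)                              # one forward phase
    s = len(out)
    rows = []
    for i in range(s):
        q_i = [1.0 if k == i else 0.0 for k in range(s)]
        rows.append(bp_wrt_input(W, out, q_i))          # one backward phase per q_i
    return rows

# Illustrative values only: state dimension s = 2.
W = [[0.4, -0.2], [0.3, 0.9]]
c = [0.1, -0.3]
x_u = [0.5, -1.0]

A_nu = jacobian_by_bp(W, c, x_u)
for row in A_nu:
    print(row)
```

The block is obtained with one forward phase and s backward phases, one per q_i, so in this sketch the cost of one block is s times the cost of a single backward phase.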

3 Computation of ∂e_w/∂o and ∂p_w/∂w: In linear GNNs, the cost function is […] and, as a consequence, […] if n_k is a node belonging to the training set, and 0 otherwise. Thus, ∂e_w/∂o is easily calculated in O(|N|) operations.

In positional and nonlinear GNNs, a penalty term p_w is added to the cost function to force the transition function to be a contraction map. In this case, it is necessary to compute ∂p_w/∂w, because such a vector must be added to the gradient. Let […] denote the element in position i,j of the block A_{n,u}. Recalling the definition of p_w, we have

where […] if the sum is larger than 0, and 0 otherwise. It follows that

where sgn is the sign function. Let R_{n,u} be a matrix whose element in position i,j is […], and let vec be the operator that takes a matrix and produces a column vector by stacking all its columns one on top of the other. Then

[…]   (5.85)

holds. The vector ∂vec(A_{n,u})/∂w depends on the selected implementation of h_w or f_w. For the sake of simplicity, let us restrict our attention to nonlinear GNNs and assume that the transition network is a three‐layered FNN. Let σ_j, a_j, V_j, and t_j be the activation function, the vector of the activation levels, the matrix of the weights, and the thresholds of the j‐th layer, respectively. Here, σ_j is a vectorial function that takes as input the vector of the activation levels of the neurons in a layer and returns the vector of the outputs of the neurons of the same layer. The following reasoning can also be extended to positional GNNs and to networks with a different number of layers. The function h_w is formally defined in terms of σ_j, a_j, V_j, and t_j.
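A layer-wise definition of this kind is easy to follow in executable form, so a minimal sketch is given here. It is only one plausible reading and rests on assumptions that the text above does not confirm: the input is taken to be the concatenation of l_n, l_{(n,u)}, x_u, and l_u in that order, layer j computes a_j = V_j z + t_j from the outputs z of the previous layer, and the network output is σ_3(a_3). The helper names (affine, tanh_vec, identity, h_w) and all numeric values are hypothetical.

```python
import math

def affine(V, t, z):
    """Activation levels a = V z + t of one layer."""
    return [sum(v * x for v, x in zip(row, z)) + th for row, th in zip(V, t)]

def tanh_vec(a):
    """Vectorial activation function: applies tanh to every activation level."""
    return [math.tanh(v) for v in a]

def identity(a):
    """Linear activation, used here for the last layer (an assumption)."""
    return list(a)

def h_w(layers, l_n, l_edge, x_u, l_u):
    """Three-layered FNN; layers is a list of (sigma_j, V_j, t_j), j = 1, 2, 3."""
    z = list(l_n) + list(l_edge) + list(x_u) + list(l_u)  # assumed input ordering
    for sigma, V, t in layers:
        a = affine(V, t, z)   # activation levels a_j
        z = sigma(a)          # layer outputs sigma_j(a_j)
    return z

# Illustrative sizes only: every label has dimension 1, state dimension s = 2.
layers = [
    (tanh_vec, [[0.2, -0.1, 0.4, 0.3, 0.0],
                [0.0, 0.5, -0.2, 0.1, 0.3],
                [-0.3, 0.2, 0.1, -0.4, 0.2]], [0.0, 0.1, -0.1]),
    (tanh_vec, [[0.3, -0.2, 0.1],
                [0.1, 0.4, -0.3],
                [-0.2, 0.1, 0.5]], [0.05, 0.0, -0.05]),
    (identity, [[0.6, -0.1, 0.2],
                [0.2, 0.3, -0.4]], [0.0, 0.1]),
]
contribution = h_w(layers, l_n=[1.0], l_edge=[0.5], x_u=[0.2, -0.3], l_u=[0.0])
print(contribution)  # a vector of length s = 2
```

Storing each layer as a triple (σ_j, V_j, t_j) keeps the sketch independent of the number of layers, in line with the remark that the reasoning extends to networks of a different depth.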

By the chain differentiation rule, we get

where […] is the derivative of σ_j, diag is an operator that transforms a vector into a diagonal matrix having such a vector as its diagonal, and […] is the submatrix of V_1 that contains
