where hw is a parametric function. This transition function, which has been successfully used in recursive NNs, is affected neither by the positions nor by the number of the children. In the following, Eq. (5.76) is referred to as the nonpositional form, and Eq. (5.74) is called the positional form. To implement the GNN model, we need (i) a method to solve Eq. (5.74); (ii) a learning algorithm to adapt fw and gw using examples from the training dataset; and (iii) an implementation of fw and gw. These steps will be considered in the following paragraphs.
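As an illustration, a nonpositional transition function can be obtained by summing a single parametric function hw over the neighbors of a node, so that the result cannot depend on how many children there are or on the order in which they appear. The sketch below assumes hw is realized by a one-hidden-layer feedforward network and ignores edge labels; both choices are illustrative and not prescribed by the model.

```python
import numpy as np

STATE_DIM, LABEL_DIM, HIDDEN = 4, 3, 8
rng = np.random.default_rng(0)

# Parameters of h_w, realized here as a one-hidden-layer feedforward network
# (the GNN model does not prescribe a particular architecture for h_w).
W1 = rng.normal(scale=0.1, size=(HIDDEN, 2 * LABEL_DIM + STATE_DIM))
W2 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))

def h_w(label_n, label_u, state_u):
    """Contribution of a single neighbor u to the new state of node n."""
    z = np.concatenate([label_n, label_u, state_u])
    return W2 @ np.tanh(W1 @ z)

def f_w_nonpositional(label_n, neighbor_labels, neighbor_states):
    """Nonpositional transition function: summing over the neighbors makes
    the result independent of their order and of their number."""
    return sum((h_w(label_n, l_u, x_u)
                for l_u, x_u in zip(neighbor_labels, neighbor_states)),
               np.zeros(STATE_DIM))
```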
Computation of the state: Banach's fixed-point theorem not only ensures the existence and the uniqueness of the solution of Eq. (5.74), but it also suggests the following classic iterative scheme for computing the state:

x(t + 1) = Fw(x(t), l),   (5.77)
where x(t) denotes the t‐th iteration of x. The dynamical system (Eq. (5.77)) converges exponentially fast to the solution of Eq. (5.75) for any initial value x(0). We can, therefore, think of x(t) as the state that is updated by the transition function Fw. In fact, Eq. (5.77) implements the Jacobi iterative method for solving nonlinear equations [63]. Thus, the outputs and the states can be computed by iterating

xn(t + 1) = fw(ln, lco[n], xne[n](t), lne[n]),
on(t) = gw(xn(t), ln),   n ∈ N.   (5.78)
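A minimal sketch of this Jacobi-style relaxation, assuming node labels and neighbor lists are stored in plain Python dictionaries (an illustrative choice, not part of the model) and that fw and gw are passed in as callables, e.g. the f_w_nonpositional function sketched above together with any gw of signature gw(state, label):

```python
import numpy as np

def compute_states_and_outputs(f_w, g_w, labels, neighbors, state_dim,
                               tol=1e-6, max_iter=200):
    """Iterate Eq. (5.77)/(5.78): all node states are updated simultaneously
    (Jacobi style) until they stop changing, then the outputs are read off
    with g_w. `labels` maps node -> label vector, `neighbors` maps node ->
    list of neighboring nodes."""
    x = {n: np.zeros(state_dim) for n in labels}              # x(0)
    for _ in range(max_iter):
        x_new = {n: f_w(labels[n],
                        [labels[u] for u in neighbors[n]],
                        [x[u] for u in neighbors[n]])
                 for n in labels}
        delta = max(np.linalg.norm(x_new[n] - x[n]) for n in labels)
        x = x_new
        if delta < tol:        # the contraction assumption guarantees convergence
            break
    outputs = {n: g_w(x[n], labels[n]) for n in labels}
    return x, outputs
```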
Note that the computation described in Eq. (5.78) can be interpreted as the representation of a network consisting of units that compute fw and gw. Such a network is called an encoding network, following an analogous terminology used for the recursive neural network model [64]. In order to build the encoding network, each node of the graph is replaced by a unit computing the function fw (see Figure 5.10). Each unit stores the current state xn(t) of node n, and when activated it calculates the state xn(t + 1) using the node label and the information stored in the neighborhood. The simultaneous and repeated activation of the units produces the behavior described in Eq. (5.78). The output of node n is produced by another unit, which implements gw. When fw and gw are implemented by FNNs, the encoding network turns out to be a recurrent neural network in which the connections between the neurons can be divided into internal and external connections. The internal connectivity is determined by the neural network architecture used to implement the unit. The external connectivity depends on the edges of the processed graph.
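To make this distinction concrete, the sketch below uses a deliberately simplified transition (a linear combination of neighbor states and the node label followed by tanh, which the general model does not require): the adjacency matrix A plays the role of the external connections of the encoding network, while the weight matrices inside the unit correspond to its internal connections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph with edges 0-1 and 1-2: the adjacency matrix A encodes the
# *external* connections of the encoding network.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
labels = rng.normal(size=(3, 2))                 # one 2-dim label per node

STATE = 4
W_x = 0.1 * rng.normal(size=(STATE, STATE))      # internal weights on neighbor states
W_l = 0.1 * rng.normal(size=(STATE, 2))          # internal weights on node labels

def encoding_network_step(x):
    """One simultaneous activation of all f_w units: row n of A selects the
    states of the neighbors of node n (external connectivity); W_x and W_l
    are the unit's own weights (internal connectivity)."""
    return np.tanh(A @ x @ W_x.T + labels @ W_l.T)

x = np.zeros((3, STATE))
for t in range(20):                              # repeated activation, Eq. (5.78)
    x = encoding_network_step(x)
```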
The learning algorithm: Learning in GNNs consists of estimating the parameter w such that ϕw approximates the data in the learning dataset of Eq. (5.73), where qi is the number of supervised nodes in Gi. For graph‐focused tasks, one special node is used for the target (qi = 1 holds), whereas for node‐focused tasks, in principle, the supervision can be performed on every node. The learning task can be posed as the minimization of a quadratic cost function.
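Written out, this cost is typically the sum of squared errors over all supervised nodes of all graphs in the learning set of Eq. (5.73); the sketch below shows the standard form (the notation, e.g. p for the number of graphs, may differ slightly from the text's own equation):

```latex
e_w = \sum_{i=1}^{p} \sum_{j=1}^{q_i} \bigl( t_{i,j} - \varphi_w(G_i, n_{i,j}) \bigr)^{2}
```

where t_{i,j} is the target attached to the j-th supervised node n_{i,j} of graph Gi.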
Figure 5.10 A graph (top left), the corresponding encoding network (top right), and the network obtained by unfolding the encoding network (bottom). The nodes (the circles) of the graph are replaced, in the encoding network, by units computing fw and gw (the squares). When fw and gw are implemented by FNNs, the encoding network is a recurrent neural network. In the unfolded network, each layer corresponds to a time instant and contains a copy of all the units of the encoding network. Connections between layers depend on encoding network connectivity.
The learning algorithm is based on a gradient‐descent strategy and is composed of the following steps (a compact sketch of the resulting training loop is given after the list):
1 The states xn(t) are iteratively updated by Eq. (5.78) until at time T they approach the fixed‐point solution of Eq. (5.75): x(T) ≈ x.
2 The gradient ∂ew(T)/∂w is computed.
3 The weights w are updated according to the gradient computed in Step 2.
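A compact, self-contained sketch of these three steps on a toy three-node graph, assuming PyTorch (the dimensions, the fixed number of relaxation steps T, and the omission of the penalty term that the original GNN uses to keep Fw a contraction map are all simplifications made for illustration). Differentiating through the unrolled iterations with autograd performs the backpropagation-through-time computation discussed next.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy graph: 3 nodes with 2-dim labels, edges 0-1 and 1-2, node 2 supervised.
labels = torch.randn(3, 2)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
supervised, target = [2], torch.tensor([1.0])

STATE, T = 4, 10                      # state dimension, number of relaxation steps
f_w = nn.Sequential(nn.Linear(2 + 2 + STATE, 8), nn.Tanh(), nn.Linear(8, STATE))
g_w = nn.Sequential(nn.Linear(2 + STATE, 8), nn.Tanh(), nn.Linear(8, 1))
opt = torch.optim.SGD(list(f_w.parameters()) + list(g_w.parameters()), lr=0.01)

for epoch in range(100):
    # Step 1: iterate the state transition T times (the unrolled encoding network);
    # a convergence test as in the relaxation sketch above could replace the fixed T.
    x = torch.zeros(3, STATE)
    for _ in range(T):
        x = torch.stack([
            torch.stack([f_w(torch.cat([labels[n], labels[u], x[u]]))
                         for u in neighbors[n]]).sum(dim=0)
            for n in range(3)])
    # Quadratic cost on the supervised nodes, evaluated at time T.
    outputs = torch.stack([g_w(torch.cat([x[n], labels[n]])) for n in supervised])
    loss = ((outputs.squeeze(-1) - target) ** 2).sum()
    # Step 2: gradient of the cost, obtained here by backpropagating through
    # the T unrolled iterations (i.e. backpropagation through time).
    opt.zero_grad()
    loss.backward()
    # Step 3: gradient-descent update of the weights w.
    opt.step()
```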
When it comes to Step 1, note that the hypothesis that Fw is a contraction map ensures convergence to the fixed point. Step 3 is carried out within the traditional framework of gradient descent. We will show in the following that Step 2 can be carried out in a very efficient way by exploiting the diffusion process that takes place in GNNs. This diffusion process is closely related to the one that occurs in recurrent neural networks, for which the gradient computation is based on the BPTT algorithm (see Chapter 3). In this case, the encoding network is unfolded from time T back to an initial time t0. The unfolding produces the layered network shown in Figure 5.10. Each layer corresponds to a time instant and contains a copy of all the units fw of the encoding network. The units of two consecutive layers are connected following graph connectivity. The last layer, corresponding to time T, also includes the units gw and computes the output of the network.
Backpropagation through time consists of carrying out the traditional backpropagation step (see Chapter 3) on the unfolded network to compute the gradient of the cost function at time T with respect to all the instances of fw and gw. Then, ∂ew(T)/∂w is obtained by summing the gradients of all the instances. However, backpropagation through time requires storing the states of every instance of the units, and when the graphs and T − t0 are large, the memory required may be considerable. In this case, a more efficient approach is possible, based on the Almeida–Pineda algorithm [65, 66]. Since Eq. (5.78) has reached a stable point x before the gradient computation, we can assume that x(t) = x holds for