where K is the index of the last graph convolutional layer.
5.2.3 Graph Autoencoders (GAEs)
These are deep neural architectures that map nodes into a latent feature space and decode graph information from latent representations. GAEs can be used to learn network embeddings or generate new graphs.
Network embedding (encoding) is a low‐dimensional vector representation of a node that preserves the node's topological information. GAEs learn network embeddings by using an encoder to extract them and a decoder to force them to preserve graph topological information, such as the positive pointwise mutual information (PPMI) matrix and the adjacency matrix (see Figure 5.6).
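To make the PPMI target concrete, the following is a minimal NumPy sketch that converts a nonnegative co‐occurrence matrix M into a PPMI matrix, PPMI_ij = max(log(M_ij · ΣM / (row_i · col_j)), 0). DNGR builds its co‐occurrence matrix from random surfing over the graph; this sketch assumes the adjacency matrix itself as a simple stand‐in, and the function name `ppmi` is illustrative only.

```python
import numpy as np

def ppmi(M):
    """Positive pointwise mutual information of a nonnegative co-occurrence matrix M."""
    M = np.asarray(M, dtype=float)
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)  # row marginals, shape (n, 1)
    col = M.sum(axis=0, keepdims=True)  # column marginals, shape (1, n)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row * col))
    # log(0) = -inf and 0/0 = nan both mean "no co-occurrence": map them to 0
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep only positive associations
    return np.maximum(pmi, 0.0)

# Stand-in co-occurrence matrix: adjacency of the path graph 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(ppmi(A))
```

For this three‐node path, every edge receives a PPMI value of log 2 and every non‐edge receives 0.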
Earlier approaches mainly employ multi‐layer perceptrons to build GAEs for network embedding learning. Deep Neural Network for Graph Representations (DNGR) uses a stacked denoising autoencoder to encode and decode the PPMI matrix via multi‐layer perceptrons. Concurrently, Structural Deep Network Embedding (SDNE) uses a stacked autoencoder to jointly preserve the first‐order and second‐order proximity of nodes. SDNE defines two loss functions, one on the outputs of the encoder and one on the outputs of the decoder. The first loss function enables the learned network embeddings to preserve a node's first‐order proximity by minimizing the distance between the node's network embedding and those of its neighbors. The first loss function, $\mathcal{L}_{1st}$, is defined as
Figure 5.6 A graph autoencoder (GAE) for network embedding. The encoder uses graph convolutional layers to get a network embedding for each node. The decoder computes the pairwise distance given network embeddings. After applying a nonlinear activation function, the decoder reconstructs the graph adjacency matrix. The network is trained by minimizing the discrepancy between the real adjacency matrix and the reconstructed adjacency matrix.
Source: Wu et al. [38].
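$$\mathcal{L}_{1st} = \sum_{(v,u) \in E} A_{v,u} \, \lVert \operatorname{enc}(\mathbf{x}_v) - \operatorname{enc}(\mathbf{x}_u) \rVert^2 \tag{5.60}$$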
where $\mathbf{x}_v = \mathbf{A}_{v,:}$ is the $v$‐th row of the adjacency matrix and enc(·) is an encoder consisting of a multi‐layer perceptron. The second loss function enables the learned network embeddings to preserve a node's second‐order proximity by minimizing the distance between the node's inputs and its reconstructed inputs. It is defined as
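$$\mathcal{L}_{2nd} = \sum_{v \in V} \lVert (\operatorname{dec}(\operatorname{enc}(\mathbf{x}_v)) - \mathbf{x}_v) \odot \mathbf{b}_v \rVert^2 \tag{5.61}$$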
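where the weight vector $\mathbf{b}_v$ has entries $b_{v,u} = 1$ if $A_{v,u} = 0$ and $b_{v,u} = \beta > 1$ if $A_{v,u} = 1$, so that reconstruction errors on the sparse nonzero entries of the adjacency matrix are penalized more heavily; ⊙ denotes element‐wise multiplication; and dec(·) is a decoder that consists of a multi‐layer perceptron.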
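The two SDNE objectives in Eqs. (5.60) and (5.61) can be computed in a few lines of NumPy. This is a minimal sketch, not SDNE's training code: the encoder and decoder are hypothetical one‐layer perceptrons with random weights (`W_enc`, `W_dec`, and the value `beta=5.0` are illustrative assumptions), and the first‐order sum runs over all ordered node pairs, so each undirected edge is counted in both directions.

```python
import numpy as np

def sdne_losses(A, enc, dec, beta=5.0):
    """Return (L_1st, L_2nd) of Eqs. (5.60)-(5.61) for adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    X = A                      # x_v = A_{v,:}: each node's input is its adjacency row
    Z = enc(X)                 # (n, d) network embeddings enc(x_v)
    # Squared Euclidean distance between every pair of embeddings, shape (n, n)
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    # First-order loss: weighting by A_{v,u} keeps only pairs joined by an edge
    l_1st = float((A * sq_dist).sum())
    # b_{v,u} = beta where A_{v,u} = 1, otherwise 1
    B = np.where(A > 0, beta, 1.0)
    # Second-order loss: weighted reconstruction error of every adjacency row
    l_2nd = float((((dec(Z) - X) * B) ** 2).sum())
    return l_1st, l_2nd

# Hypothetical one-layer MLP encoder/decoder with random weights (illustration only)
rng = np.random.default_rng(0)
n, d = 4, 2
W_enc = rng.normal(size=(n, d))
W_dec = rng.normal(size=(d, n))

def enc(X):
    return np.tanh(X @ W_enc)

def dec(Z):
    return 1.0 / (1.0 + np.exp(-(Z @ W_dec)))

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(sdne_losses(A, enc, dec))
```

In training, the two losses would be combined (SDNE also adds a regularization term) and minimized with respect to the encoder and decoder weights.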
DNGR [54] and SDNE [55] consider only node structural information, that is, the connectivity between pairs of nodes. They ignore the fact that nodes may carry feature information describing their own attributes. Graph Autoencoder (GAE*) [56] leverages GCN [14] to encode node structural information and node feature information at the same time. The encoder of GAE* consists of two graph convolutional layers and takes the form

$$\mathbf{Z} = \operatorname{enc}(\mathbf{X}, \mathbf{A}) = \operatorname{Gconv}\big(f(\operatorname{Gconv}(\mathbf{A}, \mathbf{X}; \Theta_1)); \Theta_2\big) \tag{5.62}$$
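where Z denotes the network embedding matrix of a graph, f(·) is a ReLU activation function, Θ1 and Θ2 are the learnable weights of the two layers, and Gconv(·) is a graph convolutional layer defined by Eq. (5.44). The decoder of GAE* aims to decode node relational information from the embeddings by reconstructing the graph adjacency matrix, which is defined as

$$\hat{A}_{v,u} = \operatorname{dec}(\mathbf{z}_v, \mathbf{z}_u) = \sigma(\mathbf{z}_v^{\top} \mathbf{z}_u) \tag{5.63}$$

where $\mathbf{z}_v$ is the embedding of node v and σ(·) is the sigmoid function. GAE* is trained by minimizing the cross‐entropy between the real adjacency matrix A and the reconstructed adjacency matrix $\hat{\mathbf{A}}$:

$$\mathcal{L} = -\sum_{v,u} \big[ A_{v,u} \log \hat{A}_{v,u} + (1 - A_{v,u}) \log (1 - \hat{A}_{v,u}) \big]$$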
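A forward pass of GAE* (encoder, inner‐product decoder, and cross‐entropy loss) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: Gconv is taken to be the GCN propagation rule with the symmetrically normalized adjacency $\tilde{\mathbf{D}}^{-1/2}(\mathbf{A}+\mathbf{I})\tilde{\mathbf{D}}^{-1/2}$, the one‐hot node features and random weights are illustrative, and no training loop is shown. All function names are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gconv(A_norm, H, Theta):
    """One graph convolution (assumed GCN form): propagate features, then project."""
    return A_norm @ H @ Theta

def gae_encode(A, X, Theta1, Theta2):
    """Eq. (5.62): Z = Gconv(f(Gconv(A, X; Theta1)); Theta2) with f = ReLU."""
    A_norm = normalize_adj(A)
    H = np.maximum(gconv(A_norm, X, Theta1), 0.0)
    return gconv(A_norm, H, Theta2)

def gae_decode(Z):
    """Eq. (5.63): A_hat[v, u] = sigmoid(z_v . z_u) for every node pair."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

def gae_loss(A, A_hat, eps=1e-9):
    """Binary cross-entropy between the true and reconstructed adjacency matrices."""
    A_hat = np.clip(A_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(-(A * np.log(A_hat) + (1.0 - A) * np.log(1.0 - A_hat)).sum())

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)                                # one-hot node features (assumption)
Theta1 = rng.normal(scale=0.5, size=(4, 8))  # first-layer weights
Theta2 = rng.normal(scale=0.5, size=(8, 2))  # second-layer weights, 2-D embeddings
Z = gae_encode(A, X, Theta1, Theta2)
A_hat = gae_decode(Z)
print(Z.shape, gae_loss(A, A_hat))
```

Because the decoder is an inner product, $\hat{\mathbf{A}}$ is symmetric by construction, which suits undirected graphs.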
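Simply reconstructing the graph adjacency matrix may lead to overfitting because of the high capacity of autoencoders. The variational graph autoencoder (VGAE) [56] is a variational version of GAE* that was developed to learn the distribution of the data. VGAE optimizes the variational lower bound $\mathcal{L}$:

$$\mathcal{L} = \mathbb{E}_{q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A})} \big[ \log p(\mathbf{A} \mid \mathbf{Z}) \big] - \operatorname{KL}\big[ q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A}) \,\|\, p(\mathbf{Z}) \big] \tag{5.64}$$

where KL(·) is the Kullback–Leibler divergence, which measures how much one probability distribution differs from another; p(Z) is a Gaussian prior

$$p(\mathbf{Z}) = \prod_{i=1}^{n} p(\mathbf{z}_i) = \prod_{i=1}^{n} \mathcal{N}(\mathbf{z}_i \mid \mathbf{0}, \mathbf{I})$$

and $q(\mathbf{z}_i \mid \mathbf{X}, \mathbf{A}) = \mathcal{N}\big(\mathbf{z}_i \mid \boldsymbol{\mu}_i, \operatorname{diag}(\boldsymbol{\sigma}_i^2)\big)$.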
The mean vector $\boldsymbol{\mu}_i$ is the i‐th row of the encoder outputs defined by Eq. (5.62), and $\log \boldsymbol{\sigma}_i$ is derived in the same way as $\boldsymbol{\mu}_i$ but with another encoder. According to Eq. (5.64), VGAE assumes that the empirical distribution q(Z ∣ X, A) should be as close as possible to the prior distribution p(Z); it is the KL term that pulls q(Z ∣ X, A) toward p(Z) during training.
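The lower bound in Eq. (5.64) is usually estimated with the reparameterization $\mathbf{z}_i = \boldsymbol{\mu}_i + \boldsymbol{\sigma}_i \odot \boldsymbol{\epsilon}$, $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, which keeps the sample differentiable with respect to $\boldsymbol{\mu}_i$ and $\boldsymbol{\sigma}_i$, together with the closed‐form KL divergence between diagonal Gaussians and $\mathcal{N}(\mathbf{0}, \mathbf{I})$. The NumPy sketch below assumes that $p(\mathbf{A} \mid \mathbf{Z})$ treats edges as independent Bernoulli variables with the decoder of Eq. (5.63) and uses a single Monte Carlo sample; the `mu` and `log_sigma` arrays stand in for the two encoders' outputs and are hypothetical values.

```python
import numpy as np

def sample_z(mu, log_sigma, rng):
    """Reparameterized sample z_i = mu_i + sigma_i * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    """Closed-form KL[q(Z|X,A) || N(0, I)] for diagonal Gaussians, summed over nodes."""
    return float(0.5 * np.sum(mu**2 + np.exp(2.0 * log_sigma) - 1.0 - 2.0 * log_sigma))

def vgae_elbo(A, mu, log_sigma, rng, eps=1e-9):
    """One-sample Monte Carlo estimate of the lower bound in Eq. (5.64)."""
    Z = sample_z(mu, log_sigma, rng)
    A_hat = np.clip(1.0 / (1.0 + np.exp(-(Z @ Z.T))), eps, 1.0 - eps)
    # log p(A | Z) under independent Bernoulli edges (assumed likelihood)
    log_lik = np.sum(A * np.log(A_hat) + (1.0 - A) * np.log(1.0 - A_hat))
    return float(log_lik) - kl_to_standard_normal(mu, log_sigma)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
mu = rng.normal(scale=0.1, size=(4, 2))  # hypothetical mean-encoder outputs
log_sigma = np.full((4, 2), -1.0)        # hypothetical log-std-encoder outputs
print(vgae_elbo(A, mu, log_sigma, rng))
```

Training would maximize this estimate (or minimize its negative) with respect to the encoder parameters.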
Like GAE*, GraphSAGE [23] encodes node features with two graph convolutional layers. Instead of optimizing the reconstruction error, GraphSAGE shows that the relational information between two nodes can be preserved by negative sampling with the loss:
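$$\mathcal{L}(\mathbf{z}_v) = -\log\big(\operatorname{dec}(\mathbf{z}_v, \mathbf{z}_u)\big) - Q \, \mathbb{E}_{v_n \sim P_n(v)} \log\big(1 - \operatorname{dec}(\mathbf{z}_v, \mathbf{z}_{v_n})\big) \tag{5.65}$$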
where node u is a neighbor of node v, node $v_n$ is a node distant from v that is sampled from a negative sampling distribution $P_n(v)$, Q is the number of negative samples, and dec(·,·) is the inner‐product decoder of Eq. (5.63). This loss function essentially imposes similar representations on close nodes and dissimilar representations on distant nodes.
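Eq. (5.65) can be estimated by drawing Q negatives and replacing $Q \, \mathbb{E}[\cdot]$ with the sum over those Q samples. The NumPy sketch below is an illustration, not GraphSAGE's implementation: the embeddings are random, the choice of $P_n$ proportional to degree raised to the power 0.75 is a hypothetical assumption, and negatives are not filtered to exclude v or its neighbors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(Z, v, u, neg_probs, Q, rng, eps=1e-9):
    """Estimate Eq. (5.65) for a positive pair (v, u) using Q negative samples."""
    neg = rng.choice(len(Z), size=Q, p=neg_probs)   # v_n ~ P_n(v)
    # Positive term: reward a high decoder score for the close pair (v, u)
    pos_term = -np.log(sigmoid(Z[v] @ Z[u]) + eps)
    # Negative term: Q * E[log(1 - dec)] estimated by summing over the Q samples
    neg_term = -np.sum(np.log(1.0 - sigmoid(Z[neg] @ Z[v]) + eps))
    return float(pos_term + neg_term)

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
neg_probs = deg**0.75 / np.sum(deg**0.75)  # hypothetical choice of P_n
rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 3))                # random stand-in embeddings
print(neg_sampling_loss(Z, v=0, u=1, neg_probs=neg_probs, Q=5, rng=rng))
```

With all‐zero embeddings every decoder score is 0.5, so the loss equals (1 + Q) log 2, a handy sanity check.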
Deep Recursive Network Embedding (DRNE) [57] assumes that a node's network embedding should approximate the aggregation of its neighbors' network embeddings. It adopts an LSTM network [26] to aggregate a node's neighbors. The reconstruction error of DRNE is defined as

$$\mathcal{L} = \sum_{v \in V} \big\lVert \mathbf{z}_v - \operatorname{LSTM}\big(\{\mathbf{z}_u \mid u \in N(v)\}\big) \big\rVert^2$$
where zv is the network embedding of node v obtained by a dictionary look‐up,