Figure 1.1: Left: image in Euclidean space. Right: graph in non-Euclidean space.
1.1.2 NETWORK EMBEDDING
The other motivation comes from graph embedding [Cai et al., 2018, Cui et al., 2018, Goyal and Ferrara, 2018, Hamilton et al., 2017a, Zhang et al., 2018a], which learns to represent graph nodes, edges, or subgraphs as low-dimensional vectors. In graph analysis, traditional machine learning approaches usually rely on hand-engineered features and are limited by their inflexibility and high cost. Following the idea of representation learning and the success of word embedding [Mikolov et al., 2013], DeepWalk [Perozzi et al., 2014], regarded as the first graph embedding method based on representation learning, applies the SkipGram model [Mikolov et al., 2013] to generated random walks. Similar approaches such as node2vec [Grover and Leskovec, 2016], LINE [Tang et al., 2015], and TADW [Yang et al., 2015b] also achieved breakthroughs. However, these methods suffer from two severe drawbacks [Hamilton et al., 2017a]. First, no parameters are shared between nodes in the encoder, which is computationally inefficient, since the number of parameters grows linearly with the number of nodes. Second, these direct embedding methods lack the ability to generalize: they cannot handle dynamic graphs or be applied to new graphs.
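To make the random-walk corpus concrete, the following is a minimal sketch (not DeepWalk's reference implementation) of truncated random-walk generation; the function name `random_walks` and the toy graph are our own illustrative choices. In DeepWalk, these walks are then treated as "sentences" and fed to a SkipGram model (e.g., gensim's Word2Vec) to learn node embeddings.

```python
import random

def random_walks(adj, num_walks, walk_length, seed=0):
    """Generate truncated random walks over a graph given as an
    adjacency dict {node: [neighbors]} -- the corpus that DeepWalk
    feeds to the SkipGram model."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)          # one pass over all nodes per iteration
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:        # dead end: truncate the walk
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy 4-node undirected graph as an adjacency dict.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, num_walks=2, walk_length=5)
```

Note that node2vec differs mainly in biasing the neighbor choice with return and in-out parameters, while the SkipGram step is unchanged.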
1.2 RELATED WORK
There exist several comprehensive reviews of GNNs. Monti et al. [2017] propose a unified framework, MoNet, to generalize CNN architectures to non-Euclidean domains (graphs and manifolds); the framework can generalize several spectral methods on graphs [Atwood and Towsley, 2016, Kipf and Welling, 2017] as well as some models on manifolds [Boscaini et al., 2016, Masci et al., 2015]. Bronstein et al. [2017] provide a thorough review of geometric deep learning, presenting its problems, difficulties, solutions, applications, and future directions. Monti et al. [2017] and Bronstein et al. [2017] focus on generalizing convolutions to graphs or manifolds, while in this book we focus only on problems defined on graphs and also investigate other mechanisms used in GNNs, such as gating, attention, and skip connections. Gilmer et al. [2017] propose the message passing neural network (MPNN), which can generalize several GNN and graph convolutional network approaches. Wang et al. [2018b] propose the non-local neural network (NLNN), which unifies several “self-attention”-style methods; however, in the original paper the model is not explicitly defined on graphs. Focusing on specific application domains, Gilmer et al. [2017] and Wang et al. [2018b] only give examples of generalizing other models within their frameworks and do not provide a review of other GNN models. Lee et al. [2018b] provide a review of graph attention models. Battaglia et al. [2018] propose the graph network (GN) framework, which has a strong capability to generalize other models; however, the graph network model is highly abstract, and Battaglia et al. [2018] give only a rough classification of the applications.
Zhang et al. [2018f] and Wu et al. [2019c] are two comprehensive surveys of GNNs that mainly focus on GNN models. Wu et al. [2019c] categorize GNNs into four groups: recurrent graph neural networks (RecGNNs), convolutional graph neural networks (ConvGNNs), graph auto-encoders (GAEs), and spatial-temporal graph neural networks (STGNNs). Our book adopts a different taxonomy from Wu et al. [2019c]. We present graph recurrent networks in Chapter 6. Graph convolutional networks are introduced in Chapter 5, and a special variant of ConvGNNs, graph attention networks, is introduced in Chapter 7. We present graph spatial-temporal networks in Section 9.4, as these models are usually applied to dynamic graphs. We introduce graph auto-encoders in Section 10.4, as they are trained in an unsupervised fashion.
In this book, we provide a thorough introduction to different GNN models as well as a systematic taxonomy of the applications. To summarize, the major contents of this book are as follows.
• We provide a detailed review of existing GNN models. We introduce the original model, its variants, and several general frameworks. We examine various models in this area and provide a unified representation for the propagation steps of different models. Using this representation, one can easily distinguish different models by recognizing their corresponding aggregators and updaters.
• We systematically categorize the applications and divide them into structural scenarios, non-structural scenarios, and other scenarios. For each scenario, we present several major applications and their corresponding methods.
CHAPTER 2
Basics of Math and Graph
2.1 LINEAR ALGEBRA
The language and concepts of linear algebra have come into widespread use in many fields of computer science, and machine learning is no exception. A good comprehension of machine learning builds on a thorough understanding of linear algebra. In this section, we briefly review some important concepts and calculations in linear algebra which are necessary for understanding the rest of the book.
2.1.1 BASIC CONCEPTS
• Scalar: A number.
• Vector: A column of ordered numbers, which can be expressed as the following:

x = [x_1, x_2, …, x_n]^⊤, where x ∈ ℝ^n.
The norm of a vector measures its length. The L_p norm is defined as follows:

‖x‖_p = (∑_{i=1}^{n} |x_i|^p)^{1/p}.
The L1 norm, L2 norm and L∞ norm are often used in machine learning.
The L1 norm can be simplified as

‖x‖_1 = ∑_{i=1}^{n} |x_i|.
In Euclidean space ℝ^n, the L2 norm is used to measure the length of vectors, where

‖x‖_2 = √(∑_{i=1}^{n} x_i^2).
The L∞ norm is also called the max norm, as

‖x‖_∞ = max_i |x_i|.
With the L_p norm, the distance between two vectors x_1, x_2 (where x_1 and x_2 are in the same linear space) can be defined as

D_p(x_1, x_2) = ‖x_1 − x_2‖_p.
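The norms and the induced distance above can be computed directly in NumPy; this is a small illustrative sketch, not part of the book's text:

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.linalg.norm(x, 1)         # L1 norm: |3| + |-4| = 7
l2 = np.linalg.norm(x, 2)         # L2 norm: sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)  # L-infinity norm: max(|3|, |-4|) = 4

# The Lp distance between two vectors is the Lp norm of their difference.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 6.0, 3.0])
d2 = np.linalg.norm(x1 - x2, 2)   # sqrt(9 + 16 + 0) = 5
```

Here `np.linalg.norm` takes the order p as its second argument, with `np.inf` selecting the max norm.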
A set of vectors x_1, x_2, …, x_m is linearly independent if and only if there does not exist a set of scalars λ_1, λ_2, …, λ_m, not all 0, such that

λ_1 x_1 + λ_2 x_2 + ⋯ + λ_m x_m = 0.
• Matrix: A two-dimensional array, which can be expressed as the following:

A = (a_{ij}) =
[ a_{11} ⋯ a_{1n}
   ⋮   ⋱   ⋮
  a_{m1} ⋯ a_{mn} ],

where A ∈ ℝ^{m×n}.
Given two matrices A ∈ ℝ^{m×n}