3.1 NEURON
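The basic units of neural networks are neurons, which receive a series of inputs and return a corresponding output. A classic neuron is shown in Figure 3.1. The neuron receives n inputs x1, x2, …, xn with corresponding weights w1, w2, …, wn and an offset b. The weighted summation $y = \sum_{i=1}^{n} w_i x_i + b$ is then passed through an activation function $f$, and the neuron returns the output $z = f(y)$. Several activation functions are commonly used: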
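• Sigmoid Function (Figure 3.2): $\sigma(x) = \frac{1}{1 + e^{-x}}$
• Tanh Function (Figure 3.3): $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
• ReLU (Rectified Linear Unit) (Figure 3.4): $\mathrm{ReLU}(x) = \max(0, x)$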
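As a minimal sketch, the neuron described above can be written in a few lines of Python. It assumes the standard textbook definitions of the three activation functions. The names `neuron`, `sigmoid`, `tanh`, and `relu` are chosen here for illustration and do not come from the text.

```python
import math


def sigmoid(x: float) -> float:
    # Maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    # Maps any real number into the interval (-1, 1).
    return math.tanh(x)


def relu(x: float) -> float:
    # Passes positive values through and clips negative values to 0.
    return max(0.0, x)


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the offset, passed through an activation."""
    y = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(y)


# Two inputs whose weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
z = neuron([1.0, 2.0], [0.5, -0.25], 0.0)  # sigmoid(0.0) == 0.5
```

Passing `activation=relu` or `activation=tanh` swaps the activation function without changing the weighted summation.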
Figure 3.1: A classic neuron structure.
Figure 3.2: The Sigmoid function.
In fact, there are many other activation functions, and each has its own derivative. Keep in mind that a good activation function is generally smooth (that is, continuously differentiable) and easy to compute (to minimize the computational complexity of the neural network); ReLU, which is not differentiable at 0, is a widely used exception. During the training of a neural network, the choice of activation function often has a significant effect on the outcome.
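For reference, the derivatives of the three activation functions listed above are given below, assuming their standard definitions. The ReLU derivative is undefined at x = 0, where a value is chosen by convention.

```latex
\begin{aligned}
\sigma'(x) &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr), \\
\tanh'(x) &= 1 - \tanh^{2}(x), \\
\mathrm{ReLU}'(x) &= \begin{cases} 1, & x > 0, \\ 0, & x < 0. \end{cases}
\end{aligned}
```

The sigmoid and tanh derivatives can be computed from the function value alone, which keeps them cheap to evaluate during training.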
3.2 BACK PROPAGATION
During the training of a neural network, the back propagation algorithm is the most commonly used method. It is a gradient-descent-based algorithm for optimizing the parameters of a model. Let’s take the single neuron model illustrated above as an example. Suppose the optimization target for the output z is z0, which will be approached by adjusting the parameters w1, w2, …, wn, b.
Figure 3.3: The Tanh function.
Figure 3.4: The ReLU (Rectified Linear Unit) function.
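By the chain rule, writing $y = \sum_{i=1}^{n} w_i x_i + b$ for the weighted sum and $z = f(y)$ for the output, we can deduce the derivative of z with respect to wi and b:
$$\frac{\partial z}{\partial w_i} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial w_i} = f'(y)\, x_i, \qquad \frac{\partial z}{\partial b} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial b} = f'(y).$$
With a learning rate of η, and taking the squared error $\tfrac{1}{2}(z_0 - z)^2$ as the quantity to be minimized, the update for each parameter will be:
$$\Delta w_i = \eta\,(z_0 - z)\,\frac{\partial z}{\partial w_i} = \eta\,(z_0 - z)\, f'(y)\, x_i, \qquad \Delta b = \eta\,(z_0 - z)\,\frac{\partial z}{\partial b} = \eta\,(z_0 - z)\, f'(y).$$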
Figure 3.5: Feedforward neural network.
In summary, the back propagation process consists of the following two steps.
• Forward calculation: given a set of parameters and an input, the neural network computes the values at each neuron in a forward order.
• Backward propagation: compute the error at each variable to be optimized, and update the parameters with their corresponding partial derivatives in a backward order.
These two steps are repeated until the optimization target is reached.
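The two steps can be sketched for the single-neuron example as follows. This is an illustration under stated assumptions rather than a definitive implementation: it assumes a sigmoid activation, a squared-error objective between the output z and the target z0, and a fixed number of iterations. The names `forward` and `train_neuron` and the default values of `eta` and `steps` are hypothetical.

```python
import math


def sigmoid(y: float) -> float:
    return 1.0 / (1.0 + math.exp(-y))


def forward(x, w, b):
    """Forward calculation: weighted sum y, then output z = sigmoid(y)."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(y)


def train_neuron(x, z0, w, b, eta=0.5, steps=1000):
    """Repeat the forward and backward steps to move the output toward z0."""
    w = list(w)  # copy so the caller's list is not modified
    for _ in range(steps):
        z = forward(x, w, b)
        # Backward propagation. For the sigmoid, dz/dy = z * (1 - z), so
        # dz/dw_i = z * (1 - z) * x_i and dz/db = z * (1 - z).
        delta = (z0 - z) * z * (1.0 - z)
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        b = b + eta * delta
    return w, b


w, b = train_neuron(x=[1.0, 2.0], z0=0.8, w=[0.0, 0.0], b=0.0)
z = forward([1.0, 2.0], w, b)  # close to the target 0.8
```

A practical training loop would usually stop once the error falls below a tolerance instead of running a fixed number of steps; the fixed count is used here only to keep the sketch short.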
3.3 NEURAL NETWORKS
Recently, the field of machine learning (especially deep learning) has developed rapidly, as reflected in the appearance of a variety of neural network structures. Though they vary widely, current neural network structures can be classified into several categories: feedforward neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks (GNNs).
• Feedforward neural network: The feedforward neural network (FNN) (Figure 3.5) is the first and simplest type of artificial neural network architecture. The FNN usually contains an input layer, several hidden layers, and an output layer. It has a clear hierarchical structure consisting of multiple layers of neurons, and each layer is connected only to its neighboring layers. There are no loops in this network.
• Convolutional neural network: Convolutional neural networks (CNNs) are special versions of FNNs. FNNs are usually fully connected networks, while CNNs preserve local connectivity. The CNN architecture usually contains convolutional layers, pooling layers, and several fully connected layers. There exist several classical CNN architectures such as LeNet5 [LeCun et al., 1998], AlexNet [Krizhevsky et al., 2012] (Figure 3.6), VGG [Simonyan and Zisserman, 2014], and GoogLeNet [Szegedy et al., 2015]. CNNs are widely used in computer vision and have proven effective in many other research fields.
• Recurrent neural network: In comparison with the FNN, the neurons in a recurrent neural network (RNN) receive not only signals and inputs from other neurons, but also their own historical information. This memory mechanism helps the model process sequential data effectively. However, the RNN usually suffers from the problem of long-term dependencies [Bengio et al., 1994, Hochreiter et al., 2001]. Several variants that incorporate a gate mechanism, such as GRU [Cho et al., 2014] and LSTM [Hochreiter and Schmidhuber, 1997], have been proposed to solve this problem. The RNN is widely used in speech and natural language processing.
• Graph neural network: The GNN is designed specifically to handle graph-structured data, such as social networks, molecular structures, knowledge graphs, etc. Detailed descriptions of GNNs will be covered in the later chapters of this book.
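To make the difference between the first and third categories concrete, the sketch below contrasts a feedforward pass, where data moves through the layers once and in order, with a single recurrent step, where the previous state is fed back in. It is a simplified illustration using plain Python lists. The function names (`dense`, `feedforward`, `rnn_step`), the weight layout, and the tanh activation in the recurrent step are assumptions made for this example, not definitions from the text.

```python
import math


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feedforward(inputs, layers):
    """Pass the input through the layers in order; there are no loops."""
    h = inputs
    for weights, biases, activation in layers:
        h = dense(h, weights, biases, activation)
    return h


def rnn_step(x_t, h_prev, w_in, w_rec, biases):
    """One recurrent step: the new state depends on the current input
    and on the previous state (the neuron's own historical information)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(row_in, x_t))
            + sum(u * h for u, h in zip(row_rec, h_prev))
            + b
        )
        for row_in, row_rec, b in zip(w_in, w_rec, biases)
    ]


# Feedforward: 2 inputs -> hidden layer of 2 neurons -> 1 output neuron.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], identity),
]
out = feedforward([2.0, 1.0], layers)  # hidden values [1.0, 1.5], output [2.5]

# Recurrent: the state h carries information from earlier inputs forward.
h = [0.0]
for x_t in [[1.0], [0.0], [0.0]]:
    h = rnn_step(x_t, h, w_in=[[1.0]], w_rec=[[1.0]], biases=[0.0])
```

After the loop, `h` is still nonzero even though the last two inputs were zero, which is the memory effect described for the RNN.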
Figure