Figure 3.1 correlates the convolutional neural network (CNN) with the HVS, mapping the components of the CNN to the corresponding components of the HVS (Yamins et al 2014). This correspondence helps us understand the neural network. First, the input to a CNN is usually either 3D (RGB) or 1D (grayscale) pixel values after a normalization preprocessing step. This roughly corresponds to part of the computations performed by the retina and lateral geniculate nucleus. The convolutional operations create feature maps that have a spatial layout, analogous to the retinotopic maps for visual information processing, and each artificial neuron can only process data within a receptive field of limited extent. The convolutional filters define feature maps, which can be grouped into a number of different types.
Figure 3.1. Relating the CNN to the HVS consisting of the brain areas responsible for a sequence of visual information processing tasks.
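To make the idea of a feature map with limited receptive fields concrete, the following is a minimal sketch of a single-channel convolution in plain NumPy. The function name `conv2d_valid` and the toy image and filter are illustrative choices, not part of the text; each output value depends only on a small window of the input, mirroring the limited receptive field described above.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a filter over a grayscale image ('valid' mode): each output
    value is computed from a small receptive field of the input."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # Inner product of the filter with one local patch.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Illustrative input: a 5x5 image and a 3x3 horizontal-gradient filter.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (3, 3)
```

A real CNN layer applies many such filters in parallel, producing one feature map per filter; that is the sense in which the filters "define" the feature maps.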
In a sense, artificial neural networks are engineered copies of the HVS. At the same time, an artificial neural network is not an exact replica of the human visual system or the human brain. In biological neural networks such as the HVS, or more generally the human brain, the learning process is much more complicated and is shaped by many factors, such as the surrounding environment, interest/attention/drive, mood, internal representations, and so on. While weights in an artificial network are initialized randomly, connections between biological neurons are genetically determined, and are then reinforced or weakened during learning. Unlike a biological neural network, an artificial neural network is commonly trained with relevant data from the application domain. At present, the network topology is pre-specified based on a designer's experience and does not change during training, while the weights are randomly initialized and then adjusted by an optimization algorithm to map input stimuli to desired output values. The single-layer neural network introduced in chapters 1 and 2 is a good example.
When we design an artificial neural network, we should select a network topology for feature extraction, a training procedure to update the network parameters, and so on. In order to output a desirable result, an artificial neural network must have certain properties. By construction, we must ensure that the artificial neural network is sufficiently expressive and can be trained with relevant data toward a converged configuration. Convergence means that the training process settles into a stable set of parameters. When the training process converges, the model tends to stabilize over time, predicting consistent, meaningful outputs for new data. As explained later in this chapter, several related problems must be addressed to obtain a convergent and favorable outcome.
3.1.2 Neuron models
Neurons are a special type of biological cell, serving as the organic computing units in a neuronal system. In previous studies, researchers built a computational model mirroring neurons in the biological system. In order to build such a model, it was essential to emulate the biological mechanism of the neuron.
According to biological research, a neuron consists of three key parts, as shown in figure 3.2: the cell body, dendrites, and axons. The cell body provides the energy supply for neuronal activities, carrying out metabolism and other biochemical processes; dendrites are the ports that receive information from other neurons; and axons are the gateways that transmit excitatory information to other neurons. In addition, the synapse is the structure through which one neuron communicates with another.
Figure 3.2. Key elements of a biological neuron.
According to neurobiological research results, the information processing and transmission mechanism of the biological neuron has the following characteristics:
1 Information integration: A neuron can integrate different neurotransmitters transmitted by other neurons into an overall response.
2 Potential difference: The membrane potential, defined as the difference between the electrical potentials inside and outside the cell membrane, reflects the state of the neuron.
3 Transmission threshold: The membrane potential changes constantly while a neuron receives information. If the potential exceeds a fixed threshold value, an action potential is transmitted along the axon as an electrical pulse. Because of this threshold, the neuron is a nonlinear system.
Based on the characteristics and functions of a biological neuron, an artificial neuron approximates its biological counterpart with the simple mathematical model illustrated in figure 3.3.
Figure 3.3. Mathematical model of an artificial neuron.
Mimicking a biological neuron, the mathematical model of an artificial neuron can be described as follows: the input vector x represents the signals (from the dendrites), the weight vector w corresponds to the strength of the pathway (dendrite and synapse), the summation node ∑ and the activation function φ(·) represent the integration and activation of the input signals (the cell body), and the thresholded output y is sent out (along the axon). Such an artificial neuron model can be formulated as follows:
$$ v = \sum_{i=1}^{m} x_i w_i + b, \qquad y = \varphi(v), \tag{3.1} $$
where wi represents the weight for the input signal component xi, b is a bias, v is the inner product of the input vector and the weight vector, and y represents the output of the neuron after a nonlinear activation. Please note that the neuron based on the inner product is not the only type of neuron in artificial neural systems. For example, there are also quadratic neurons dedicated to extracting nonlinear features directly (Fan et al 2017a, 2017b). Discussion of these is outside the scope of this chapter.
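The inner-product neuron of equation (3.1) can be sketched in a few lines of Python. The function name `neuron` and the sigmoid activation are illustrative assumptions; any nonlinear φ(·) could be substituted.

```python
import math

def neuron(x, w, b, phi):
    """Equation (3.1): v = sum_i x_i * w_i + b, then y = phi(v)."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + b
    return phi(v)

def sigmoid(v):
    # A common choice of nonlinear activation.
    return 1.0 / (1.0 + math.exp(-v))

# v = 1*0.5 + 2*(-0.25) + 0 = 0, and sigmoid(0) = 0.5.
y = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
print(y)  # 0.5
```

Note that without the nonlinear φ(·), the neuron would reduce to a plain weighted sum, which is why the activation function is essential for nonlinear feature extraction.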
Given the mathematical model of the neuron, the aforementioned single-hidden-layer neural network can be formulated as equation (3.2). The structure of the single-hidden-layer neural network is shown in figure 3.4. The corresponding mathematical formulas are as follows:
$$ h_j^{(1)} = \varphi^{(1)}\!\left(\sum_{i=1}^{m} x_i w_{ij}^{(1)} + b_j^{(1)}\right), \qquad y = \varphi^{(2)}\!\left(\sum_{j=1}^{n} h_j^{(1)} w_j^{(2)} + b^{(2)}\right), \tag{3.2} $$
where $x \in \mathbb{R}^{m}$ is an input vector, $h^{(1)} \in \mathbb{R}^{n}$ is the output vector of the hidden layer, $w^{(1)} \in \mathbb{R}^{m \times n}$ and $b^{(1)} \in \mathbb{R}^{n}$ are the weight matrix and bias from the input to the hidden layer, $w^{(2)} \in \mathbb{R}^{n}$ and $b^{(2)}$ are their counterparts from the hidden layer to the output, and $\varphi^{(1)}$ and $\varphi^{(2)}$ are the corresponding activation functions, respectively. A multi-hidden-layer neural network, which extends the single-hidden-layer neural network, is obtained when the number of hidden layers exceeds one.
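The forward pass of equation (3.2) can be sketched with NumPy as follows. The function name `forward` and the choice of ReLU and identity activations are illustrative assumptions; the toy weights are chosen so the result is easy to check by hand.

```python
import numpy as np

def forward(x, W1, b1, w2, b2, phi1, phi2):
    """Equation (3.2): hidden layer h(1) = phi1(x W1 + b1),
    then scalar output y = phi2(h(1) . w2 + b2)."""
    h1 = phi1(x @ W1 + b1)      # h(1) in R^n
    return phi2(h1 @ w2 + b2)   # scalar output y

def relu(v):
    return np.maximum(v, 0.0)

def identity(v):
    return v

# Toy example with m = n = 2: the identity weight matrix passes x through,
# so y = relu(1) + relu(2) = 3.
x = np.array([1.0, 2.0])
W1 = np.eye(2)
b1 = np.zeros(2)
w2 = np.array([1.0, 1.0])
b2 = 0.0
y = forward(x, W1, b1, w2, b2, relu, identity)
print(y)  # 3.0
```

Stacking further calls of the same hidden-layer computation before the output would give the multi-hidden-layer extension mentioned above.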
Figure 3.4. Architecture of the neural network with a single-hidden layer.
3.1.3 Activation function
The activation function is an essential part of an artificial neuron, which determines the output behavior of the neuron. The activation