6 C5 layer: The C5 layer is another convolutional layer. The filter size is 5 × 5. In total, 120 feature maps of 1 × 1 are produced by this layer.
7 F6 layer: The F6 layer is a fully connected layer consisting of 84 nodes. It represents a stylized image of the corresponding character class in a 7 × 12 bitmap.
8 Output layer: The output layer is also a fully connected layer, with ten nodes representing the digits 0 to 9, respectively. The minimum output of a node indicates a positive identification result for that node, because each output unit in LeNet-5 measures the distance between the F6 pattern and its class template, so a smaller value means a better match. That is, if the value of node i is the minimum among all the output values, the recognition result for the digit of interest is i. In any case, exactly one digit class is assigned to the current image.
Specifically, all three of the aforementioned CNN features can be found in the LeNet-5 network: local connectivity, shared weights, and multiple feature maps. Since a convolutional neural network is close to the real biological neural system in terms of its information-processing workflow, a CNN analyzes the structural information of digit images well.
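The layer sizes quoted above can be checked with simple arithmetic. The following sketch walks through the standard LeNet-5 spatial dimensions (the 32 × 32 input and the 2 × 2 pooling layers are standard LeNet-5 details assumed here, not restated in this excerpt), using the usual output-size formula for a "valid" convolution:

```python
# Sketch: computing LeNet-5 feature-map sizes layer by layer.
# Formula for a valid (unpadded) convolution or pooling window:
#   out = (in - kernel) // stride + 1

def conv_out(size, kernel, stride=1):
    """Spatial size after a valid convolution or pooling step."""
    return (size - kernel) // stride + 1

size = 32                      # input: 32 x 32 image (standard LeNet-5)
size = conv_out(size, 5)       # C1: 5x5 convolution -> 28
size = conv_out(size, 2, 2)    # S2: 2x2 pooling, stride 2 -> 14
size = conv_out(size, 5)       # C3: 5x5 convolution -> 10
size = conv_out(size, 2, 2)    # S4: 2x2 pooling, stride 2 -> 5
size = conv_out(size, 5)       # C5: 5x5 convolution -> 1
print(size)                    # 1: each of the 120 C5 maps is 1 x 1
```

This confirms why C5 produces 1 × 1 feature maps: the 5 × 5 filter exactly covers the 5 × 5 maps coming out of S4.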
3.2 Training, validation, and testing of an artificial neural network
In this section, we will describe a number of key concepts and critical skills related to practical applications of an artificial neural network: selecting a dataset to train the network, cross-validating the model, and testing the trained model. We will also present closely related topics, including overfitting, bias, dropout, pruning, data augmentation, and so on. To better understand these basic concepts, let us take a look at the overall construction, training, and testing process of a convolutional neural network. The whole process is shown in figure 3.17 and can be divided into the following stages:
1 Designing: Given a specific task, we can design or select a convolutional neural network architecture based on our experience, including the topology and details on convolution, activation, pooling, loss function, and so on. Then, the weights of the neural network will be initialized.
2 Forward propagation: Training samples are fed into the network to produce the corresponding outputs. The input samples are the portion of the training dataset used to fit the network; the remaining samples in the dataset are used for validation, i.e. to verify the convergence of the network.
3 Backpropagation: The weights of the neural network are updated using the above-described backpropagation method.
4 Iteration: Repeat steps 2 and 3 until the network converges.
5 Measure the performance of the trained network on the testing dataset.
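The five stages above can be sketched on a deliberately tiny example. The following is a minimal illustration, not the full CNN workflow: a single linear neuron y = w·x trained by gradient descent, with made-up data splits, learning rate, and stopping threshold chosen purely for demonstration.

```python
# Stage 1 (designing): choose the model y = w * x and initialize the weight.
w = 0.0
lr = 0.1   # illustrative learning rate

# Toy splits of the function y = 2x (illustrative data, not from the text).
train = [(1.0, 2.0), (2.0, 4.0)]
val   = [(3.0, 6.0)]
test  = [(4.0, 8.0)]

def mse(w, data):
    """Mean squared error of the model on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

for epoch in range(100):            # Stage 4 (iteration)
    for x, y in train:
        pred = w * x                # Stage 2 (forward propagation)
        grad = 2 * (pred - y) * x   # Stage 3 (backpropagation of the loss)
        w -= lr * grad              # gradient-descent weight update
    if mse(w, val) < 1e-8:          # validation monitors convergence
        break

print(w, mse(w, test))              # Stage 5: performance on the test set
# w converges toward 2.0, the true slope
```

The structure of the loop (forward pass, backward pass, weight update, convergence check on held-out data) is the same in a real CNN; only the model and the gradient computation become more elaborate.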
Figure 3.17. Workflow of designing, training, and testing an artificial neural network.
3.2.1 Training, validation, and testing datasets
In the context of machine learning based on a neural network (also referred to as a model), the data used to build the final model are usually divided into three types for three inter-related purposes: training (in a specific sense), validation, and testing. The training dataset is used to estimate the model, the validation dataset is used to determine the network structure or the parameters that control the complexity of the model, and the testing dataset is used to assess the performance of the final selected model. Brian D Ripley defined these three terms in his classic monograph Pattern Recognition and Neural Networks (Ripley 1996).
1 Training set: A set of samples used for minimizing the loss function. The training set is also used to compute the gradient of the loss function and then adjust the parameters (i.e. weights) of the network.
2 Validation set: A set of samples used to evaluate the trained network and avoid overfitting. In the case of overfitting, the network will perform poorly on the validation set. If the network performs poorly on either the training or the validation data, we can train and validate further or modify the network architecture. Currently, there is no governing theory on training, validation, and network architectural design; therefore, practical experience is quite important in the field of machine learning. Also, the training and validation data can switch roles as needed so that the network can be trained to optimal performance. Multiple training–validation cycles can be used in a network training process.
3 Testing set: A set of samples never used for training and validation. These testing samples can be processed by the network that has been well trained and validated through steps 1 and 2. The performance of the final network is characterized with the testing set.
In summary, the training set is used to train the network or determine the parameters of the model; the validation set is used for model validation, modification, or selection; and the testing set is purely used to characterize the final model.
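One common way to produce the three datasets is to shuffle the available samples and slice them by fixed fractions. The 70/15/15 split and the fixed random seed below are illustrative choices, not prescribed by the text:

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle and split samples into (train, val, test) lists."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)   # fixed seed for reproducibility
    n_train = int(len(samples) * train_frac)
    n_val = int(len(samples) * val_frac)
    train = samples[:n_train]              # used to fit the weights
    val = samples[n_train:n_train + n_val] # used to validate/select the model
    test = samples[n_train + n_val:]       # never seen until final evaluation
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))     # 70 15 15
```

Keeping the test slice untouched until the very end is what makes the final performance figure an unbiased characterization of the model.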
3.2.2 Training, validation, and testing processes
The aforementioned three types of datasets are used at different stages of network development for a specific application. The model is first fit to the training set using a supervised learning method such as gradient descent or stochastic gradient descent. In practice, a training set typically consists of input vectors (or scalars) and their corresponding output vectors (or scalars), which are also called targets, labels, or markers. Normally, we run the current model on the training set, generate a result for each input in the set, and then compare it to the target. The parameters of the model are adjusted based on this comparison using a particular learning algorithm. Model fitting can be done from scratch or via transfer learning (to be detailed later).
The fitted model is then used to process a second dataset, the validation dataset, and produce outputs accordingly. In general, the validation set is coupled with the training set to arrive at an accurate and robust neural network. In other words, during the training process the training set is used to update the parameters of the model while the validation set is used to sense the convergence of the model. The error rate of the final model on the validation set usually underestimates the true error rate, since the validation set is used to confirm, modify, or select the final model. One should stop training the network when the validation error begins to increase, as this is a sign of overfitting to the training set. This simple procedure, called early stopping, is complicated in practice because the validation error can fluctuate during training, depending on the details of the training protocol in use, typically yielding multiple local minima. This complexity has led to the development of several special rules to detect signs of overfitting.
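One such rule is "patience": rather than stopping at the first uptick in validation error, training stops only after the error has failed to improve for several consecutive checks, and the best model seen so far is kept. The sketch below illustrates the idea; the error sequence is made up to show the kind of fluctuation around local minima described above, and the patience value is an illustrative choice.

```python
# Sketch of early stopping with a "patience" rule.

def early_stop_epoch(val_errors, patience=2):
    """Return the index of the epoch whose model we would keep."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1            # no improvement at this check
            if waited >= patience:
                break              # validation error keeps rising: stop
    return best_epoch

# A fluctuating validation-error curve with a local bump at epoch 3.
errors = [0.9, 0.7, 0.6, 0.65, 0.55, 0.58, 0.61, 0.64]
print(early_stop_epoch(errors))    # 4: the global minimum at 0.55
```

Note that a patience of zero would have stopped at the local bump (epoch 3) and missed the better model found one epoch later.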