1.3.1 Machine Learning
Machine learning (ML) is a subset of AI that focuses on developing algorithms that learn from experience and thereby improve decision making with greater accuracy. The categories of ML and their corresponding tasks are shown in Figure 1.3. Supervised, unsupervised, and reinforcement learning are the three main learning paradigms. Supervised learning is the most prevalent paradigm for developing ML models for both classification and regression tasks [27]; it learns the relationship between the input variables and a target variable. Examples of supervised learning algorithms include the support vector machine (SVM), logistic regression, the decision tree (DT), and random forest. Unsupervised learning is often used for clustering and segmentation tasks; it does not require a target variable to group the input data. Examples include K-means, hierarchical, density-based, and grid-based clustering. In reinforcement learning, an agent interacts with its environment and learns to choose the actions that complete a task with the maximum cumulative reward. It is applied in real-time environments.
Figure 1.3 Types of machine learning.
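To make the reinforcement learning loop concrete, the following is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm chosen here purely for illustration (the chapter does not prescribe a specific algorithm). The corridor environment, the reward of 1 at the last cell, and the values of `alpha`, `gamma`, and `epsilon` are hypothetical assumptions. The agent repeatedly acts, observes a reward, and updates its estimate of each action's long-term value.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells. The agent starts in
# cell 0 and receives a reward of 1 only when it reaches the last cell.
N_STATES = 5
ACTIONS = (-1, 1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def best_action(q, state, rng):
    """Greedy action for `state`, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = best_action(q, state, rng)  # exploit current estimates
            nxt, reward, done = step(state, action)
            # Temporal-difference target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


q = q_learning()
print(greedy_policy(q))
```

After training, the learned policy should be to move right (`1`) in every non-terminal cell, because that is the shortest route to the only reward.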
Figure 1.4 Generic methodology in building a model using machine learning algorithms.
ML models are typically trained on large amounts of data to produce accurate decisions or predictions. The general steps involved in building an ML model are shown in Figure 1.4.
1.3.1.1 Data Pre-processing
Data pre-processing is the process of converting raw data into a usable and efficient format.
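As a small illustration, the sketch below applies two common pre-processing steps to a table of numeric records: filling missing values (represented here as `None`, an assumption) with the column mean, and min-max scaling each column to the range [0, 1]. These particular steps, the function names, and the sample values are illustrative choices, not a prescribed pipeline.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_scale(rows):
    """Rescale each column linearly to [0, 1]; a constant column maps to 0.0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_cols)]
        for r in rows
    ]


# Hypothetical raw records: (temperature in degrees C, soil pH), with gaps.
raw = [[20.0, 6.5], [None, 7.0], [30.0, None], [25.0, 6.0]]
clean = min_max_scale(impute_mean(raw))
print(clean)  # -> [[0.0, 0.5], [0.5, 1.0], [1.0, 0.5], [0.5, 0.0]]
```

In practice, the column means and the minimum and maximum values should be computed on the training data only and then reused for the validation and test data, so that no information leaks from the held-out sets.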
1.3.1.2 Feature Extraction
Before a model is trained, most applications first transform the data into a new representation. Applying pre-processing transformations to the input data before presenting it to the model is almost always helpful, and the choice of pre-processing is often one of the most important factors in determining the final system’s performance. Reducing the dimensionality of the input data is another key way to improve performance, sometimes dramatically. Dimensionality reduction creates linear or nonlinear combinations of the original variables to serve as inputs to the model. Feature extraction is the process of creating such input combinations, which are commonly referred to as features. The main motivation for dimensionality reduction is to mitigate the worst effects of high dimensionality, often called the curse of dimensionality.
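Principal component analysis (PCA) is one widely used linear feature-extraction method. The minimal sketch below, in plain Python for illustration, estimates the first principal component by power iteration on the sample covariance matrix and projects each sample onto it, so that each multi-variable row becomes a single feature that is a linear combination of the original variables. The function names and the iteration count are assumptions; the sketch also assumes the data are not constant and that the all-ones start vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(data, iters=200):
    """Estimate the leading principal component by power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[k] for r in centered) / (n - 1) for k in range(d)] for i in range(d)]
    v = [1.0] * d  # start vector (assumed not orthogonal to the leading eigenvector)
    for _ in range(iters):
        w = [sum(cov[i][k] * v[k] for k in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(data, means, component):
    """Map each row to one extracted feature: a weighted sum of its centered variables."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(row))) for row in data]


# Hypothetical two-variable data in which the second variable tracks the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
means, pc1 = first_principal_component(data)
print(pc1, project(data, means, pc1))
```

Here two correlated input variables are reduced to one feature; with d variables the same idea applies, and further components can be extracted in turn.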
1.3.1.3 Working With Data Sets
The most popular approach is to split the original data into two or more subsets, either at random or using statistical sampling methods. One subset is used to train the model, and another is used to assess the model’s accuracy. It is vital to remember that the model never sees the test data during training; that is, it never uses the test data to learn or adjust its weights. The training data should be representative of the data the ML model will encounter when solving the problem it was built for. In some cases, the training data are labeled, that is, “tagged” with the features and classification labels that the model needs to recognize. If the data are unlabeled, the model must extract such features itself and group the samples by their similarity. To improve the generalization capability of the model, the data set can be divided into three subsets: a training set, a validation set, and a test set. The validation set is used to monitor the model’s performance during the training phase, which helps determine the best model configuration and its hyperparameters. Monitoring the validation error also helps avoid overfitting by indicating the ideal point at which to stop the learning process.
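A minimal sketch of a random three-way split, assuming the samples fit in memory as a Python list. The 70/15/15 proportions, the function name, and the fixed seed are illustrative assumptions rather than recommended values. Shuffling with a seeded generator makes the split reproducible, and the three subsets are disjoint, so the test data are never seen during training.

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of the data and cut it into disjoint train/validation/test subsets."""
    if val_frac + test_frac >= 1.0:
        raise ValueError("val_frac + test_frac must be less than 1")
    shuffled = list(samples)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # -> 70 15 15
```

For classification data with imbalanced classes, a stratified split, which preserves the class proportions in each subset, is a common alternative to a purely random one.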
1.3.1.4 Model Development
The ultimate goal of this stage is to create, train, and test the ML model. Training continues until the model reaches an acceptable level of accuracy on the training data. In this context, an algorithm is a set of statistical processing steps. The type of algorithm used depends on the kind (labeled or unlabeled) and quantity of data in the training set, as well as on the problem to be solved. Different ML algorithms suit labeled and unlabeled data; several common ones are described below. During training, the algorithm adjusts the model’s weights and biases so that it produces accurate results.
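The weight-and-bias adjustment described above can be illustrated with the classic perceptron learning rule, used here only as a simple stand-in for the algorithms discussed in this section. The sketch assumes labels of +1 and -1; `target_acc` and `max_epochs` are hypothetical stopping parameters. It updates the weights and bias after each misclassified sample and stops once training accuracy reaches the target, or after `max_epochs` passes over the data.

```python
def predict(w, b, x):
    """Return +1 if the weighted sum plus bias is positive, otherwise -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def train_perceptron(samples, labels, lr=1.0, target_acc=1.0, max_epochs=100):
    """Adjust weights and bias on every mistake until training accuracy reaches target_acc."""
    w, b = [0.0] * len(samples[0]), 0.0
    for epoch in range(1, max_epochs + 1):
        for x, t in zip(samples, labels):  # t is +1 or -1
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:  # misclassified (or on boundary)
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
        acc = sum(predict(w, b, x) == t for x, t in zip(samples, labels)) / len(samples)
        if acc >= target_acc:  # stop once training accuracy is acceptable
            return w, b, epoch
    return w, b, max_epochs


# Logical AND: linearly separable, so the perceptron is guaranteed to converge.
X_and = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [-1, -1, -1, 1]
w, b, epochs = train_perceptron(X_and, y_and)
print(w, b, epochs, [predict(w, b, x) for x in X_and])
```

Stopping on training accuracy mirrors the text above; in practice the stopping point is usually chosen with the validation error instead, to avoid overfitting.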
i. Support Vector Machine
A support vector machine finds an optimal decision boundary, the hyperplane with the maximum margin between the classes, to separate linearly separable data into different classes. It can also classify nonlinear data by using kernels, which implicitly transform the input data into a higher-dimensional space. In that space, the nonlinear data are separated into classes by finding an optimal decision surface.
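For the linear case, a soft-margin SVM can be trained by minimizing the regularized hinge loss 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)). The sketch below does this with plain batch subgradient descent, one of several possible solvers; the values of `C`, `lr`, and `epochs` and the toy data are assumptions. The kernel (nonlinear) case, which is usually solved in the dual formulation, is not covered by this sketch.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Linear soft-margin SVM via batch subgradient descent on the primal objective.

    Objective: 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)), y_i in {-1, +1}.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge term active: inside margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """The class is the side of the hyperplane w . x + b = 0 on which x falls."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data: two well-separated groups.
X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(w, b, [svm_predict(w, b, x) for x in X])
```

For this data, the maximum-margin solution is w = (0.4, 0.4), b = -1.4 (decision boundary x1 + x2 = 3.5, with (1, 0), (0, 1), and (3, 3) as support vectors); the trained parameters should approach it, up to the small oscillation inherent in fixed-step subgradient descent.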
ii. Regression Algorithm
Regression methods, such as linear and logistic regression, are used to model relationships in data. Linear regression predicts the value of a dependent variable from one or more independent variables. When the dependent variable is binary (taking one of two values, such as 0 or 1), logistic regression can be employed. Modeling how crop yield depends on irrigation and fertilization is an example of linear regression. With temperature, rainfall, soil pH, and the nitrogen, phosphorus, and potassium content of the soil as independent variables, yield can be forecast using multiple regression.
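The sketch below shows the two regression types in plain Python: simple linear regression fitted with the closed-form ordinary-least-squares formulas, and single-feature logistic regression fitted by gradient descent on the log-loss. The data, learning rate, and epoch count are hypothetical values chosen for illustration only; they are not crop measurements. Multiple regression extends the same least-squares idea to several independent variables.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x, using the closed-form solution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx  # slope
    b0 = my - b1 * mx  # intercept
    return b0, b1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Single-feature logistic regression via batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, t in zip(xs, labels):  # t is 0 or 1
            p = sigmoid(w * x + b)  # predicted probability of class 1
            gw += (p - t) * x
            gb += p - t
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


# Synthetic, exactly linear data so the fit is easy to check.
irrigation = [1, 2, 3, 4]
crop_yield = [3, 5, 7, 9]
print(fit_simple_linear(irrigation, crop_yield))  # -> (1.0, 2.0)

# Synthetic binary outcome that switches from 0 to 1 as x grows.
w, b = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(w, b)
```

The logistic model outputs a probability, and a threshold of 0.5 turns it into a binary prediction.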
iii. Decision Tree
The DT algorithm is one of the most powerful and widely used tools for classification and prediction. A DT is a flowchart-like tree structure in which each internal node represents a test on a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome. The uppermost node of a DT is the root node. Instances are classified top-down by sorting them down the tree from the root to a leaf node, and the leaf node provides the classification label for the given instance. The tree itself is built in the same top-down manner by repeatedly splitting the data on feature tests, a process called recursive partitioning. Figure 1.5 shows an example of the application of the DT algorithm for the identification of leaf disease in cotton crops.
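To illustrate recursive partitioning, here is a minimal CART-style sketch in plain Python. At each node it chooses the numeric feature and threshold that minimize the weighted Gini impurity, splits the data, and recurses until a node is pure or a depth limit is reached; classification then walks an instance from the root to a leaf. The Gini criterion is one common choice among several (information gain based on entropy is another), and the two-feature leaf records below are hypothetical; they are not the features used in Figure 1.5.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold) minimizing weighted Gini impurity, or None if no split helps."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursive partitioning: split on the best feature test, then recurse on each side."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class label
    f, thr = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= thr]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Sort an instance down the tree from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical leaf records: (fraction of leaf area discolored, number of lesions).
rows = [[0.10, 2], [0.20, 1], [0.15, 3], [0.80, 7], [0.90, 8], [0.70, 6]]
labels = ["healthy", "healthy", "healthy", "diseased", "diseased", "diseased"]
tree = build_tree(rows, labels)
print(tree, classify(tree, [0.85, 9]))
```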
iv. K-means Clustering
K-means clustering groups data points into K clusters based on each point’s proximity to the cluster centroids. The first stage in the K-means clustering algorithm is to determine the number of clusters (K) that will be obtained as the final result. K of the items (objects) are then chosen at random as the initial cluster centroids. Based on a distance metric, all remaining items (objects) are