1 Supervised learning
2 Unsupervised learning
3 Semi‐supervised learning
4 Reinforcement learning
Let's learn them one by one:
1 Supervised learning. In supervised learning, the machine is trained on well-labeled data and then makes predictions with the help of what it learned from that labeled dataset.

FIGURE 1.5 Machine learning algorithms.

What is labeled data? Data for which you already know the target answer is called labeled data. For example, if I show you an image and tell you that it is a butterfly, the image is labeled data. However, if I show you an image without telling you what it is, that is referred to as unlabeled data.

Now let's understand with an example how labeled data makes a machine learn. We have images that are labeled as spoon and knife. We feed them to the machine, which analyzes them and learns to associate the images with their labels based on features such as shape, size, and sharpness. If a new image is then fed to the machine without any label, the past data helps the machine predict whether it shows a spoon or a knife. Thus, in supervised machine learning, the algorithm trains the model on the labeled examples that we provide.

Supervised learning consists of two techniques: classification and regression.

Classification. Classification is a problem in which the output variable is categorical, such as red or blue, disease or no disease, male or female. For example, will I get an increment or not?

Regression. Regression is a problem in which the output variable is a real or continuous value, for example, salary based on work experience or weight based on height. It creates predictive models that show trends in the data. For example, how much increment will I get?

The following is a list of commonly used algorithms in supervised learning:
Nearest neighbor
Naive Bayes
Decision trees
Linear regression
Support vector machines (SVM)
Neural networks
Logistic regression
Linear discriminant analysis
Similarity learning
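The two supervised techniques can be illustrated with a minimal sketch. It assumes made-up data (the feature values, the salary figures, and the names `TRAINING`, `predict_label`, and `fit_line` are all hypothetical) and uses a 1-nearest-neighbor rule for classification and an ordinary least squares line for regression; it is an illustration, not a recommended implementation.

```python
import math

# Hypothetical labeled data: (length in cm, sharpness from 0 to 1) -> label.
TRAINING = [
    ((15.0, 0.1), "spoon"),
    ((16.0, 0.2), "spoon"),
    ((21.0, 0.9), "knife"),
    ((22.0, 0.8), "knife"),
]

def predict_label(features, training=TRAINING):
    """Classification: return the label of the closest labeled example."""
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]

def fit_line(xs, ys):
    """Regression: least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience -> salary (in thousands).
years = [1, 2, 3, 4]
salary = [30, 35, 40, 45]
slope, intercept = fit_line(years, salary)

print(predict_label((20.5, 0.85)))  # categorical output: a class name
print(slope * 5 + intercept)        # continuous output: a number
```

The classifier answers a "which class?" question, while the fitted line answers a "how much?" question. In practice the features would usually be scaled first so that one feature (here, length) does not dominate the distance.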
2 Unsupervised learning. In this type of learning, the machine is given no labels or supervision, so it has to act on data that is not labeled. Hence, the machine tries to identify the patterns in the data by itself. Let's take the example of a spoon and knife again, but this time we do not tell the machine whether an image shows a spoon or a knife. The machine identifies patterns in the set on its own and forms groups based on those patterns, similarities, differences, and so on.

Unsupervised learning consists of two techniques: clustering and association.

Clustering. In clustering, the machine forms groups based on the behavior of the data. For example, which customers made similar product purchases?

Association. Association identifies interesting relationships between variables in large datasets. For example, which products were purchased together?

The following is a list of commonly used algorithms in unsupervised learning:
k‐means clustering
Association rules
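As a rough illustration of clustering, the sketch below implements plain k‐means on a handful of made-up customer records. The data, the function name `k_means`, and the choice to seed the centroids with the first k points are assumptions made to keep the example short and deterministic; real implementations usually pick the starting centroids at random.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    # Assumption for this sketch: the first k points seed the centroids.
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids

# Hypothetical customers: (purchases per month, average basket value).
customers = [(1, 10), (9, 95), (2, 12), (10, 100), (1, 11), (8, 90)]
groups, centers = k_means(customers, k=2)
print(groups)
```

No labels are supplied anywhere: the two groups emerge only from how close the records are to each other.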
3 Semi‐supervised learning. Semi‐supervised learning is a type of machine learning that combines supervised and unsupervised learning techniques. It is used when our dataset is a combination of labeled and unlabeled data.

For example, let's assume that we have access to a large amount of unlabeled data that we would like to train a model on. Manually labeling all of it ourselves is just not practical. So, instead of labeling the whole dataset, we manually label some part of it and use that portion to train our model. But this way, all the unlabeled data is of no use. In general, the more data we have to train our model, the better and more robust the model tends to be. So what can we do to use the unlabeled part of our dataset?

This is why semi‐supervised learning was introduced. To prevent our unlabeled data from being wasted, we can apply a semi‐supervised learning technique called pseudo labeling.

To understand pseudo labeling, let's continue with the previous example. Our model has been trained on the labeled data, and it is performing pretty well. Everything up to this point is just regular supervised learning. Now we use the trained model to predict labels for the remaining unlabeled portion of the data. We feed the unlabeled data to the model, which processes it and predicts an output for each piece of unlabeled data. Thus, pseudo labeling is the process of labeling the unlabeled data with the output predicted by our trained model. With pseudo labeling, we can train on a much larger dataset.
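The pseudo labeling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data is made up, the names `train_centroids` and `predict` are hypothetical, and a simple nearest-centroid classifier stands in for whatever model is actually being trained.

```python
import math

def train_centroids(examples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(v) / len(rows) for v in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical data: a small hand-labeled set and a larger unlabeled set.
labeled = [((15.0, 0.1), "spoon"), ((22.0, 0.9), "knife")]
unlabeled = [(16.0, 0.2), (21.0, 0.8), (14.5, 0.15), (23.0, 0.95)]

# Step 1: ordinary supervised training on the labeled portion.
model = train_centroids(labeled)

# Step 2: the trained model predicts a pseudo label for each unlabeled item.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]

# Step 3: retrain on the labeled and pseudo-labeled data together.
model = train_centroids(labeled + pseudo_labeled)
print(pseudo_labeled)
```

Because pseudo labels can be wrong, a common refinement is to keep only the predictions the model is most confident about before retraining; that filtering step is left out here for brevity.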
4 Reinforcement learning. There is no predefined data in reinforcement learning. It is an area of machine learning inspired by behavioral psychology. In this type of learning, an agent is put into an environment and learns to behave in it by performing certain actions and observing the rewards it gets for those actions. Reinforcement learning involves software agents that take appropriate actions in a particular situation to earn the maximum reward. There is no expected output in this type of learning. The reinforcement agent decides what actions to take to perform a task. In the absence of a training dataset, it is bound to learn from its own experience.

The following is a list of commonly used algorithms in reinforcement learning:
Q‐learning
Temporal difference (TD)
Deep adversarial networks

Now, to choose which algorithm is right for your problem, you should categorize your problem according to the following:

Categorize by input
  Labeled data: supervised learning
  Unlabeled data: unsupervised learning
  Combination of labeled and unlabeled data: semi‐supervised learning
  No data and want to optimize an objective function by interacting with an environment: reinforcement learning

Categorize by output
  If the output of a model is a number: regression problem
  If the output of a model is a class: classification problem
  If the output of your model is a set of input groups: clustering problem
  To detect an anomaly: anomaly detection

Understand your constraints
  Storage capacity of model
  Fast prediction
  Fast learning

Find the available algorithms. Factors affecting the choice of the model are:
  Business goals
  Amount of preprocessing required in data
  Accuracy of the model
  Scalability of the model

Consider model complexity
  Complex feature engineering
  Computational overhead

These points can help you choose the right algorithm for developing a solution to a real‐time business problem, which requires knowledge of business requirements, rules and regulations, and stakeholders' interests as well as significant expertise. Hence, to solve a machine learning problem, it is crucial to combine and balance algorithms for valuable results.
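To make the reinforcement learning case in the checklist above more concrete (no dataset, an objective optimized by interacting with an environment), here is a small Q‐learning sketch. Everything in it is an assumption chosen for illustration: a five-cell corridor environment, a reward of 1 for reaching the last cell, the hyperparameters, and a uniformly random exploration policy in place of the more usual epsilon-greedy one.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
GOAL = N_STATES - 1
ACTIONS = (0, 1)      # 0 = step left, 1 = step right

def step(state, action):
    """Hypothetical environment: move along the corridor, reward 1 at the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Simplification: explore with uniformly random actions.
            # Q-learning is off-policy, so it still learns greedy values.
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# Greedy policy for the non-terminal cells: the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

The agent is never told which action is correct. It only observes rewards, and the learned table ends up preferring the action that leads toward the goal from every cell.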
Validation
Once the machine learning model has been properly trained on a given dataset, we have to test it. In this step, we check the accuracy of the model by providing a test dataset to it. Testing the model is important for finding out its percentage accuracy with respect to the project requirement or the given problem.
The input of this validation stage is the trained model produced by the earlier model learning stage, and the output is a validated model that provides enough information for users to check whether the machine learning model is appropriate for its intended purpose. This stage of the machine learning lifecycle therefore deals with whether the model works as desired when fed with unseen inputs. In other words, model validation is the process of evaluating a trained model on a test dataset. This step reveals the generalization ability of the trained model.
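A minimal sketch of this validation step is shown below. It assumes a made-up one-feature dataset and a deliberately simple threshold model; the names `train_test_split`, `train_threshold`, and `accuracy` are hypothetical helpers written for this example, not a library API.

```python
import random

def train_test_split(examples, test_ratio=0.25, seed=0):
    """Shuffle the examples and hold out a fraction as unseen test data."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold(train):
    """Toy model: a cut-off halfway between the two class means."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        means[label] = sum(values) / len(values)
    return (means[0] + means[1]) / 2

def accuracy(threshold, examples):
    """Fraction of examples whose predicted class matches the true class."""
    correct = sum((x > threshold) == bool(y) for x, y in examples)
    return correct / len(examples)

# Hypothetical dataset: one numeric feature, class 1 for the larger values.
data = [(x, 0) for x in range(0, 10)] + [(x, 1) for x in range(20, 30)]
train, test = train_test_split(data)
model = train_threshold(train)
print(f"test accuracy: {accuracy(model, test):.0%}")
```

The key point is that the test examples are held out before training, so the accuracy figure reflects how the model behaves on inputs it has not seen.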
Deployment
The last step of the machine learning lifecycle is deployment, where we deploy the ML model in a real‐world system. Deployment is the process of making your model available to make predictions in the production environment, and it is a crucial step in the lifecycle. The aim of this stage is to check that the model functions properly after deployment. Models need to be deployed in such a way that they can be used for inference and also be updated regularly. Only if the prepared model produces accurate results as per the specified requirements, with acceptable speed, do we deploy it in the real system. Before deploying the project, you need to check whether the model is improving its performance using the available data, and decide whether you want to go with a Platform as a Service (PaaS) or Infrastructure as a Service (IaaS). A PaaS is excellent for prototyping and for businesses with lower traffic. Eventually, when the business grows and traffic increases, you may need to switch to IaaS. This is the step that tests the model's ability to predict outcomes in the real world.
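One small part of deployment, making a trained model available for inference and replaceable by an updated version, can be sketched as follows. The example assumes a linear model whose parameters are stored as JSON; the file name, the `version` field, and the helper names are hypothetical, and a real deployment would add a serving layer, monitoring, and access control on top.

```python
import json
import os
import tempfile

def save_model(params, path):
    """Persist trained parameters so a separate serving process can load them."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)

def load_model(path):
    """Read the stored parameters back for inference."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def predict(params, years_of_experience):
    """Inference with a linear model: slope * x + intercept."""
    return params["slope"] * years_of_experience + params["intercept"]

# Hypothetical parameters produced by an earlier training step.
trained = {"version": 1, "slope": 5.0, "intercept": 25.0}

model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(trained, model_path)

# In production, the serving code would load the file once and reuse it.
served = load_model(model_path)
print(predict(served, 5))

# Regular updates: retrain, bump the version, and overwrite the artifact.
save_model({"version": 2, "slope": 5.5, "intercept": 24.0}, model_path)
```

Keeping a version number with the stored parameters makes it easier to tell which model produced a given prediction after an update.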
Advantages of Machine Learning
Machine