Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic.

Book information:

Author:	Savo G. Glisic
Publisher:	John Wiley & Sons Limited
ISBN:	9781119790310


to the odds ratio. The converse relationship is

(2.4) OR_{1,2} = e^{l_1 − l_2}
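As a small numerical check of Eq. (2.4), the sketch below assumes that l₁ and l₂ denote the logits (log-odds) of two probabilities p₁ and p₂. The function names and the values 0.8 and 0.5 are illustrative choices, not taken from the text.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def odds_ratio(p1, p2):
    """Odds of p1 divided by odds of p2, computed directly from the odds."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))

# Illustrative probabilities (assumed values, not from the source).
p1, p2 = 0.8, 0.5
l1, l2 = logit(p1), logit(p2)

# Eq. (2.4): the odds ratio is the exponential of the logit difference.
or_from_logits = math.exp(l1 - l2)
or_direct = odds_ratio(p1, p2)

print(or_from_logits, or_direct)
```

Both routes should agree up to floating-point rounding, which is the content of the converse relationship.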

In logistic regression, a categorical dependent variable Y having G (usually G = 2) unique values is regressed on a set of p independent variables X₁, X₂, …, X_p.

Let X = (X₁, X₂, …, X_p) and B_g = (β_g1, …, β_gp)^T; then the logistic regression model is given by the G equations ln(p_g/p₁) = ln(P_g/P₁) + β_g1 X₁ + β_g2 X₂ + … + β_gp X_p = ln(P_g/P₁) + XB_g. Here, p_g is the probability that an individual with values X₁, X₂, …, X_p is in outcome g. That is, p_g = Pr(Y = g ∣ X). Usually, X₁ ≡ 1 (that is, an intercept is included), but this is not necessary. The quantities P₁, P₂, …, P_G represent the prior probabilities of outcome membership. If these prior probabilities are assumed equal, then the term ln(P_g/P₁) becomes zero and drops out. If the priors are not assumed equal, they change the values of the intercepts in the logistic regression equation.

The first outcome is called the reference value. The regression coefficients β_11, β_12, …, β_1p for the reference value are set to zero. The choice of the reference value is arbitrary. Usually, it is the most frequent value or a control outcome to which the other outcomes are to be compared. This leaves G − 1 logistic regression equations in the logistic model.


Figure 2.2 The regression line for predicting Y* from X* is not the 45° line. It has slope r_XY, which is less than 1. Hence it “regresses” toward the X‐axis. For this data sample, r_XY = 0.69.

The β’s are unknown population regression coefficients that are to be estimated from the data; their estimates are denoted by b’s. These equations are linear in the logits of p. In terms of the probabilities, however, they are nonlinear. The corresponding nonlinear equations are

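(2.5) p_g = Pr(Y = g ∣ X) = e^{XB_g} / (1 + e^{XB_2} + e^{XB_3} + ⋯ + e^{XB_G}), g = 1, …, G (equal prior probabilities assumed)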

since e^{XB_1} = 1 because all of its regression coefficients are zero. Using the fact that e^{a + b} = (e^a)(e^b), e^{XB} may be reexpressed as follows: e^{XB} = exp(β₁X₁ + β₂X₂ + ⋯ + β_pX_p) = e^{β₁X₁} e^{β₂X₂} ⋯ e^{β_pX_p}. This shows that the final value is the product of its individual terms.
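The step from the logit equations to outcome probabilities can be sketched numerically. The example below is a minimal illustration, assuming NumPy is available; the function name `outcome_probabilities`, the coefficient matrix `B`, and the covariate vector `x` are hypothetical and chosen only for demonstration. Row 0 of `B` plays the role of the reference outcome, whose coefficients are all zero.

```python
import numpy as np

def outcome_probabilities(x, B, priors=None):
    """Return p_g = Pr(Y = g | X = x) for g = 1..G.

    x      : shape (p,) covariate vector (x[0] = 1 if an intercept is used)
    B      : shape (G, p) coefficients; row 0 is the reference outcome (zeros)
    priors : optional shape (G,) prior probabilities P_g; equal if omitted
    """
    x = np.asarray(x, dtype=float)
    B = np.asarray(B, dtype=float)
    if np.any(B[0] != 0):
        raise ValueError("reference-outcome coefficients must be zero")
    G = B.shape[0]
    if priors is None:
        priors = np.full(G, 1.0 / G)
    priors = np.asarray(priors, dtype=float)
    # ln(p_g / p_1) = ln(P_g / P_1) + X B_g, evaluated for every outcome g
    log_ratio = np.log(priors / priors[0]) + B @ x
    # Subtracting the maximum avoids overflow and cancels after normalising.
    weights = np.exp(log_ratio - log_ratio.max())
    return weights / weights.sum()

# Hypothetical model: G = 3 outcomes, p = 2 covariates (intercept + one X).
B = np.array([[0.0, 0.0],     # reference outcome
              [0.5, -1.0],
              [-0.3, 0.8]])
x = np.array([1.0, 2.0])

p = outcome_probabilities(x, B)
print(p, p.sum())

# e^{XB} equals the product of the per-term factors e^{beta_j X_j}.
lhs = np.exp(B[1] @ x)
rhs = np.prod(np.exp(B[1] * x))
print(lhs, rhs)
```

Passing unequal `priors` shifts each log-ratio by ln(P_g/P₁), which has the same effect as changing the intercepts.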

2.1.3 Decision Tree: Regression Trees Versus Classification Trees

The decision tree (Figure 2.3) is a type of supervised learning algorithm (having a predefined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (or sub‐populations) based on the most significant splitter/differentiator in the input variables.

Types of decision tree are based on the type of target variable used. If a categorical target variable (zero/one or yes/no) as described in the previous section is used, then we have a categorical variable decision tree. If a continuous target variable is used, then we have a continuous variable decision tree. The basic tree terminology is presented in Figure 2.4.


Figure 2.3 Decision tree.


Figure 2.4 Tree terminology.

Regression trees versus classification trees: From Figure 2.4 we can see that the terminal nodes (or leaves) lie at the bottom of the decision tree. This means that decision trees are typically drawn upside down, with the leaves at the bottom and the root at the top.

Both types of tree work in much the same way. Let us look at the primary differences and similarities between classification and regression trees:

Regression trees are used when the dependent variable is continuous. Classification trees are used when the dependent variable is categorical.

In the case of regression trees, the value obtained by terminal nodes in the training data is the mean response of observations falling in that region. Thus, if an unseen data observation falls in that region, we will make its prediction with the mean value.

In the case of classification trees, the value (class) obtained by terminal nodes in the training data is the mode of observations falling in that region. Thus, if an unseen data observation falls in that region, we will make its prediction with the mode value.

Both types of tree divide the predictor space (independent variables) into distinct and non‐overlapping regions. For the sake of simplicity, we can think of these regions as high‐dimensional boxes.

Both types of tree follow a top‐down greedy approach known as recursive binary splitting. It is called “top‐down” because it begins at the top of the tree, when all the observations are in a single region, and successively splits the predictor space into two new branches down the tree. It is called “greedy” because at each step the algorithm looks only for the best variable for the current split and does not consider future splits that might lead to a better tree.

This splitting process is continued until a user‐defined stopping criterion is reached. For example, we can tell the algorithm to stop once the number of observations per node becomes less than 50.

In both cases, splitting continues until the stopping criterion is reached, producing a fully grown tree. However, the fully grown tree is likely to overfit the data, leading to poor accuracy on unseen data. This is handled by “pruning,” which is one of the techniques used to tackle overfitting.
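The points above can be illustrated with a minimal regression-tree sketch, assuming NumPy is available. It is not a definitive implementation: the function names `best_split`, `grow`, and `predict`, the dictionary-based node layout, and the use of summed squared error as the split score are all assumptions made for illustration, since the text does not fix a split criterion here. The parameter `min_node` mirrors the example stopping rule of 50 observations per node.

```python
import numpy as np

def best_split(X, y):
    """Greedy step: find the (feature, threshold) pair that minimises the
    summed squared error of the two child regions, or None if no split exists."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        # Every unique value except the largest gives two non-empty children.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if sse < best_sse:
                best, best_sse = (j, t), sse
    return best

def grow(X, y, min_node=50):
    """Top-down recursive binary splitting with a node-size stopping rule."""
    # Stop when the node is small or already pure; the leaf stores the mean.
    if len(y) < min_node or np.all(y == y[0]):
        return {"value": float(y.mean())}
    split = best_split(X, y)
    if split is None:
        return {"value": float(y.mean())}
    j, t = split
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": grow(X[left], y[left], min_node),
            "right": grow(X[~left], y[~left], min_node)}

def predict(node, x):
    """Route an unseen observation to its region and return the leaf mean."""
    while "value" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]

# Synthetic step-shaped data (assumed, for demonstration only).
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.where(X[:, 0] <= 0.5, 1.0, 3.0)

tree = grow(X, y, min_node=50)
print(predict(tree, [0.2]), predict(tree, [0.9]))
```

A classification tree would follow the same recursion, with each leaf storing the most frequent class in its region instead of the mean. Pruning is not shown in this sketch.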
