This is the so‐called support vector expansion; that is, w can be completely described as a linear combination of the training patterns $x_i$. In a sense, the complexity of a function's representation by SVs is independent of the dimensionality of the input space X, and depends only on the number of SVs. Moreover, note that the complete algorithm can be described in terms of dot products between the data. Even when evaluating f(x), we need not compute w explicitly. These observations will come in handy for the formulation of a nonlinear extension.
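The following is a minimal sketch of the point made above: f(x) can be evaluated from dot products with the training patterns alone, without ever forming w. The data, the coefficient values (standing in for the differences of the dual variables) and the function names are hypothetical and chosen only for illustration; they are not the result of an actual fit.

```python
# Toy illustration of the support vector expansion (all values are made up).

def dot(u, v):
    """Plain dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def f_explicit(x, X, coef, b):
    """Build w = sum_i coef_i * x_i explicitly, then return <w, x> + b."""
    d = len(X[0])
    w = [sum(c * xi[k] for c, xi in zip(coef, X)) for k in range(d)]
    return dot(w, x) + b

def f_expansion(x, X, coef, b):
    """Return sum_i coef_i * <x_i, x> + b; patterns with coef_i == 0 are skipped."""
    return sum(c * dot(xi, x) for c, xi in zip(coef, X) if c != 0.0) + b

X = [(1.0, 2.0), (0.0, -1.0), (3.0, 0.5)]   # hypothetical training patterns
coef = [0.5, 0.0, -0.25]                    # hypothetical (alpha_i - alpha_i*)
b = 0.1                                     # hypothetical offset
x_new = (2.0, 1.0)

print(f_explicit(x_new, X, coef, b), f_expansion(x_new, X, coef, b))
```

Both functions return the same value; the second one touches only the patterns with nonzero coefficients, which is why the cost of evaluation depends on the number of SVs rather than on the input dimension.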
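Computing b: Parameter b can be computed by exploiting the so‐called Karush-Kuhn-Tucker (KKT) conditions, which state that at the solution the product between dual variables and constraints has to vanish, giving

$$\alpha_i\,(\varepsilon + \xi_i - y_i + \langle w, x_i\rangle + b) = 0, \qquad \alpha_i^*\,(\varepsilon + \xi_i^* + y_i - \langle w, x_i\rangle - b) = 0,$$

$$(C - \alpha_i)\,\xi_i = 0, \qquad (C - \alpha_i^*)\,\xi_i^* = 0.$$

Two conclusions follow:
1 Only samples $(x_i, y_i)$ with corresponding $\alpha_i^{(*)} = C$ lie outside the ε‐insensitive tube.
2 $\alpha_i \alpha_i^* = 0$; that is, there can never be a pair of dual variables $\alpha_i$, $\alpha_i^*$ that are both simultaneously nonzero.

This allows us to conclude that

$$\varepsilon - y_i + \langle w, x_i\rangle + b \ge 0 \quad\text{and}\quad \xi_i = 0 \quad \text{if } \alpha_i < C, \tag{4.51}$$

$$\varepsilon - y_i + \langle w, x_i\rangle + b \le 0 \quad \text{if } \alpha_i > 0. \tag{4.52}$$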
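In conjunction with an analogous analysis on $\alpha_i^*$, in which the sign of ε is reversed because the starred KKT condition reads $\alpha_i^*(\varepsilon + \xi_i^* + y_i - \langle w, x_i\rangle - b) = 0$, we have

$$\max\Bigl(\{-\varepsilon + y_i - \langle w, x_i\rangle \mid \alpha_i < C\} \cup \{\varepsilon + y_i - \langle w, x_i\rangle \mid \alpha_i^* > 0\}\Bigr) \;\le\; b \;\le\; \min\Bigl(\{-\varepsilon + y_i - \langle w, x_i\rangle \mid \alpha_i > 0\} \cup \{\varepsilon + y_i - \langle w, x_i\rangle \mid \alpha_i^* < C\}\Bigr). \tag{4.53}$$

If some $\alpha_i^{(*)}$ lies strictly between 0 and C, the inequalities become equalities and b is determined uniquely.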
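As a rough sketch of how such bounds on b might be evaluated numerically, the snippet below collects one lower or upper bound per sample from the KKT conditions. It assumes the usual ε‐SVR sign convention: a sample with α_i < C bounds b from below and one with α_i > 0 bounds it from above, both at y_i − 〈w, x_i〉 − ε, while the starred variables give the corresponding bounds at y_i − 〈w, x_i〉 + ε. The function name, the toy numbers and the precomputed values of 〈w, x_i〉 are hypothetical.

```python
def b_bounds(alpha, alpha_star, y, wx, C, eps):
    """Return (lower, upper) bounds on b implied by the KKT conditions.

    wx[i] is the precomputed value <w, x_i>. The bounds coincide whenever
    at least one dual variable lies strictly between 0 and C.
    """
    lower, upper = [], []
    for a, a_s, yi, wxi in zip(alpha, alpha_star, y, wx):
        r = yi - wxi                 # residual without the offset b
        if a < C:                    # xi_i = 0, so b >= r - eps
            lower.append(r - eps)
        if a > 0:                    # upper constraint active, so b <= r - eps
            upper.append(r - eps)
        if a_s > 0:                  # lower constraint active, so b >= r + eps
            lower.append(r + eps)
        if a_s < C:                  # xi_i* = 0, so b <= r + eps
            upper.append(r + eps)
    return max(lower), min(upper)

# Hypothetical toy values, constructed so that the true offset is b = 1.0:
C, eps = 1.0, 0.5
alpha      = [0.4, 0.0, 0.0]   # sample 0 sits on the upper tube boundary
alpha_star = [0.0, 0.0, 1.0]   # sample 2 lies outside the tube (alpha* = C)
y  = [1.5, 1.2, 0.2]
wx = [0.0, 0.0, 0.0]

lo, hi = b_bounds(alpha, alpha_star, y, wx, C, eps)
print(lo, hi)
```

In this toy case the first sample has a dual variable strictly inside (0, C), so the lower and upper bounds meet and pin down b.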
Kernels: We are interested in making the SV algorithm nonlinear. This could, for instance, be achieved by simply preprocessing the training patterns $x_i$ by a map Φ: X → F into some feature space F, as already described in Chapter 2, and then applying the standard SV regression algorithm. Let us have a brief look at the example given in Figure 2.8 of Chapter 2. There we had quadratic features in ℝ², given by the map Φ: ℝ² → ℝ³ with $\Phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. Training a linear SV machine on the preprocessed features yields a quadratic function of the original inputs.
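Implicit mapping via kernels: Explicit preprocessing quickly becomes infeasible for polynomial features of higher order and inputs of higher dimension, since the number of distinct monomials of degree p in d variables is $\binom{d+p-1}{p}$. We therefore have to find a computationally cheaper way. The key observation [96] is that for the feature map of the above example we have

$$\bigl\langle (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2),\ (x_1'^2, \sqrt{2}\,x_1' x_2', x_2'^2) \bigr\rangle = \langle x, x' \rangle^2. \tag{4.54}$$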
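The identity can be checked numerically. The sketch below assumes the standard quadratic feature map Φ(x1, x2) = (x1², √2·x1·x2, x2²) for the example above; the function names and the two test points are hypothetical.

```python
import math

def dot(u, v):
    """Plain dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Assumed explicit quadratic feature map from R^2 to R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def k_quadratic(x, xp):
    """Kernel evaluated in input space only: <x, x'>^2."""
    return dot(x, xp) ** 2

x, xp = (1.0, 2.0), (3.0, -1.0)   # arbitrary test points
lhs = dot(phi(x), phi(xp))        # dot product computed in feature space
rhs = k_quadratic(x, xp)          # same quantity without ever forming phi
print(lhs, rhs)
```

The left-hand side needs the three-dimensional features, whereas the right-hand side needs only one dot product in ℝ² followed by a square; the gap in cost widens rapidly for higher degrees and dimensions.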
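As noted in the previous section, the SV algorithm only depends on the dot products between patterns $x_i$. Hence, it suffices to know k(x, x′) ≔ 〈Φ(x), Φ(x′)〉 rather than Φ explicitly, which allows us to restate the SV optimization problem:

$$\begin{aligned} \text{maximize}\quad & -\tfrac{1}{2}\sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x_i, x_j) \;-\; \varepsilon \sum_{i} (\alpha_i + \alpha_i^*) \;+\; \sum_{i} y_i(\alpha_i - \alpha_i^*) \\ \text{subject to}\quad & \sum_{i} (\alpha_i - \alpha_i^*) = 0 \quad\text{and}\quad \alpha_i, \alpha_i^* \in [0, C]. \end{aligned} \tag{4.55}$$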
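To make the role of the kernel in the optimization problem concrete, the sketch below evaluates the dual objective and checks feasibility for a given set of dual variables. It assumes the standard kernelized ε‐SVR dual: a quadratic term in (α_i − α_i*) weighted by k(x_i, x_j), a penalty ε·Σ(α_i + α_i*), a linear term Σ y_i(α_i − α_i*), the equality constraint Σ(α_i − α_i*) = 0 and the box constraints 0 ≤ α_i, α_i* ≤ C. The function names and the numbers are hypothetical, and no solver is implied.

```python
def dual_objective(alpha, alpha_star, y, K, eps):
    """Value of the assumed kernelized SVR dual objective (to be maximized).

    K is the kernel matrix with entries K[i][j] = k(x_i, x_j).
    """
    beta = [a - s for a, s in zip(alpha, alpha_star)]
    n = len(beta)
    quad = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))
    penalty = eps * sum(a + s for a, s in zip(alpha, alpha_star))
    linear = sum(yi * bi for yi, bi in zip(y, beta))
    return -0.5 * quad - penalty + linear

def is_feasible(alpha, alpha_star, C, tol=1e-12):
    """Check the equality constraint and the box constraints."""
    balanced = abs(sum(a - s for a, s in zip(alpha, alpha_star))) <= tol
    boxed = all(0.0 <= v <= C for v in list(alpha) + list(alpha_star))
    return balanced and boxed

# Hypothetical two-sample problem:
K = [[1.0, 0.5],
     [0.5, 1.0]]
alpha, alpha_star = [0.3, 0.0], [0.0, 0.3]
y = [1.0, -1.0]

value = dual_objective(alpha, alpha_star, y, K, eps=0.1)
feasible = is_feasible(alpha, alpha_star, C=1.0)
print(value, feasible)
```

Note that the data enter only through the matrix K, so switching kernels changes nothing else in the computation.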
Now the expansion for f in Eq. (4.50) may be written as

$$w = \sum_{i} (\alpha_i - \alpha_i^*)\,\Phi(x_i) \quad\text{and}\quad f(x) = \sum_{i} (\alpha_i - \alpha_i^*)\,k(x_i, x) + b.$$

The difference from the linear case is that w is no longer given explicitly. Also, note that in the nonlinear setting, the optimization problem corresponds to finding the flattest function in the feature space, not in the input space.
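A minimal sketch of prediction in the nonlinear setting follows: f(x) is evaluated as a kernel‐weighted sum over the support vectors plus the offset b, and w is never formed. The Gaussian (RBF) kernel is used here purely as an example of a kernel whose feature map is not available in closed form; the kernel choice, the value of gamma, the support vectors and the coefficients are all assumptions made for illustration.

```python
import math

def rbf(x, xp, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - x'||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))

def predict(x, support_vectors, coef, b, kernel):
    """f(x) = sum_i coef_i * k(x_i, x) + b, with coef_i = alpha_i - alpha_i*."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, coef)) + b

# Hypothetical support vectors and coefficients (not from a real fit):
sv = [(0.0, 0.0), (1.0, 0.0)]
coef = [2.0, -1.0]
b = 0.5

f_at_origin = predict((0.0, 0.0), sv, coef, b, rbf)
print(f_at_origin)
```

Because the kernel is passed in as a function, the same `predict` routine works for the quadratic kernel of the earlier example or for any other admissible kernel.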
4.4.3 Combination of Fuzzy Models and SVR
Given observation data from an unknown system, data‐driven methods aim to construct a decision function f(x) that can serve as an approximation of the system. As seen from the previous sections, both fuzzy models and SVR are employed to describe the decision function. Fuzzy models characterize the system by a collection of interpretable if‐then rules, and a general fuzzy model that consists of a set of rules with the following structure will be used here:
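$$R_i:\ \text{If } x_1 \text{ is } A_{i1} \text{ and } \dots \text{ and } x_d \text{ is } A_{id} \text{ then } y = g_i(x, \beta_i), \qquad i = 1, \dots, c. \tag{4.57}$$

Here, parameter d is the dimension of the antecedent variable vector $x = [x_1, x_2, \dots, x_d]^T$, $R_i$ is the i‐th rule in the rule base, and $A_{i1}, \dots, A_{id}$ are fuzzy sets defined for the respective antecedent variables. The rule consequent $g_i(x, \beta_i)$ is a function of the inputs with parameters $\beta_i$. Parameter c is the number of fuzzy rules. By modification of Eq.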