Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic. Читать онлайн. Newlib. NEWLIB.NET

Информация о произведении:

Автор:	Savo G. Glisic
Издательство:	John Wiley & Sons Limited
Серия:
Жанр произведения:	Программы
Год издания:	0
isbn:	9781119790310

Скачать книгу

href="#fb3_img_img_7014d630-6fbf-5b3b-a443-90f829143c39.png" alt="ModifyingAbove normal x With ampersand c period circ semicolon equals upper R left-parenthesis bold-italic chi right-parenthesis"/>, such that

ModifyingAbove normal x With ampersand c period circ semicolon almost-equals normal x

An interesting property of any dimensionality reduction technique is to consider its stability. In this context, a technique is said to be ε‐stable if for any two input data points, x₁ and x₂, the following inequality holds [36]: . Intuitively, this equation implies that Euclidean distances in the original input space are relatively conserved in the output feature space.

Methods based on statistics and information theory: This family of methods reduces the input data according to some statistical or information theory criterion. Somehow, the methods based on information theory can be seen as a generalization of the ones based on statistics in the sense that they can capture nonlinear relationships between variables, can handle interval and categorical variables at the same time, and many of them are invariant to monotonic transformations of the input variables.

Vector quantization and mixture models: Probably the simplest way of reducing dimensionality is by assigning a class {among a total of K classes) to each one of the observations x_n. This can be seen as an extreme case of dimensionality reduction in which we go from M dimensions to 1 (the discrete class label χ). Each class, χ, has a representative normal x overbar Subscript chi which is the average of all the observations assigned to that class. If a vector x_n has been assigned to the χ_n‐th class, then its approximation after the dimensionality reduction is simply ModifyingAbove normal x With ampersand c period circ semicolon Subscript n Baseline equals normal x overbar Subscript chi Sub Subscript normal n , (see Figure 2.19).

The goal is thus to find the representatives normal x overbar Subscript chi , and class assignment u_χ(x)(u_χ(x) is equal to 1 if the observation x is assigned to the χ‐th class, and is 0 otherwise) such that upper J Subscript italic upper V upper Q Baseline equals upper E left-brace sigma-summation Underscript chi equals 1 Overscript upper K Endscripts u Subscript chi Baseline left-parenthesis normal x right-parenthesis double-vertical-bar normal x minus normal x overbar Subscript chi Baseline double-vertical-bar squared right-brace is minimized. This problem is known as vector quantization or k‐means, already briefly introduced in Section 2.1. The optimization of this goal function is a combinatorial problem, although there are heuristics to cut down its cost [37, 38]. An alternative formulation of the k‐means objective function is upper J Subscript italic upper V upper Q Baseline equals double-vertical-bar upper X minus italic upper W upper U double-vertical-bar Subscript upper F Superscript 2 subject to U^t U = I and u_ij ∈ {0, 1} {i.e. that each input vector is assigned to one and only one class). In this expression, W is a M × m matrix with all representatives as column vectors, U is an m × N matrix whose ij‐th entry is 1 if the j‐th input vector is assigned to the i‐th class, and double-vertical-bar dot double-vertical-bar Subscript upper F Superscript 2 denotes the Frobenius norm of a matrix. This intuitive goal function can be put in a probabilistic framework. Let us assume we have a generative model of how the data is produced. Let us assume that the observed data are noisy versions of K vectors x_χ which are equally likely a priori. Let us assume that the observation noise is normally distributed with a spherical covariance matrix = σ² I. The likelihood of observing x_n having produced x_χ is

Schematic illustration of black circles represent the input data, gray squares represent class representatives.

Figure 2.19 Black circles represent the input data, x_n; gray squares represent class representatives, normal x overbar Subscript normal chi Baseline period

l left-parenthesis normal x Subscript n Baseline bar normal x Subscript chi Baseline comma sigma squared right-parenthesis equals StartFraction 1 Over left-parenthesis 2 pi right-parenthesis Superscript StartFraction upper M Over 2 EndFraction Baseline sigma EndFraction exp left-parenthesis minus one half StartFraction double-vertical-bar normal x Subscript n Baseline minus normal x Subscript chi Baseline double-vertical-bar squared Over sigma squared EndFraction right-parenthesis period

With our previous definition of u_χ(x), we can express it as

The log likelihood of observing the whole dataset x_n{n = 1, 2, …, N) after removing all constants is We thus see that the goal function of vector quantization J_VQ produces the maximum likelihood estimates of the underlying x_l vectors.

Under this generative model, the probability density function of the observations is the convolution of a Gaussian function and a set of delta functions located at the x_χ vectors, that is, a set of Gaussians located at the x_χ vectors. The vector quantization then is an attempt to find the centers of the Gaussians forming the probability density function of the input data. This idea has been further pursued by Mixture Models, which are a generalization of vector quantization in which, instead of looking only for the means of the Gaussians associated with each class, we also allow each class to have a different covariance matrix ∑_χ and different a priori probability π_χ. The algorithm looks for estimates of all these parameters by Expectation–Maximization, and at the end produces for each input observation x_n, the label χ of the Gaussian that has the

Скачать книгу