Table 3.1 Multi-layer network notation.

w_{ji}^l | Weight connecting neuron i in layer l − 1 to neuron j in layer l
b_j^l    | Bias weight for neuron j in layer l
s_j^l    | Summing junction for neuron j in layer l
a_j^l    | Activation (output) value for neuron j in layer l
x_i      | i-th external input to the network
y_i      | i-th output of the network
Define an input vector x = [x0, x1, x2, … xN] and an output vector y = [y0, y1, y2, … yM]. The network maps the input x to the output y through y = N(w, x), using the weights w. Since fixed weights are used, this mapping is static; there are no internal dynamics. Still, the network is a powerful computational tool.
It has been shown that, with two or more layers and a sufficient number of internal neurons, such a network can approximate any uniformly continuous function to acceptable accuracy. Performance rests on how this "universal function approximator" is utilized.
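As an illustration of the static mapping y = N(w, x), the forward pass of a small two-layer network (one tanh hidden layer, linear output) can be sketched as below; the function names, layer sizes, and weight values are all illustrative, not taken from the text:

```python
import math

def forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer network.
    x: input list; W1, b1: hidden-layer weights and biases (one row per
    hidden neuron); W2, b2: output-layer weights and biases."""
    # Hidden layer: a_j = tanh( sum_i w_ji * x_i + b_j )
    a = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Linear output layer: y_k = sum_j w_kj * a_j + b_k
    return [sum(w * aj for w, aj in zip(row, a)) + b
            for row, b in zip(W2, b2)]

# With fixed weights the mapping is static: the same x always gives the same y.
x = [0.5, -1.0]
W1 = [[0.2, -0.4], [0.7, 0.1]]; b1 = [0.0, -0.3]
W2 = [[1.0, -2.0]];             b2 = [0.5]
y = forward(x, W1, b1, W2, b2)
```

Widening the hidden layer (more rows in W1) is what the universal-approximation result refers to: enough hidden neurons allow the mapping to approximate any uniformly continuous target function.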
3.1.2 Weight Optimization
The specific mapping realized by a network is obtained through an appropriate choice of weight values. Optimizing the set of weights is referred to as network training. An example of a supervised learning scheme is shown in Figure 3.3. A training set of input vectors paired with their desired output vectors, {(x1, d1), … (xP, dP)}, is provided. The difference between the desired output and the actual output of the network, for a given input vector x, is defined as the error
e = d − y = d − N(w, x)    (3.3)
The overall objective function to be minimized over the training set is the summed squared error

J(w) = Σ_{p=1}^{P} e_p^T e_p,  with e_p = d_p − N(w, x_p)    (3.4)
Training should find the set of weights w that minimizes the cost J subject to the constraint of the network topology. Training a neural network is therefore a standard optimization problem.
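A minimal sketch of evaluating the cost J over a training set follows, with a generic callable standing in for N(w, x); the single-neuron example and its data are illustrative:

```python
def total_cost(net, w, training_set):
    """Summed squared error J(w) = sum_p e_p^T e_p over the training set,
    where e_p = d_p - net(w, x_p).  `net` plays the role of N(w, x)."""
    J = 0.0
    for x, d in training_set:
        y = net(w, x)
        e = [di - yi for di, yi in zip(d, y)]   # error for this sample
        J += sum(ei * ei for ei in e)           # accumulate e^T e
    return J

# Toy network: one linear neuron, w = [w0, w1], output w0 + w1*x1.
lin = lambda w, x: [w[0] + w[1] * x[0]]
data = [([0.0], [1.0]), ([1.0], [3.0])]   # targets generated by d = 1 + 2*x1
```

With the exact weights [1.0, 2.0] the cost is zero; any other choice gives J > 0, which is the quantity the training procedure drives down.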
A stochastic gradient descent (SGD) algorithm is one option as the optimization method. For each sample from the training set, the weights are adapted as

Δw = −μ ∂(e^T e)/∂w    (3.5)

where μ > 0 is the learning rate (step size).
Backpropagation: This is a standard way to find the gradient ∂(e^T e)/∂w in a multi-layer network, by propagating error terms backward from the output layer.
Single neuron case – Consider first a single linear neuron, which we may describe compactly as
y = w^T x    (3.6)
where w = [w0, w1, … wN] and x = [1, x1, … xN]. In this simple setup
Figure 3.3 Schematic representation of supervised learning.
∂e²/∂w = 2e ∂e/∂w = −2ex    (3.7)
so that Δw = 2μex. From this, we have Δw_i = 2μe x_i, which is the least mean square (LMS) algorithm.
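The LMS update Δw_i = 2μe x_i can be sketched as follows for a single linear neuron; the training data, learning rate, and epoch count are illustrative:

```python
def lms_train(samples, n_inputs, mu=0.05, epochs=200):
    """LMS: for each sample, form y = w^T x with x augmented by a leading 1,
    compute e = d - y, and update w_i <- w_i + 2*mu*e*x_i."""
    w = [0.0] * (n_inputs + 1)                        # w = [w0, w1, ..., wN]
    for _ in range(epochs):
        for x, d in samples:
            xa = [1.0] + list(x)                      # x = [1, x1, ..., xN]
            y = sum(wi * xi for wi, xi in zip(w, xa)) # single linear neuron
            e = d - y                                 # error, as in (3.3)
            w = [wi + 2 * mu * e * xi for wi, xi in zip(w, xa)]
    return w

# Noiseless samples from d = 0.5 + 1.5*x1 (illustrative data).
samples = [([k * 0.1], 0.5 + 1.5 * k * 0.1) for k in range(-10, 11)]
w = lms_train(samples, n_inputs=1)
```

Because the data are consistent with a linear model, the per-sample updates drive the error to zero and w converges to the generating weights [0.5, 1.5].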
In a multi-layer network, this procedure extends formally through repeated application of the chain rule. Writing the summing junction of neuron j in layer l as s_j^l = Σ_i w_{ji}^l a_i^{l−1}, where a_i^{l−1} is the activation of neuron i in layer l − 1, the gradient with respect to an individual weight factors as

∂(e^T e)/∂w_{ji}^l = [∂(e^T e)/∂s_j^l] · [∂s_j^l/∂w_{ji}^l] = [∂(e^T e)/∂s_j^l] · a_i^{l−1}    (3.8)
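The chain rule above can be sketched for a small network with one tanh hidden layer and a single linear output neuron; the function name, layer sizes, and numeric values are illustrative, and the analytic gradient is checked against a finite-difference estimate:

```python
import math

def backprop_grad(w1, b1, w2, x, d):
    """Gradient of e^2 for: s_j = sum_i w1[j][i]*x[i] + b1[j], a_j = tanh(s_j),
    y = sum_j w2[j]*a_j, e = d - y.  Chain rule as in (3.8):
    d(e^2)/dw1[j][i] = [d(e^2)/ds_j] * x[i]."""
    s = [sum(wji * xi for wji, xi in zip(row, x)) + bj
         for row, bj in zip(w1, b1)]
    a = [math.tanh(sj) for sj in s]
    y = sum(w2j * aj for w2j, aj in zip(w2, a))
    e = d - y
    dE_dy = -2.0 * e                                   # d(e^2)/dy
    g_w2 = [dE_dy * aj for aj in a]                    # output-layer gradient
    delta = [dE_dy * w2j * (1.0 - aj * aj)             # d(e^2)/ds_j
             for w2j, aj in zip(w2, a)]
    g_w1 = [[dj * xi for xi in x] for dj in delta]     # hidden-layer gradient
    g_b1 = list(delta)                                 # bias input is 1
    return g_w1, g_b1, g_w2

# Illustrative values, plus a finite-difference check on one weight.
w1 = [[0.3, -0.2], [0.1, 0.4]]; b1 = [0.05, -0.1]; w2 = [0.7, -0.5]
x = [0.9, -0.4]; d = 0.3
g_w1, g_b1, g_w2 = backprop_grad(w1, b1, w2, x, d)

def sq_err(w1):
    s = [sum(wji * xi for wji, xi in zip(row, x)) + bj
         for row, bj in zip(w1, b1)]
    y = sum(w2j * math.tanh(sj) for w2j, sj in zip(w2, s))
    return (d - y) ** 2

eps = 1e-6
bumped = [row[:] for row in w1]; bumped[0][1] += eps
numeric = (sq_err(bumped) - sq_err(w1)) / eps          # numerical d(e^2)/dw1[0][1]
```

The agreement between `numeric` and `g_w1[0][1]` is the usual sanity check that the chain-rule factorization has been applied correctly.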