and y_out, but this is confusing because y generally stands for output in the machine learning literature. In the interests of clarity, we break with this convention and use i, f, and o to refer to the input, forget, and output gates, respectively.

      Computation in the LSTM model proceeds according to the following equations, performed at each time step. Together they give the full algorithm for a modern LSTM with forget gates:
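      A standard formulation of these per-step updates, written here in the same superscript-pair weight notation as Eq. (3.73) (the exact symbols may differ slightly from the book's own numbering and layout), is:

$$
\begin{aligned}
g^{(t)} &= \phi\left(W^{gx} x^{(t)} + W^{gh} h^{(t-1)} + b_g\right) \\
i^{(t)} &= \sigma\left(W^{ix} x^{(t)} + W^{ih} h^{(t-1)} + b_i\right) \\
f^{(t)} &= \sigma\left(W^{fx} x^{(t)} + W^{fh} h^{(t-1)} + b_f\right) \\
o^{(t)} &= \sigma\left(W^{ox} x^{(t)} + W^{oh} h^{(t-1)} + b_o\right) \\
s^{(t)} &= g^{(t)} \odot i^{(t)} + s^{(t-1)} \odot f^{(t)} \\
h^{(t)} &= \phi\left(s^{(t)}\right) \odot o^{(t)}
\end{aligned}
$$

      where g(t) is the block input, s(t) the internal cell state, i(t), f(t), and o(t) the input, forget, and output gates, σ the logistic sigmoid, φ the tanh nonlinearity, and ⊙ elementwise multiplication.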

      BRNNs: In this architecture, there are two layers of hidden nodes, both connected to the input and the output. Only the first layer has recurrent connections from past time steps; in the second layer, the direction of the recurrent connections is flipped, passing activations backward along the sequence. Given an input sequence and a target sequence, the BRNN can be trained by ordinary backpropagation after unfolding across time. The following three equations describe a BRNN:

      (3.73)
$$
\begin{aligned}
h^{(t)} &= \sigma\left(W^{hx} x^{(t)} + W^{hh} h^{(t-1)} + b_h\right) \\
z^{(t)} &= \sigma\left(W^{zx} x^{(t)} + W^{zz} z^{(t+1)} + b_z\right) \\
\hat{y}^{(t)} &= \mathrm{softmax}\left(W^{yh} h^{(t)} + W^{yz} z^{(t)} + b_y\right)
\end{aligned}
$$

      where h(t) and z(t) are the values of the hidden layers in the forward and backward directions, respectively.
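      As a concrete illustration, the following is a minimal NumPy sketch of the forward pass in Eq. (3.73). The function and variable names (brnn_forward, Whx, Wzz, and so on) are illustrative rather than taken from the book, and the zero initial states are a simplifying assumption.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def brnn_forward(xs, Whx, Whh, bh, Wzx, Wzz, bz, Wyh, Wyz, by):
    """Forward pass of a BRNN as in Eq. (3.73).

    xs is a list of input vectors x^(1), ..., x^(T).
    h^(t) is propagated forward in time, z^(t) backward, and each
    output y_hat^(t) combines the two hidden states at step t.
    """
    T = len(xs)

    # Forward hidden states h^(t), t = 1, ..., T
    hs, h = [], np.zeros_like(bh)
    for t in range(T):
        h = sigmoid(Whx @ xs[t] + Whh @ h + bh)
        hs.append(h)

    # Backward hidden states z^(t), t = T, ..., 1
    zs, z = [None] * T, np.zeros_like(bz)
    for t in reversed(range(T)):
        z = sigmoid(Wzx @ xs[t] + Wzz @ z + bz)
        zs[t] = z

    # Output at each step uses both directions
    return [softmax(Wyh @ hs[t] + Wyz @ zs[t] + by) for t in range(T)]
```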

      NTMs: The NTM extends RNNs with an addressable external memory [12]. This enables RNNs to perform complex algorithmic tasks such as sorting. The design is inspired by theories in cognitive science suggesting that humans possess a “central executive” that interacts with a memory buffer [13]. By analogy with a Turing machine, in which a program directs read heads and write heads to interact with external memory in the form of a tape, the model is called an NTM.
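      To make the idea of an addressable external memory concrete, the sketch below shows content-based addressing of a memory matrix, loosely in the spirit of the NTM read operation in [12]. The names (content_read, M, k, beta) are illustrative, and a full NTM additionally uses location-based addressing and write heads with erase/add vectors, which are omitted here.

```python
import numpy as np

def content_read(M, k, beta):
    """Content-based read from an external memory matrix M (N x W).

    A key vector k (length W) emitted by the controller is compared
    with every memory row by cosine similarity; a softmax over the
    similarities (sharpened by beta) gives the read weighting w, and
    the read vector is the weighted sum of memory rows.
    """
    sim = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
    w = np.exp(beta * sim)
    w /= w.sum()                 # attention weights over memory rows
    return w @ M                 # read vector (length W)

# Illustrative usage: 8 memory slots, each of width 4
M = np.random.randn(8, 4)
k = M[3] + 0.05 * np.random.randn(4)   # key close to row 3
print(content_read(M, k, beta=10.0))   # approximately recovers row 3
```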

Figure: Schematic illustration of a bidirectional recurrent neural network (BRNN).

      In [12], five algorithmic tasks are used to test the performance of the NTM model. By algorithmic we mean that, for each task, the target output for a given input can be calculated by following a simple program, as might easily be implemented in any universal programming language. One example is the copy task, where the input is a sequence of fixed-length binary vectors followed by a delimiter symbol, and the target output is a copy of the input sequence. In another task, priority sort, the input consists of a sequence of binary vectors together with a distinct scalar priority value for each vector, and the target output is the sequence of vectors sorted by priority. The experiments test whether an NTM can be trained via supervised learning to implement these common algorithms correctly and efficiently. Interestingly, solutions found in this way generalize reasonably well to inputs longer than those presented in the training set. By contrast, the LSTM without external memory does not generalize well to longer inputs. The authors compare three architectures, namely an LSTM RNN, an NTM with a feedforward controller, and an NTM with an LSTM controller. On each task, both NTM architectures significantly outperform the LSTM RNN, both in training set performance and in generalization to test data.
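      For reference, the snippet below sketches how training pairs for the copy task described above might be generated. The sequence length, vector width, and delimiter encoding are illustrative choices, not the exact setup used in [12].

```python
import numpy as np

def make_copy_example(seq_len=5, width=8, rng=np.random.default_rng()):
    """One (input, target) pair for the copy task.

    The input is seq_len random binary vectors of size `width`,
    followed by a delimiter step marked in an extra channel; the
    target is simply the original sequence, to be emitted after
    the delimiter.
    """
    seq = rng.integers(0, 2, size=(seq_len, width)).astype(float)

    # Inputs carry one extra channel reserved for the delimiter flag.
    inputs = np.zeros((seq_len + 1, width + 1))
    inputs[:seq_len, :width] = seq
    inputs[seq_len, width] = 1.0     # delimiter marks end of input

    target = seq.copy()              # the network must reproduce seq
    return inputs, target

x, y = make_copy_example()
print(x.shape, y.shape)              # (6, 9) (5, 8)
```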