$$\boldsymbol{o}^{(t)} = \sigma\left(W_{1o}\boldsymbol{x}^{(t)} + W_{2o}\boldsymbol{h}^{(t-1)} + \boldsymbol{b}_o\right)$$
where the $W$ terms and the $\boldsymbol{b}$ terms are the weight matrices and bias vectors of each gate, and $\sigma(\cdot)$ is the sigmoid function.
The two hidden states, the cell state $\boldsymbol{c}^{(t)}$ and the hidden state $\boldsymbol{h}^{(t)}$, are calculated by
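$$\boldsymbol{c}^{(t)} = \boldsymbol{f}^{(t)} \odot \boldsymbol{c}^{(t-1)} + \boldsymbol{i}^{(t)} \odot \tilde{\boldsymbol{c}}^{(t)} \tag{13}$$
$$\boldsymbol{h}^{(t)} = \boldsymbol{o}^{(t)} \odot \tanh\left(\boldsymbol{c}^{(t)}\right) \tag{14}$$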
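where $\odot$ denotes the elementwise (Hadamard) product. In Equation (13), the first term multiplies the forget gate $\boldsymbol{f}^{(t)}$ with $\boldsymbol{c}^{(t-1)}$, controlling which information in the previous cell state is passed on to the current cell state. In the second term, $\tilde{\boldsymbol{c}}^{(t)}$ stores the new information computed from $\boldsymbol{h}^{(t-1)}$ and $\boldsymbol{x}^{(t)}$, and the input gate $\boldsymbol{i}^{(t)}$ controls how much of this information from the current step is written into the cell state. In Equation (14), the hidden state $\boldsymbol{h}^{(t)}$ depends on the current cell state $\boldsymbol{c}^{(t)}$ and on the output gate $\boldsymbol{o}^{(t)}$, which decides how much information from the current cell state is passed to the hidden state.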
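The update in Equations (13) and (14) can be sketched as a single forward step in plain Python. This is a minimal, illustrative sketch rather than the chapter's implementation: it assumes every gate and the candidate $\tilde{\boldsymbol{c}}^{(t)}$ share the form $W_1\boldsymbol{x}^{(t)} + W_2\boldsymbol{h}^{(t-1)} + \boldsymbol{b}$ (sigmoid for the gates, $\tanh$ for the candidate), and the helper names `init_params`, `affine`, and `lstm_step`, as well as the random initialization, are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(W, v):
    # Matrix (list of rows) times vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def affine(W1, W2, b, x, h_prev):
    # W1 x^(t) + W2 h^(t-1) + b: the pre-activation used by each gate.
    return [p + q + r for p, q, r in zip(matvec(W1, x), matvec(W2, h_prev), b)]


def init_params(n_in, n_hidden, seed=0):
    # Hypothetical small random initialization: one (W1, W2, b) triple for
    # each gate f, i, o and for the candidate cell state c.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for g in "fioc"}


def lstm_step(params, x, h_prev, c_prev):
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]          # input gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]  # candidate
    # Eq. (13): keep part of the old cell state, add gated new information.
    c = [fk * cp + ik * ct for fk, cp, ik, ct in zip(f, c_prev, i, c_tilde)]
    # Eq. (14): expose a gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Usage: run two time steps of a 3-input, 4-unit cell from zero states.
params = init_params(n_in=3, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h, c = lstm_step(params, x, h, c)
print("h =", h)
print("c =", c)
```

Because $\boldsymbol{o}^{(t)}$ lies in $(0, 1)$ and $\tanh$ maps into $(-1, 1)$, every entry of `h` stays strictly inside $(-1, 1)$, whereas the cell state `c` is not squashed this way and can carry information across many steps.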
Figure 9 Architecture of long short‐term memory network (LSTM).
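In LSTM, if the loss $L^{(t)}$ is evaluated at time step $t$, the gradient with respect to the cell state $\boldsymbol{c}^{(k)}$ at an earlier step $k < t$, calculated via backpropagation, can be written as
$$\frac{\partial L^{(t)}}{\partial \boldsymbol{c}^{(k)}} = \frac{\partial L^{(t)}}{\partial \boldsymbol{c}^{(t)}} \prod_{j=k+1}^{t} \operatorname{diag}\left(\boldsymbol{f}^{(j)}\right) + \boldsymbol{\delta} \tag{15}$$
where $\boldsymbol{\delta}$ represents the other terms in the partial derivative calculation (the paths that pass through the gates and the hidden state rather than directly along the cell state). Since the forget gate values $\boldsymbol{f}^{(j)}$ are computed with the sigmoid function, they lie in $(0, 1)$ and often saturate near either 0 or 1. When $\boldsymbol{f}^{(j)}$ is close to 1, the product in Equation (15) stays close to 1 over many time steps, so the gradient does not vanish. When it is close to 0, the previous information is judged not useful for the current state and is forgotten, so the gradient along the cell state is deliberately cut off.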
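The effect of the forget gate on Equation (15) can be checked numerically. The sketch below is a deliberate simplification, assuming a single LSTM unit, constant forget-gate pre-activations, and the $\boldsymbol{\delta}$ terms dropped, so only the product of forget-gate values along the cell-state path remains; `cell_path_gradient` is a hypothetical helper, not part of any library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cell_path_gradient(forget_preacts, upstream=1.0):
    """Gradient reaching c^(k) along the cell-state path only.

    Assumes a single LSTM unit and drops the delta terms of Eq. (15), so the
    upstream gradient dL/dc^(t) is simply multiplied by f^(j) = sigmoid(z_j)
    for each step j = k+1, ..., t.
    """
    g = upstream
    for z in forget_preacts:
        g *= sigmoid(z)
    return g


steps = 50
for label, z in [("f near 1", 6.0), ("f = 0.5", 0.0), ("f near 0", -6.0)]:
    print(label, cell_path_gradient([z] * steps))
```

After 50 steps, forget gates near 1 leave the carried gradient of the same order as its upstream value, gates fixed at 0.5 shrink it to $0.5^{50} \approx 8.9 \times 10^{-16}$, and gates near 0 effectively erase it.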
7 Conclusion
We discussed the architectures of four types of neural networks and their extensions in this chapter. Many other neural networks have been proposed in recent years, but the ones discussed in this chapter are the classical architectures that served as foundations for many later works. Although DNNs have achieved breakthroughs in many fields, their performance on many tasks is still far from perfect. Developing new architectures that improve performance on various tasks or solve new problems is an important research direction. Analyzing the properties and problems of existing architectures is also of great interest to the community.
References
1 Larochelle, H., Bengio,