2.1 Neural networks as emotion readers
Modern AI research shows that emotions have become not only an object of study but also an important element of interaction between humans and machines. A central question remains how exactly a machine is able to «read» them through the face, voice, and text.
One of the most studied and widely applied approaches is emotion recognition based on facial expression analysis, which allows a system to identify a person's state with reasonable accuracy. Facial expressions are among the most expressive and direct indicators of internal emotional states because they tend to reveal emotions unconsciously and involuntarily. This makes them especially valuable in real-time interaction, where voice or verbal cues may be unavailable or unreliable.
Scientifically, facial expressions are grounded in universal muscle movements, as shown in the research of Paul Ekman and Wallace Friesen, who developed the Facial Action Coding System (FACS). It defines specific action units that combine into perceivable emotional expressions. For example, a genuine smile requires the contraction of both the zygomaticus major (which pulls the mouth corners upwards) and the orbicularis oculi (which creates «crow's feet» around the eyes). These physiological reactions are difficult to fake consciously, which makes them highly informative for emotion recognition systems.
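As a rough illustration of how action units can be mapped to emotions, consider the following minimal sketch. The action unit numbers follow Ekman and Friesen's FACS, but the selection and the mapping shown here are simplified assumptions for demonstration, not a complete coding scheme.

```python
# Illustrative sketch: mapping FACS action units (AUs) to prototypical emotions.
# The AU numbers follow Ekman and Friesen's FACS; the mapping is a simplified
# assumption for demonstration, not a complete coding scheme.

# A genuine ("Duchenne") smile combines AU6 (orbicularis oculi, cheek raiser)
# with AU12 (zygomaticus major, lip corner puller).
AU_TO_EMOTION = {
    frozenset({6, 12}): "happiness (genuine smile)",
    frozenset({12}): "social / posed smile",
    frozenset({1, 4, 15}): "sadness",
    frozenset({4, 5, 7, 23}): "anger",
    frozenset({1, 2, 5, 26}): "surprise",
}

def interpret(detected_aus: set) -> str:
    """Return the first prototype whose action units are all present."""
    for prototype, label in AU_TO_EMOTION.items():
        if prototype <= detected_aus:
            return label
    return "unclassified"

print(interpret({6, 12, 25}))  # -> happiness (genuine smile)
```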
The working principle relies on observing various facial features, such as the shape of the eyes, the angle of the lips, the movement of the eyebrows, and other small changes that may indicate a particular state. These changes can be visible to the naked eye or too subtle for a human observer to detect. Their effective recognition relies on various algorithms, each with its own characteristics and applicability in different contexts.
One of the most widespread tools is the convolutional neural network (CNN) [22]. These machine learning models enable deep image analysis by extracting features that play an important role in recognizing facial expressions (fig. 10).
Figure 10. CNN architecture
They efficiently process visual data, automatically extracting and classifying features such as contours, textures, eye shapes, and the corners of the mouth. At a low level they detect simple elements, while at higher levels they capture complex patterns such as the overall configuration of a facial expression, which allows emotions such as joy, sadness, and anger to be classified accurately. In addition, CNNs exhibit a degree of spatial invariance: they can identify emotional expressions even when the face is rotated, partially occluded, or slightly shifted in position. This makes them well suited to real-world applications such as emotion tracking, adaptive user interfaces, and human-computer interaction systems.
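The sketch below shows, in rough outline, how such a network might be organized. The input size (48x48 grayscale face crops), layer widths, and seven emotion classes are assumptions made for the example; they are not the specific architecture of fig. 10.

```python
# Minimal sketch of a CNN for facial expression classification (PyTorch).
# Input size, layer widths, and the seven emotion classes are illustrative
# assumptions, not a specific published architecture.
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, num_emotions: int = 7):
        super().__init__()
        # Convolutional layers extract low-level features (edges, textures)
        # and, deeper in the stack, higher-level facial patterns.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # The classifier maps the pooled feature maps to emotion scores.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, num_emotions),  # for 48x48 grayscale input
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of 8 grayscale face crops, 48x48 pixels each.
model = EmotionCNN()
logits = model(torch.randn(8, 1, 48, 48))
print(logits.shape)  # torch.Size([8, 7])
```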
To improve recognition accuracy, recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) are often used alongside CNNs, since they can model temporal dependencies (fig. 11).
Figure 11. RNN and LSTM architectures
These networks can analyze the dynamics of facial expressions by tracking their changes over time, which is especially important in real-life situations where emotions shift during an interaction. Recognizing emotions from video clips, for example, requires accounting for temporal aspects such as how facial expressions change over the course of a conversation. Because RNNs and LSTMs take context and event order into account, they are particularly well suited to such dynamic data. In addition, multimodal solutions can substantially improve recognition accuracy. Emotions are rarely conveyed by the face alone; most of the time they are complemented by voice, gestures, body movements, and physiological signals. Multimodal models integrate, for example, visual data with auditory data derived from speech parameters, producing a richer picture of the emotional state. This helps prevent interpretive errors and better accommodates the user's individual and cultural characteristics.
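A minimal sketch of the temporal part of such a pipeline is shown below. It assumes that each video frame has already been reduced to a feature vector (for example, by a CNN); the dimensions and the seven-class label set are illustrative assumptions.

```python
# Minimal sketch of temporal emotion modeling with an LSTM (PyTorch).
# Assumes each video frame has already been reduced to a feature vector,
# e.g. by a CNN; dimensions and the seven-class label set are illustrative.
import torch
import torch.nn as nn

class TemporalEmotionClassifier(nn.Module):
    def __init__(self, feature_dim: int = 128, hidden_dim: int = 64, num_emotions: int = 7):
        super().__init__()
        # The LSTM consumes the sequence of per-frame features and keeps a
        # memory of how the expression evolves over time.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_emotions)

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        # frame_features: (batch, time, feature_dim)
        _, (last_hidden, _) = self.lstm(frame_features)
        # last_hidden: (1, batch, hidden_dim) -> logits over emotion classes
        return self.head(last_hidden[-1])

# Example: a batch of 4 clips, 30 frames each, 128-dimensional frame features.
model = TemporalEmotionClassifier()
logits = model(torch.randn(4, 30, 128))
print(logits.shape)  # torch.Size([4, 7])
```

A multimodal variant could, under the same assumptions, concatenate audio-derived features (e.g., prosody statistics) with the visual ones before or after the recurrent layer, a simple form of early or late fusion.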
In emotionally adaptive systems, this integration provides high empathic accuracy: the system not only detects an emotion but also understands its context, causes, and likely direction of development. This allows it to dynamically adjust its behavior, tone, and mode of interaction with the user.
To classify facial expressions and interpret feelings based on such associations, various machine learning algorithms are actively applied. Among them, support vector machines (SVMs) hold a special place [23]. They perform particularly well on well-labeled and linearly separable data (fig. 12).
Figure 12. SVM model for classifying emotional expressions
Their main advantage is the ability to construct optimal separating hyperplanes that maximally discriminate between classes in a multidimensional feature space. This makes it possible to distinguish anger, joy, surprise, disgust, and other expressions effectively, even when the differences in facial patterns between them are minimal. The algorithm finds a boundary that not only separates the classes but also maximizes the distance to the nearest points of the different classes, the support vectors. This margin maximization makes the model robust against overfitting, especially when the training sample is limited.
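The following sketch shows what such a classifier might look like in practice. The geometric features and the random training data are illustrative assumptions; a real system would extract the feature vectors from detected facial landmarks.

```python
# Minimal sketch of emotion classification from facial features with an SVM
# (scikit-learn). The features and random data here are illustrative
# assumptions; a real system would compute them from facial landmarks.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["anger", "joy", "surprise", "disgust"]

# Toy feature vectors: e.g. mouth-corner angle, eyebrow raise, eye openness, ...
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 6))
y_train = rng.integers(0, len(EMOTIONS), size=200)

# A linear kernel matches the assumption of (near-)linearly separable data;
# the regularization parameter C controls the softness of the margin.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)

sample = rng.normal(size=(1, 6))
print(EMOTIONS[clf.predict(sample)[0]])
```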