Most contributions in fault diagnosis rely on the analytical redundancy principle. The basic idea consists of using an accurate model of the system to mimic the real process behavior. If a fault occurs, the residual signal (i.e. the difference between the real system and model behavior) can be used to diagnose and isolate the malfunction.
The reliability of model-based methods, including their ability to reject false alarms, depends directly on the “quality” of the model and of the measurements used for fault diagnosis, since model uncertainty and noisy data can prevent analytical redundancy methods from being applied effectively.
This is not a simple problem, because model-based fault diagnosis methods are designed to detect any discrepancy between the real system and model behaviors. It is assumed that this discrepancy signal is related to (has a response from) a fault. However, the same difference signal can also respond to model mismatch or to noise in real measurements, and these effects are then erroneously detected as faults. These considerations have led to research in the field of “robust” methods, in which particular attention is paid to the discrimination between actual faults and errors due to model mismatch.
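The effect of model mismatch on the discrepancy signal can be illustrated with a minimal sketch. The first-order plant, its coefficients, the mismatched model and the threshold below are all hypothetical values chosen for illustration; they are not taken from the book. No fault is present in the simulated process, yet the mismatched model produces a discrepancy large enough to trigger an alarm.

```python
def simulate(a, b, u, y0=0.0):
    """Simulate y[k+1] = a*y[k] + b*u[k] (hypothetical first-order process)."""
    y = [y0]
    for uk in u[:-1]:
        y.append(a * y[-1] + b * uk)
    return y

u = [1.0] * 50                              # unit step input
plant = simulate(0.9, 0.1, u)               # fault-free "real" process
exact_model = simulate(0.9, 0.1, u)         # model with the true parameters
mismatched_model = simulate(0.85, 0.1, u)   # model with a wrong pole (assumed)

# Discrepancy between process and model for each case.
r_exact = [p - m for p, m in zip(plant, exact_model)]
r_mismatch = [p - m for p, m in zip(plant, mismatched_model)]

threshold = 0.1                             # assumed fixed alarm threshold
false_alarm = any(abs(r) > threshold for r in r_mismatch)
print("max |r| with exact model:     ", max(abs(r) for r in r_exact))
print("max |r| with mismatched model:", max(abs(r) for r in r_mismatch))
print("false alarm raised:", false_alarm)
```

With the exact model the discrepancy is identically zero, whereas the mismatched model yields a persistent offset that a fixed threshold cannot distinguish from a fault.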
However, the availability of a “good” model of the monitored system can significantly improve the performance of diagnostic tools, minimizing the probability of false alarms.
This book explains what makes a model good, that is, suitable for robust diagnosis of system performance and operation. It also explains how robust models can be obtained from real data. Considerable attention is paid to the “real system modeling problem”, with reference to either linear or nonlinear model structures. Special treatment is given to the case in which noise affects the acquired data. The mathematical description of the monitored system is obtained by means of a system identification scheme based on equation error and errors-in-variables models. This identification approach leads to a reliable model of the plant under investigation, as well as to estimates of the variances of the input–output noises affecting the data.
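As a rough illustration of the equation error idea only, the sketch below fits a first-order model y[k] = a·y[k−1] + b·u[k−1] + e[k] by ordinary least squares. The model order, the parameter values, the test input and the function names are assumptions made for this example. It does not reproduce the book's identification scheme: in particular, it does not treat the errors-in-variables case (noise on both input and output) and does not estimate noise variances.

```python
import math

def simulate(a, b, u):
    """Generate noise-free output of y[k] = a*y[k-1] + b*u[k-1]."""
    y = [0.0]
    for k in range(1, len(u)):
        y.append(a * y[k - 1] + b * u[k - 1])
    return y

def fit_first_order(u, y):
    """Least-squares estimate of (a, b) from the 2x2 normal equations."""
    syy = syu = suu = sy = su = 0.0
    for k in range(1, len(y)):
        y1, u1 = y[k - 1], u[k - 1]
        syy += y1 * y1
        syu += y1 * u1
        suu += u1 * u1
        sy += y1 * y[k]
        su += u1 * y[k]
    det = syy * suu - syu * syu
    a_hat = (suu * sy - syu * su) / det
    b_hat = (syy * su - syu * sy) / det
    return a_hat, b_hat

# Hypothetical data: two-sinusoid input so that both parameters are identifiable.
u = [math.sin(0.3 * k) + 0.5 * math.sin(1.1 * k) for k in range(200)]
y = simulate(0.8, 0.5, u)
a_hat, b_hat = fit_first_order(u, y)
print(a_hat, b_hat)
```

On noise-free data the estimates coincide with the true parameters up to rounding. When the measured input is itself noisy, plain least squares is generally biased, which is one reason errors-in-variables formulations are used.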
The purpose of this book is also to provide guidelines for the modeling and identification of real processes for fault diagnosis and fault-tolerant control (FTC). Hence, significant attention is paid to the practical application of the methods, illustrated by the real system studies reported in the last chapters of Volume 2.
In particular, this introduction outlines a new common terminology for the fault diagnosis framework. It also discusses and summarizes developments in the field of fault detection and diagnosis, as well as FTC, based on papers selected from the period 1991–2020.
I.2. Nomenclature
By going through the literature, one immediately recognizes that the terminology in this field is not consistent. This makes it difficult to understand the goals of the contributions and to compare the different approaches.
The IFAC SAFEPROCESS Technical Committee therefore discussed this matter and tried to find commonly accepted definitions. Some basic definitions can be found, for example, in the RAM (Reliability, Availability and Maintainability) dictionary (Omdahl 1988) and in contributions to the IFIP (International Federation for Information Processing) (IFI 1983).
Some of the terminology used in this book is given below. These definitions are based on information obtained from the IFAC SAFEPROCESS Technical Committee and are considered “on-going” in the sense that new definitions and updates are still being made.
1) States and signals
- Fault: an unpermitted deviation of at least one characteristic property or parameter of the system from the acceptable, usual or standard condition.
- Failure: a permanent interruption of a system’s ability to perform a required function under specified operating conditions.
- Malfunction: an intermittent irregularity in the fulfillment of a system’s desired function.
- Error: a deviation between a measured or computed value of an output variable and its true or theoretically correct value.
- Disturbance: an unknown and uncontrolled input acting on a system.
- Residual: a fault indicator based on a deviation between measurements and model-equation-based computations.
- Symptom: a change of an observable quantity from normal behavior.
2) Functions
- Fault detection: determination of faults present in a system and the time of detection.
- Fault isolation: determination of the kind, location and time of detection of a fault. It follows fault detection.
- Fault identification: determination of the size and time-variant behavior of a fault. It follows fault isolation.
- Fault diagnosis: determination of the kind, size, location and time of detection of a fault. It follows that fault diagnosis includes fault detection and identification.
- Monitoring: a continuous real-time task of determining the conditions of a physical system by recording information, recognizing and indicating anomalies in the behavior.
- Supervision: monitoring a physical system and taking appropriate actions to maintain the operation in the case of a fault.
3) Models
- Quantitative model: use of static and dynamic relationships among system variables and parameters in order to describe a system’s behavior in quantitative mathematical terms.
- Qualitative model: use of static and dynamic relationships among system variables in order to describe a system’s behavior in qualitative terms such as causalities and IF–THEN rules.
- Diagnostic model: a set of static or dynamic relationships that link specific input variables, the symptoms, to specific output variables, the faults.
- Analytical redundancy: use of more (not necessarily identical) ways to determine a variable, where one way uses a mathematical process model in an analytical form.
4) System properties
- Reliability: ability of a system to perform a required function under stated conditions, within a given scope, during a given period of time.
- Safety: ability of a system to operate without causing danger to persons, equipment or the environment.
- Availability: probability that a system or equipment will operate satisfactorily and effectively at any point of time.
5) Time dependency of faults
- Abrupt fault: fault modeled as step-wise function. It represents bias in the monitored signal.
- Incipient fault: fault modeled by using ramp signals. It represents drift of the monitored signal.
- Intermittent fault: combination of impulses with different amplitudes.
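The three time profiles listed above can be sketched as simple discrete-time signals. The function names, onset sample, fault sizes and impulse positions are illustrative assumptions, not values from the book.

```python
def abrupt_fault(k, k_fault, size):
    """Step-wise fault: zero before k_fault, constant bias `size` afterwards."""
    return size if k >= k_fault else 0.0

def incipient_fault(k, k_fault, slope):
    """Ramp fault: drift that grows linearly from sample k_fault onwards."""
    return slope * (k - k_fault) if k >= k_fault else 0.0

def intermittent_fault(k, impulses):
    """Impulses of different amplitudes; `impulses` maps sample index to amplitude."""
    return impulses.get(k, 0.0)

n = 10
abrupt = [abrupt_fault(k, 4, 2.0) for k in range(n)]
incipient = [incipient_fault(k, 4, 0.5) for k in range(n)]
intermittent = [intermittent_fault(k, {2: 1.0, 6: -3.0}) for k in range(n)]

print(abrupt)
print(incipient)
print(intermittent)
```

Adding one of these sequences to a monitored signal reproduces, respectively, a bias, a drift or isolated spikes in that signal.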
6) Fault terminology
- Additive fault: it influences a variable by an addition of the fault itself. It may represent, for example, offsets of sensors.
- Multiplicative fault: it is represented by the product of a variable with the fault itself. It can appear as parameter changes within a process.
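The practical difference between the two fault types is that the effect of an additive fault does not depend on the operating point, whereas the effect of a multiplicative fault scales with the variable it multiplies. The sketch below assumes a hypothetical static process y = gain · u; the gain, the offset and the gain change are illustrative values.

```python
def healthy_output(u, gain=2.0):
    """Fault-free static process (hypothetical): y = gain * u."""
    return gain * u

def with_additive_fault(u, offset, gain=2.0):
    """Additive fault, e.g. a sensor offset added to the output."""
    return gain * u + offset

def with_multiplicative_fault(u, gain_change, gain=2.0):
    """Multiplicative fault: a parameter change that multiplies the input."""
    return (gain + gain_change) * u

inputs = [0.0, 1.0, 2.0, 4.0]
additive_effect = [with_additive_fault(u, 0.5) - healthy_output(u) for u in inputs]
multiplicative_effect = [with_multiplicative_fault(u, 0.5) - healthy_output(u) for u in inputs]

print(additive_effect)        # same deviation at every operating point
print(multiplicative_effect)  # deviation grows with the input
```

One consequence visible here is that a multiplicative fault produces no deviation when the variable it multiplies is zero, so it can only be observed when the process is excited.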
I.3. Fault diagnosis methods based on analytical redundancy
A traditional approach to fault diagnosis in the wider application context is based on hardware or physical redundancy methods, which use multiple sensors, actuators and components to measure and control a particular variable. Typically, a voting technique is applied to the hardware redundant system to decide if a fault has occurred and its location among all the redundant system components. The major problems encountered with hardware redundancy are the extra equipment and maintenance costs, as well as the additional space required to accommodate the equipment (Isermann 1997; Isermann and Ballé 1997).
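A voting scheme for hardware redundancy can be sketched as follows. The text does not specify a particular voting rule; the example assumes median voting over three redundant sensors with a fixed tolerance, which is one common choice. The readings and the tolerance are hypothetical.

```python
from statistics import median

def vote(readings, tolerance):
    """Return the voted value and the indices of sensors that disagree with it.

    The voted value is the median of the redundant readings (assumed rule).
    A sensor is flagged when it deviates from the voted value by more than
    `tolerance`.
    """
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - voted) > tolerance]
    return voted, suspects

# Three sensors measuring the same variable; the third one is faulty.
voted, suspects = vote([10.1, 9.9, 14.0], tolerance=0.5)
print(voted, suspects)
```

With three sensors, the median tolerates a single faulty reading and the flagged index gives the location of the fault among the redundant components.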
Given the conflict between reliability and the cost of adding more hardware, an alternative is to cross-compare dissimilar measured values with each other rather than replicating each piece of hardware. This is the meaning of analytical or functional redundancy, which exploits redundant analytical relationships among various measured variables of the monitored process (Patton et al. 1989; Chen and Patton 1999). Figure I.1 illustrates the concepts of hardware and analytical redundancy.
In the analytical redundancy scheme, the resulting difference generated from the comparison of different variables is called a residual or symptom signal. The residual should be zero when the system is in normal operation and should be different from zero when a fault has occurred. This property of the residual is used to determine whether faults have occurred (Patton et al. 1989; Chen and Patton 1999).
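A minimal sketch of this residual test is given below. In practice the residual is never exactly zero because of noise, so the example compares its magnitude against a threshold; the threshold, the data and the function names are assumptions made for illustration.

```python
def residuals(measured, estimated):
    """Residual: difference between measured and model-estimated values."""
    return [y - y_hat for y, y_hat in zip(measured, estimated)]

def detect(residual, threshold):
    """Return the first sample index where |residual| exceeds the threshold,
    or None when the residual stays within the threshold."""
    for k, r in enumerate(residual):
        if abs(r) > threshold:
            return k
    return None

estimated = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]      # model output (hypothetical)
measured = [1.0, 1.02, 0.99, 1.5, 1.52, 1.49]   # bias of about 0.5 from sample 3
r = residuals(measured, estimated)
alarm_index = detect(r, threshold=0.1)
print(alarm_index)
```

The small deviations in the first three samples stay below the threshold and are treated as normal operation, while the bias introduced at sample 3 raises the alarm.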
Consistency checking in analytical redundancy is normally achieved through a comparison between a measured signal and estimated