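5 Sensitivity and Specificity: These metrics are generally used in medical applications and are calculated as follows: Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP), where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.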
6 ROC Curve: The Receiver Operating Characteristic (ROC) curve is generally used to measure the performance of a binary classifier. It plots the True-Positive Rate against the False-Positive Rate as the decision threshold is varied. It depicts the overall performance of the model and helps in selecting a good cut-off threshold.
7 AUC: Area Under Curve (AUC) is a binary classifier’s performance metric aggregated over all possible threshold values (and therefore it is threshold invariant). AUC is the area under the ROC curve, and hence it lies between 0 and 1. One way to view AUC is as the likelihood that a random positive example is ranked more highly by the model than a random negative example.
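As a minimal sketch of the three metrics above (not taken from any of the surveyed studies), the following Python snippet computes sensitivity and specificity at a fixed threshold, builds the ROC points, and obtains AUC in two ways: by trapezoidal integration of the ROC curve and by the ranking interpretation. The function names and the toy labels and scores are illustrative assumptions, and both classes are assumed to be present in the data.

```python
from typing import List, Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate = 1 - false-positive rate
    return sensitivity, specificity


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, one per distinct threshold, running from (0, 0) to (1, 1)."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(labels, scores, t)
        points.append((1.0 - spec, sens))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = pathology present, 0 = absent.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(sensitivity_specificity(labels, scores, threshold=0.5))
print(auc_trapezoid(roc_points(labels, scores)), auc_rank(labels, scores))
```

On this toy data the two AUC computations agree (8 of the 9 positive-negative pairs are ranked correctly), which illustrates why AUC can be read as a ranking probability rather than as a property of any single threshold.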
2.2.3 Availability of Datasets
Deep learning models for identifying chest radiography pathologies are mostly trained on the available CXR datasets, the best known of which are the Indiana dataset [15], KIT dataset [54], MC dataset [29], Japanese Society of Radiological Technology (JSRT) dataset [59], ChestX-ray14 dataset [19], NIH Tuberculosis Chest X-ray [17], and Belarus Tuberculosis [6]. Each of these datasets has major limitations, some of which are addressed in a survey published in August 2018 [48]. ChestX-ray14 is one of the most widely used of the available CXR datasets; it contains 108,948 x-ray images obtained from 32,717 patients. These images are labeled with one or more diagnostic labels by means of natural language processing. A number of recent studies, such as Wang et al. [70], Yao et al. [71], Rajpurkar et al. [49], and Guan et al. [4], have used the ChestX-ray14 dataset. The models in all of these studies were trained and tested on ChestX-ray14, which is annotated for 14 different types of chest pathologies. The number of images available for each type of pathology in the ChestX-ray14 dataset is shown in Table 2.1 [7].
The number of available images is disproportionate across the 14 chest pathologies, which is one of the factors affecting the performance of different deep models. Before the existing models are analyzed, the 14 chest pathologies are described below and shown in Figure 2.2.
1 Atelectasis: It is a disorder in which the lung has no space to expand normally because its air sacs malfunction.
2 Cardiomegaly: It is a heart disorder in which the heart is enlarged due to stress or another medical condition.
3 Consolidation: It occurs when the small airways in the lungs are filled with fluid such as pus, water, or blood instead of air.
4 Edema: It occurs due to the accumulation of excess fluid in the lungs.
5 Effusion: In this disorder, excess fluid accumulates between the chest wall and the lungs.
6 Emphysema: It occurs when the alveoli, the air sacs of the lungs, are damaged or weakened.
7 Fibrosis: When lung tissue becomes thickened or stiff, it becomes difficult for the lungs to work normally. This condition is known as fibrosis.
8 Hernia: Protrusion of thoracic contents outside their normal location in the thorax is known as a thoracic hernia.
9 Infiltration: When a denser substance such as pus, blood, or protein accumulates within the parenchyma of the lungs, it is known as pulmonary infiltration.
10 Mass: A tumor that grows in the mediastinum, the region of the chest that separates the lungs, is termed a mass.
11 Nodule: Small masses of tissue in the lung are known as lung nodules.
12 Pleural Thickening: Exposure of the lungs to asbestos causes the lung tissue to scar. This condition is known as pleural thickening.
13 Pneumonia: An infection in the air sacs of one or both lungs results in pneumonia.
14 Pneumothorax: When air leaks from the lungs into the space between the lungs and the chest wall, the condition is known as pneumothorax.
Table 2.1 Details of ChestX-ray14 dataset.
Type of pathology | No. of images with label | Type of pathology | No. of images with label |
---|---|---|---|
Atelectasis | 11,559 | Consolidation | 4,667 |
Cardiomegaly | 2,776 | Edema | 2,303 |
Effusion | 13,317 | Emphysema | 2,516 |
Infiltration | 19,894 | Fibrosis | 1,686 |
Mass | 5,782 | Pleural thickening | 3,385 |
Nodule | 6,331 | Hernia | 227 |
Pneumonia | 1,431 | Normal chest x-ray | 60,412 |
Pneumothorax | 5,302 | | |
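To make the disproportion in Table 2.1 concrete, the short Python sketch below derives an imbalance ratio and a simple inverse-frequency weight per pathology from the tabulated counts. The weighting scheme (largest count divided by each label's count) is only one common heuristic, chosen here for illustration; it is an assumption and not the method of any study discussed in this chapter. The names `LABEL_COUNTS` and `inverse_frequency_weights` are hypothetical.

```python
from typing import Dict

# Label counts transcribed from Table 2.1 (pathology labels only; the
# "Normal chest x-ray" row is left out).
LABEL_COUNTS: Dict[str, int] = {
    "Atelectasis": 11559,
    "Cardiomegaly": 2776,
    "Effusion": 13317,
    "Infiltration": 19894,
    "Mass": 5782,
    "Nodule": 6331,
    "Pneumonia": 1431,
    "Pneumothorax": 5302,
    "Consolidation": 4667,
    "Edema": 2303,
    "Emphysema": 2516,
    "Fibrosis": 1686,
    "Pleural thickening": 3385,
    "Hernia": 227,
}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each label by (largest count / its own count).

    The most frequent label gets weight 1.0; rarer labels get larger weights.
    This is an illustrative heuristic, not a prescribed scheme.
    """
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


weights = inverse_frequency_weights(LABEL_COUNTS)
imbalance_ratio = max(LABEL_COUNTS.values()) / min(LABEL_COUNTS.values())

# List labels from most to least frequent (smallest to largest weight).
for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:<20} count={LABEL_COUNTS[label]:>6}  weight={w:6.2f}")
print(f"imbalance ratio (most/least frequent): {imbalance_ratio:.1f}")
```

With these counts, Infiltration has roughly 88 times as many labeled images as Hernia, so an unweighted model sees very few positive examples of the rarest pathologies.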
Figure 2.2 Types of chest pathologies.
Cardiomegaly detection has been addressed by many researchers because the disorder is spread spatially across a large region and is therefore easy to detect.
2.3 Existing Models
Models proposed in the past fall mainly into two types: ensemble models, and hybrid and pretrained models. Ensemble models have focused either on classifying all fourteen pathologies or on a limited set of abnormalities such as cardiomegaly, edema, pneumonia, or COVID-19. In pretrained models, the parameters of the deep learning model are initialized from training on the ImageNet dataset, and the network is then fine-tuned for the targeted pathologies. This section discusses the existing models implemented in the literature, in chronological order of their implementation, along with the issues they address related to x-ray images, the datasets used for training, and the types of pathologies detected.
In [4], a deep learning model named Decaf, trained on the non-medical ImageNet dataset, is applied to detect pathologies in a medical CXR dataset. Each image is treated as a Bag of Visual Words (BoVW). The model uses a CNN, the GIST descriptor, and BoVW for feature extraction on the ImageNet dataset and is then applied to extract features from medical images. Once the model is trained, an SVM is used to classify pathologies in the CXRs, and AUC values in the range of 0.87 to 0.97 are obtained. The results of feature extraction can be further improved by using fusion of Decafs model such as Decaf5, Decaf6, and GIST is presented