The chapter is organized as follows. Section 2.2 gives a broad view of existing research, covering the challenges in detecting thoracic pathologies, the datasets used by researchers, the thoracic pathologies themselves, and the parameters used to compare models. Section 2.3 details the deep learning models implemented by researchers and compares them on various parameters. Section 2.4 presents the conclusion and future directions.
2.2 Broad Overview of Research
A large number of models have been implemented for the localization and classification of thoracic pathologies. The broad view of the research carried out to date (i.e., up to December 2020) is summarized in Figure 2.1.
It is observed that researchers have either designed their own ensemble models from existing pre-trained models or used pre-trained models such as GoogLeNet, AlexNet, ResNet, and DenseNet directly for localization of pathologies in CXR images. In ensemble models, either the network weights are pre-trained on the widely available ImageNet dataset or the outputs of existing models are averaged to build a new model. The input to these models can be either radiographic CXR images or the radiologist report in text form. Chest x-ray exams are among the most common and cost-effective medical imaging tests available. Clinical diagnosis from a chest x-ray is, however, more complex and difficult than diagnosis from chest CT imaging. The lack of publicly accessible annotated databases makes clinically valid computer-aided diagnosis of chest pathologies more difficult, but not impossible. Researchers have therefore annotated CXR images manually and made them publicly available so that the accuracy of novel models can be tested. These datasets include ChestX-ray14, ChestX-ray8, Indiana, JSRT, and Shenzhen. The final output of existing models is the identification of one or more of fourteen pathologies in either the x-ray image or the radiologist report. The DL models devised by researchers are compared on the basis of various metrics, such as the Receiver Operating Characteristic (ROC) curve, Area Under the Curve (AUC), and F1-score, which are listed in Figure 2.1. Researchers face various challenges when designing deep learning models for the detection of thoracic pathologies; these are discussed in the next section.
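The output-averaging form of ensembling mentioned above can be illustrated with a minimal sketch. It assumes each model returns one probability per pathology label. The function names, label names, scores, and the 0.5 threshold are hypothetical and are not taken from any of the surveyed models.

```python
from typing import Dict, List

# Hypothetical model output: one probability per pathology label.
Prediction = Dict[str, float]


def average_ensemble(predictions: List[Prediction]) -> Prediction:
    """Unweighted average of per-label probabilities across models."""
    if not predictions:
        raise ValueError("need at least one model output")
    labels = predictions[0].keys()
    for p in predictions:
        if p.keys() != labels:
            raise ValueError("all models must score the same labels")
    n = len(predictions)
    return {label: sum(p[label] for p in predictions) / n for label in labels}


def predicted_labels(scores: Prediction, threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every label whose averaged score reaches the threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    # Illustrative scores only; they are not results from any real model.
    model_a = {"Nodule": 0.75, "Effusion": 0.25, "Cardiomegaly": 0.5}
    model_b = {"Nodule": 0.75, "Effusion": 0.5, "Cardiomegaly": 0.0}
    averaged = average_ensemble([model_a, model_b])
    print(averaged)
    print(predicted_labels(averaged))
```

Because a CXR image may show more than one pathology, the sketch thresholds each label independently instead of picking a single highest-scoring class.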
2.2.1 Challenges
Several challenges arise when implementing deep learning models for the analysis of thoracic images. They relate to the availability of datasets and the nature of the images, and are listed below.
Figure 2.1 Broad view of existing research.
1 Large labeled datasets are not available [16].
2 Using neural-network weights trained on the ImageNet dataset for ensemble models leads to overfitting. These pre-trained models are computationally intensive and are less likely to generalize to comparatively small datasets [7].
3 Lesions appear at different locations in the x-ray, and the proportion of images varies from pathology to pathology; both can hinder model performance [41].
4 X-ray images contain a large amount of noise (healthy regions) surrounding the lesion area, which is very small. The large number of healthy regions makes it difficult for deep networks to concentrate on the lesion region of a chest pathology, and the location of disease regions is often unpredictable. This problem differs from the classification of generic images [67], where the object of interest is normally located in the center of the image.
5 Because of the high inter-class similarity of chest x-ray images, it is difficult for deep networks to capture the subtle differences between whole images of different classes, particularly within a single whole image.
6 CXR images suffer from distortion or misalignment due to variations in capture conditions, e.g., the patient's pose or the small body size of a child.
7 Sometimes images cannot be recaptured, so the computer-aided system has to make predictions from the existing images only. The model should therefore be robust to the quality of the x-ray image.
8 Localization of pathologies requires images labeled or annotated with bounding boxes, and segmentation requires a pixel-level mask for each image. Both processes are expensive and time consuming.
9 The output of a CNN is not shift invariant: if the input image is shifted, the output of the CNN is also shifted, which should not happen in the domain of medical imaging.
10 There is an imbalance between the number of available normal-class images and images with disorders, as well as a disproportion between annotated and unannotated images.
11 Most models are trained on the ImageNet dataset, whose features are altogether different from those of medical images [5].
12 Confounding variables in x-ray images degrade model performance, so pre-processing is needed [72].
To overcome these issues, the following strategies can be deployed.
1 2-D or 3-D image patches are taken as input rather than the full-sized image [11, 14, 36, 50, 56–58, 63].
2 Existing images are transformed using affine transformations, the resulting images are added to the existing dataset, and the network is then trained from scratch on the expanded dataset [11, 50, 56, 57].
3 Deep models trained on an off-the-shelf dataset are used for feature extraction, and a final classifier at the last layer performs the classification after the model is fine-tuned on the target dataset [14, 36, 58].
4 The parameters of new models are initialized with the parameter values of models pre-trained on the ImageNet dataset [60], and the network is then fine-tuned on the task-specific dataset.
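The first two strategies can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the procedure of any cited work: the image is a 2-D grayscale grid stored as nested lists, patches are taken with a sliding window, and the affine transformation uses inverse mapping with nearest-neighbour sampling. The names `extract_patches` and `affine_transform` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def extract_patches(image: Image, size: int, stride: int) -> List[Image]:
    """Slide a size x size window over the image and collect every full patch."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def affine_transform(image: Image, matrix: Affine, fill: float = 0.0) -> Image:
    """Apply a 2x3 affine map by inverse mapping with nearest-neighbour sampling.

    The matrix maps each OUTPUT pixel (x, y) to a SOURCE position:
        src_x = a*x + b*y + c
        src_y = d*x + e*y + f
    Output pixels whose source falls outside the image receive `fill`.
    """
    height, width = len(image), len(image[0])
    (a, b, c), (d, e, f) = matrix
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            sx = round(a * x + b * y + c)
            sy = round(d * x + e * y + f)
            inside = 0 <= sx < width and 0 <= sy < height
            row.append(image[sy][sx] if inside else fill)
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
    print(len(extract_patches(img, size=2, stride=2)))       # number of patches
    shift_right = ((1, 0, -1), (0, 1, 0))                    # translate by one pixel
    flip = ((-1, 0, 3), (0, 1, 0))                           # horizontal flip
    expanded = [img, affine_transform(img, shift_right), affine_transform(img, flip)]
    print(len(expanded))                                     # size of expanded dataset
```

In practice an image library would handle interpolation and border effects; the sketch only shows how transformed copies enlarge the training set and how a full image is broken into smaller inputs.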
2.2.2 Performance Measuring Parameters
Researchers use various parameters to evaluate the performance of their models, namely, accuracy, precision, recall, F1-score, sensitivity, specificity, the ROC curve, and AUC. The choice of metric varies from application to application, and sometimes a single parameter does not adequately characterize how well a model works. In such cases, a subset of parameters is used to evaluate the model. Comparison of classification models in the existing research is performed on the basis of these parameters, which are discussed below.
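As a point of reference for the definitions in this subsection, the threshold-based parameters can all be computed from four per-class counts: true positives, false positives, true negatives, and false negatives. The sketch below uses the standard textbook formulas. The `Counts` class, the function names, and the example counts are hypothetical. ROC and AUC are not covered, because they require scores at many thresholds and not a single set of counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    """Per-class confusion counts for a one-vs-rest decision."""
    tp: int  # images of the class predicted as the class
    fp: int  # images of other classes predicted as the class
    tn: int  # images of other classes predicted as not the class
    fn: int  # images of the class predicted as not the class


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def metrics(c: Counts) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Illustrative imbalanced case: 950 normal images, 50 with the pathology.
    always_normal = Counts(tp=0, fp=0, tn=950, fn=50)  # never predicts the pathology
    useful_model = Counts(tp=40, fp=10, tn=940, fn=10)
    print(metrics(always_normal))
    print(metrics(useful_model))
```

The first set of counts shows why accuracy alone can mislead on imbalanced data: a model that never predicts the pathology still reaches high accuracy while its recall for that class is zero.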
1 Accuracy: It is defined as the ratio of the number of correctly classified samples to the total number of available samples. If the dataset has 1,000 images, each showing some pathology or no pathology, and the model correctly classifies 820 images with pathologies and 10 normal cases, then the accuracy is 830 × 100/1,000 = 83%.
2 Precision: When the dataset is imbalanced in the number of images available per class, accuracy is not an acceptable parameter, because the model becomes tuned to the majority class only. For example, if images of the nodule class dominate a CXR dataset, then assigning most images to that most frequent class does not serve the purpose. A class-specific metric, known as precision, is therefore needed. It is the fraction of images predicted as a given class that actually belong to that class.
3 Recall: It is also a class-specific parameter. It is the fraction of images of a given class that are correctly classified.
4 F1-Score: For critical application, F1-score is needed which will guide which parameter is more appropriate, i.e., precision