A multi-attention framework that addresses class imbalance, the shortage of annotated images, and the diversity of lesion areas is developed in [41], with the ChestX-ray14 dataset used for the experiments. The authors implement three modules: a feature attention module, a space attention module, and a hard example attention module. The feature attention module detects interdependencies among pathologies, using the ResNet101 architecture as its base. Because the Squeeze-and-Excitation (SE) block [35] can model channel interdependencies, one SE block is inserted into each ResNet block. The feature map generated by this module contains considerable noise and is learnt from global information rather than concentrating on the small disease-related regions. To address this, the space attention module is introduced. In this module, global average pooling is applied to the feature map obtained from ResNet101 [39]. This helps carry the global information of the image into each pixel, which benefits both the classification and the localization task. In the hard example attention module, positive and negative images are separated into two sets, and the model is trained on each set individually to obtain a threshold value of the predicted score for that set. A set C is then created by combining both sets so that it contains an increased proportion of positive samples. The model is retrained on set C to distinguish the 14 classes of thoracic diseases. This helps reduce the large gap between the numbers of positive and negative samples.
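As an illustration of the channel reweighting that an SE block performs, the following is a minimal sketch in plain Python, not the implementation used in [41]. It assumes the standard squeeze (global average pooling per channel), excitation (two small fully connected layers with ReLU and sigmoid), and scale steps. The function names and the weight matrices `w1` and `w2` are hypothetical and chosen only for illustration.

```python
import math

def squeeze(feature_map):
    """Global average pooling: reduce each 2D channel to one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

def dense(vec, weights):
    """Apply a weight matrix (given as a list of rows) to a vector, no bias."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def excite(z, w1, w2):
    """Bottleneck layer with ReLU, then an expansion layer with sigmoid gates."""
    hidden = [max(0.0, h) for h in dense(z, w1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2)]

def se_block(feature_map, w1, w2):
    """Rescale every channel by its learned gate in (0, 1)."""
    gates = excite(squeeze(feature_map), w1, w2)
    return [[[g * x for x in row] for row in ch]
            for g, ch in zip(gates, feature_map)]

# Two 2x2 channels; w1 and w2 are hypothetical weights (2 -> 1 -> 2 channels).
fm = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[1.0, 1.0]]
w2 = [[0.0], [0.0]]
reweighted = se_block(fm, w1, w2)
```

In a trained network the gates differ per channel, so informative channels are amplified relative to noisy ones; the zero weights above simply give every channel the same gate.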
Multiple feature extraction techniques were used by the authors of [23] for the classification of thoracic pathologies. Various classifiers, namely Gaussian discriminant analysis (GDA), KNN, Naïve Bayes, SVM, Adaptive Boosting (AdaBoost), Random Forest, and ELM, were compared with a pretrained DenseNet121, which was also used for localization by generating a Class Activation Map (CAM). The integrated results of different shallow and deep feature extraction algorithms, such as Scale Invariant Feature Transform (SIFT), gradient-based GIST, Local Binary Pattern (LBP), and Histogram of Oriented Gradients (HOG), were used with the different classifiers for the final classification of various lung abnormalities. ELM is reported to achieve a better F1-score than DenseNet121.
Two asymmetric networks, ResNet and DenseNet, which extract complementary features from the input image, were used to design a new ensemble model known as DualCheXNet [10]. It is reported as the first attempt to exploit the complementarity of dual asymmetric subnetworks in the field of thoracic disease classification. The two networks work simultaneously in the Feature Level Fusion (FLF) module, and selected features from both networks are combined in the Decision Level Fusion (DLF) module, on which two auxiliary classifiers are applied to classify the image into one of the pathologies.
Poor alignment and noise in the non-lesion areas of CXR images hinder network performance. This problem is addressed in [20] by a three-branch attention-guided CNN (AGCNN) for identifying thorax diseases, with ResNet50 as its backbone. AGCNN works in the same manner as a radiologist, who first browses the complete image and then gradually narrows the focus to a small lesion-specific region. AGCNN mainly focuses on a small local region that is disease-specific, as in the case of Nodule. It has three branches: a local branch, a global branch, and a fusion branch. If the lesion region is distributed throughout the image, as in the case of pneumonia, the pathologies missed by the local branch through loss of information are captured by the global branch. The global and local branches are then fused to fine-tune the CNN before the final conclusion is drawn. AGCNN is trained in several different orders: G_LF (Global branch first, then Local and Fusion together), GL_F (Global and Local together, followed by Fusion), GLF (all together), and G_L_F (Global, Local, and Fusion separately, one after another).
The lack of annotated images is a major obstacle to the performance of deep learning models designed for localization or segmentation [53]. To deal with this issue, a novel loss function is proposed and a conditional random field layer is added to a ResNet50 backbone [22], from which the last two layers are removed and which uses weights initialized on ImageNet. To make the CNN shift invariant, a low-pass antialiasing filter as proposed by [73] is inserted prior to the down-sampling stages of the network. This helps achieve better accuracy across many models. The authors use NIH ChestX-ray14, which has very few annotated images: only 984 images with bounding boxes are used for detecting 8 chest pathologies, and 11,240 images have only labels associated with them. Furthermore, a chest X-ray dataset that contains many images with uncertain labels is investigated. To address this issue, label smoothing regularization [44, 66] is adopted in the ensemble model proposed in [47], which averages the outputs generated by the pre-trained models, i.e., DenseNet-121, DenseNet-169, DenseNet-201 [25], Inception-ResNet-v2 [64], Xception [12], and NASNetLarge [74]. Instead of ReLU, the sigmoid function is used as the activation. Applying label smoothing to the uncertain samples helped improve the AUC score.
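The two ideas described for [47], smoothing uncertain labels and averaging the outputs of several models, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of [47]: the marker `-1` for an uncertain label, the soft-target range `0.55` to `0.85`, and the choice to treat uncertain samples as probably positive are all hypothetical, as are the function names.

```python
import random

UNCERTAIN = -1  # hypothetical marker for an uncertain label

def smooth_uncertain_labels(labels, low=0.55, high=0.85, rng=None):
    """Replace each uncertain label with a soft target drawn from [low, high].

    Certain labels (0 or 1) are kept unchanged. The range is an assumption
    made for illustration only.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    smoothed = []
    for label in labels:
        if label == UNCERTAIN:
            smoothed.append(rng.uniform(low, high))
        else:
            smoothed.append(float(label))
    return smoothed

def ensemble_average(model_outputs):
    """Average per-class sigmoid scores over all models in the ensemble."""
    n_models = len(model_outputs)
    return [sum(scores) / n_models for scores in zip(*model_outputs)]

targets = smooth_uncertain_labels([1, 0, UNCERTAIN])
# Each inner list holds one model's per-class scores for the same image.
averaged = ensemble_average([[0.2, 0.8], [0.4, 0.6]])
```

Soft targets keep the loss from pushing the network towards full confidence on samples whose ground truth is itself in doubt, while averaging reduces the variance of any single model's prediction.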
A multiple-instance learning (MIL) approach that achieves good localization and multi-label classification performance even when few annotated images are available is discussed in [37]. The pre-activation version of the residual network, pre-act-ResNet [22], is employed to locate the site of the disease correctly. Initially, the model learns from the information of all images, namely the class and the location. Later, each annotated input image is divided into four patches and the model is trained on each patch. For an image with a bounding box annotation, the learning task becomes a fully supervised problem, since the disease label for each patch can be calculated from the overlap between the patch and the bounding box. For an image without such an annotation, the task is formulated as an MIL problem in which at least one patch in the image belongs to the disease. If the image shows no disease, all patches have to be disease-free.
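The patch labelling and the multiple-instance constraint described above can be sketched as follows. This is a minimal sketch, not the code of [37]: the 2x2 grid matches the four patches mentioned in the text, the rule that any positive overlap marks a patch as diseased is an assumption, and the noisy-OR aggregation in `image_probability` is one common way to express "at least one patch is positive" that the text itself does not specify. All function names are hypothetical.

```python
def patch_labels(image_size, bbox, grid=2):
    """Label each grid patch 1 if it overlaps the bounding box, else 0.

    image_size is (width, height); bbox is (x0, y0, x1, y1) in pixels.
    Patches are returned in row-major order.
    """
    width, height = image_size
    x0, y0, x1, y1 = bbox
    pw, ph = width / grid, height / grid
    labels = []
    for row in range(grid):
        for col in range(grid):
            px0, py0 = col * pw, row * ph
            px1, py1 = px0 + pw, py0 + ph
            overlap_w = min(px1, x1) - max(px0, x0)
            overlap_h = min(py1, y1) - max(py0, y0)
            labels.append(1 if overlap_w > 0 and overlap_h > 0 else 0)
    return labels

def image_probability(patch_probs):
    """Noisy-OR: probability that at least one patch shows the disease."""
    prob_all_clear = 1.0
    for p in patch_probs:
        prob_all_clear *= 1.0 - p
    return 1.0 - prob_all_clear

corner = patch_labels((100, 100), (10, 10, 40, 40))   # box inside one patch
centre = patch_labels((100, 100), (40, 40, 60, 60))   # box straddles all four
```

With bounding boxes, `patch_labels` supplies direct per-patch supervision; without them, only `image_probability` can be compared against the image-level label, which is what makes the problem multiple-instance.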
To handle the orientation, rotation, and tilting problems of images, a hybrid deep learning framework, VDSNet, is presented in [7]. It combines VGG, data augmentation, and a spatial transformer network (STN) with a CNN to detect lung diseases such as asthma, TB, and pneumonia from the NIH CXR dataset. VDSNet is compared with CapsNet, vanilla RGB, vanilla gray, and hybrid CNN VGG, and the results show that it achieves a better accuracy of 73% than the other models but is time consuming. In [67], predefined deep CNNs, namely AlexNet, VGG16, ResNet18, Inception-v3, and DenseNet121, with weights either initialized from the ImageNet dataset or initialized with random values for training from scratch, are used to classify chest radiographs into normal and abnormal classes. Weights pretrained on ImageNet performed better than randomly initialized weights. Deeper CNNs work better for detection or segmentation tasks than for binary classification. ResNet outperformed training from scratch for a moderate-sized dataset (for example, 8,500 rather than 18,000).
A customized U-Net-based CNN model is developed in [8] for the detection and localization of cardiomegaly, which is one of the 14 pathologies of the thorax region. The experiments used the ChestX-ray8 database, which contains 1,010 images of cardiomegaly. Modified (Low Contrast) Adaptive Histogram Equalization (LC-AHE) was applied to enhance the image features and sharpen the image. The brightness of a low-intensity pixel in a small selected region is amplified based on the intensities of all neighbouring pixels, which sharpens the low-intensity regions of the given image. Considering the medical fact that cardiomegaly can be located by observing significant thickening of the cardiac ventricular walls, the authors developed their own customized mask to locate it and separated out the affected region as an image. This helped achieve an accuracy of 93%, which is better than that of the VGG16, VGG19, and ResNet models.
Thoracic