Given the popularity of deep learning, four models based on AlexNet [34] and GoogleNet [65] are applied to thoracic image analysis, specifically to detect TB from CXR images. The parameters of AlexNet-T and GoogleNet-T are initialized from ImageNet pre-training, whereas AlexNet-U and GoogleNet-U are trained from scratch. The performance of all four models is compared, and the pre-trained versions are observed to achieve better accuracy than the versions trained from scratch [35].
Another model focuses on only eight thoracic pathologies [70]. A weakly supervised DCNN is applied to a large set of images, each of which may contain more than one pathology. A model pre-trained on ImageNet is adopted with its fully connected and final classification layers removed. In their place, a transition layer, a global pooling layer, a prediction layer, and a loss layer are inserted after the last convolutional layer. Weights are taken from the pre-trained models, except for the transition and prediction layers, which are trained from scratch. These two layers help in finding the plausible location of the disease. Because the numbers of images with and without pathologies are disproportionate, three loss functions, namely Hinge loss, Euclidean loss, and Cross Entropy loss, are used instead of the conventional softmax function. The global pooling layer and the prediction layer help generate a heatmap that maps the presence of a pathology with maximum probability. Moreover, the model based on ResNet50 [21] recognizes Cardiomegaly and Pneumothorax better than the other pathologies.
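The three loss functions named above can be sketched for a single pathology as follows. This is a minimal illustration and not the implementation of [70]: the function names are hypothetical, and the class-balancing weights in the cross-entropy variant (total count divided by the count of each class) are an assumed, commonly used way of countering the imbalance between images with and without a pathology.

```python
import math

def hinge_loss(scores, labels):
    """Mean hinge loss on raw scores; labels in {0, 1} are mapped to {-1, +1}."""
    total = 0.0
    for s, y in zip(scores, labels):
        t = 1.0 if y == 1 else -1.0
        total += max(0.0, 1.0 - t * s)
    return total / len(scores)

def euclidean_loss(probs, labels):
    """Mean squared difference between predicted probabilities and labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def weighted_cross_entropy(probs, labels, eps=1e-12):
    """Binary cross entropy with class-balancing weights (an assumption).

    Each class is weighted by total / class_count, so the rarer class
    contributes as much to the loss as the more frequent one.
    """
    total = len(labels)
    n_pos = sum(labels)
    n_neg = total - n_pos
    beta_pos = total / n_pos if n_pos else 0.0
    beta_neg = total / n_neg if n_neg else 0.0
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            loss -= beta_pos * math.log(p)
        else:
            loss -= beta_neg * math.log(1.0 - p)
    return loss / total
```

With one positive among four images, the single positive term receives weight 4 while each negative term receives weight 4/3, so both classes contribute equally when the predictions are equally uncertain.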
In [28], three datasets, namely the Indiana, JSRT, and Shenzhen datasets, are used to evaluate the proposed deep model. The Indiana dataset consists of 7,284 CXR images, covering both frontal and lateral views of the chest, annotated for the pathologies Cardiomegaly, Pulmonary Edema, Opacity, and Effusion. JSRT consists of 247 CXR images, 154 with lung nodules and 93 without. The Shenzhen dataset consists of 662 frontal CXR images, of which 336 are TB cases and the remainder are normal. Features are extracted from one layer of pre-defined models, specifically the second fully connected layer of the AlexNet, VGG16, and VGG19 networks, and passed to a binary classifier layer to detect abnormality. Dropout is observed to improve the accuracy of shallow networks but to hamper the performance of deeper ones. Shallow DCNs are generally used for detecting small objects in the image. Ensemble models perform better for spatially spread-out abnormalities such as Cardiomegaly and Pulmonary Edema, whereas small, localized features such as nodules cannot be easily located by ensemble models.
Subsequently, a three-branch attention-guided CNN (AG-CNN) is proposed on the basis of two observations. First, although thoracic pathologies are located in a small region, the complete CXR image is given as input for training, which adds irrelevant noise to the network. Second, irregular borders arising from poor alignment of the CXR obstruct the performance of the network [19]. ResNet50 and DenseNet121 are used as backbones for two versions of AG-CNN. The global CNN takes the complete image, and a mask created from the heat map generated by the global CNN is used to crop the disease-specific region. The local CNN is then trained on this disease-specific part of the image, and the last pooling layers of both CNNs are concatenated to fine-tune the fused branch. For classifying chest pathologies, conventional and deep learning approaches are also compared on the basis of error rate, accuracy, and training time [2]. The conventional models are the Back Propagation Neural Network (BPNN) and the Competitive Neural Network (CpNN), and the deep learning model is a simple CNN. The deep CNN generalizes better than BPNN and CpNN but requires more iterations because features are extracted at different layers.
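The mask-and-crop step of AG-CNN described earlier in this paragraph can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure of [19]: the heat map is assumed to have already been resized to the image resolution, the threshold value of 0.7 is arbitrary, and the crop is the bounding box of all above-threshold pixels rather than of a single connected region. The function name is hypothetical.

```python
def crop_from_heatmap(image, heatmap, threshold=0.7):
    """Crop the region of `image` where `heatmap` is strongly activated.

    `image` and `heatmap` are 2D lists of the same shape. The heat map is
    normalized by its peak value, thresholded into a binary mask, and the
    bounding box of the mask is cropped from the image.
    Returns (crop, (top, left, bottom, right)) with inclusive coordinates.
    """
    height, width = len(heatmap), len(heatmap[0])
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        # No activation at all: fall back to the full image.
        return image, (0, 0, height - 1, width - 1)

    # Rows and columns that contain at least one above-threshold pixel.
    rows = [r for r in range(height)
            if any(v / peak >= threshold for v in heatmap[r])]
    cols = [c for c in range(width)
            if any(heatmap[r][c] / peak >= threshold for r in range(height))]

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    crop = [row[left:right + 1] for row in image[top:bottom + 1]]
    return crop, (top, left, bottom, right)
```

The cropped patch would then be resized and passed to the local branch; that resizing step is omitted here.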
A pre-defined CNN for binary classification of chest radiographs is presented in [18] and assessed on a live, customized dataset obtained from the U.S. National Institutes of Health. Before the deep learning models are applied, the dataset is separated into categories and labeled manually by two different radiologists. Their labels are compared, and images with conflicting labels are discarded. Normal images without any pathology were removed, and 200,000 images were finally used for training. Of these, the models were trained on different numbers of images, and their performance was recorded in terms of AUC score. It is observed that modestly sized images achieve better accuracy for binary classification of chest radiographs into normal and abnormal. Such automated image analysis will be useful in resource-poor areas.
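The AUC score used above to compare models can be computed directly from labels and predicted scores. The sketch below uses the pairwise (rank-based) definition: the fraction of abnormal/normal pairs in which the abnormal image receives the higher score, with ties counted as one half. The function name is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for abnormal and 0 for normal radiographs;
    `scores` holds the model's predicted abnormality scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # abnormal image ranked above normal image
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four abnormal/normal pairs are ordered correctly, giving an AUC of 0.75.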
The CheXNet deep learning algorithm, built on the densely connected 121-layer DenseNet architecture, is used to detect 14 pathologies in chest radiographs [49]. An ensemble is formed by training multiple networks on the training set and selecting those with the lowest average prediction error to become part of the ensemble. The parameters of each network in the ensemble are initialized from an ImageNet pre-trained network. The input image size is 512 × 512, and the Adam optimizer is used to train the network parameters with a batch size of 8 and a learning rate of 0.0001. In place of dropout and weight decay, the network was saved after every epoch, and early stopping was applied to deal with overfitting.
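The ensemble construction and early stopping described above can be sketched as follows. This is an illustration under assumptions and not the procedure of [49]: the text does not say how the ensemble members' outputs are combined, so simple averaging is assumed, and the early-stopping rule is assumed to pick the saved epoch with the lowest validation loss. All function names are hypothetical.

```python
def select_ensemble(val_errors, k):
    """Return the names of the k networks with the lowest average error.

    `val_errors` maps a network name to its average prediction error.
    """
    return sorted(val_errors, key=val_errors.get)[:k]

def ensemble_predict(predictions, members):
    """Average per-pathology probabilities over the ensemble members.

    `predictions` maps a network name to a list of probabilities,
    one per pathology. Averaging is an assumed combination rule.
    """
    n_outputs = len(predictions[members[0]])
    return [sum(predictions[m][i] for m in members) / len(members)
            for i in range(n_outputs)]

def best_checkpoint(val_losses):
    """Index of the saved epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)
```

Saving a checkpoint after every epoch is what makes this form of early stopping possible: training can run past the best epoch, and the best saved network is recovered afterwards.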
Considering the severity of TB, which is classified as the fifth leading cause of death worldwide with 10 million new cases and 1.5 million deaths per year, DL models are proposed to detect it from CXR. Because TB is one of the world’s biggest threats and yet rather easy to cure, the World Health Organization (WHO) recommends systematic and broad use of screening to eradicate the disease. Posteroanterior chest radiography, in spite of its low specificity and difficulty of interpretation, is unfortunately still one of the preferred TB screening methods. Since TB is primarily a disease of poor countries, clinical officers trained to interpret these CXRs are often scarce. In such circumstances, an automated algorithm for TB diagnosis could be an inexpensive and effective way to make widespread TB screening a reality. Consequently, the problem has attracted the attention of the machine learning community [9, 27, 28, 30, 33, 35, 38, 40, 42, 68], which has tackled it with methods ranging from hand-crafted algorithms to support vector machines and convolutional neural networks. Given TB's rank among the causes of death worldwide, deep learning models are implemented for fast screening of TB [46]. The results are encouraging, as some of these methods achieve near-human sensitivities and specificities. Considering the limited availability of powerful, costly hardware and the large number of learnable parameters, a simple deep CNN model has been proposed for CXR TB screening in place of the complex machine learning pipelines used in [30, 40, 42, 68]. Saliency maps and grad-CAMs are used for the first time to provide better visualization. Since radiologists have a deeper perspective on chest abnormalities, this model is helpful in providing them with a second opinion. The architecture of the model consists of five convolutional blocks followed by a global average pooling layer and a fully connected softmax layer.
A max pooling layer is inserted between consecutive convolutional blocks, and the overall arrangement is similar to AlexNet. Each convolutional layer uses batch normalization to avoid overfitting. After the network is trained, saliency maps and grad-CAM are used for better visualization. Saliency maps help generate a heat map with the same resolution as the input image, whereas grad-CAM gives better localization but at a lower resolution because of pooling. The NIH Tuberculosis Chest X-ray dataset [29] and the Belarus Tuberculosis Portal dataset [6] are used for experimentation. The model is observed to help clinical practitioners better visualize the presence or absence of TB. Subsequently, considering the severity of pneumonia, a novel model that is an ensemble of two models, RetinaNet and Mask R-CNN, is proposed in [61] and tested on the Kaggle pneumonia detection competition dataset consisting of 26,684 images. Transfer learning is applied to initialize the weights from models trained on the Microsoft COCO challenge. RetinaNet is used first to detect objects, and Mask R-CNN is then employed as a supplementary model. Both models individually predict pneumonia regions. If the bounding boxes predicted by the two models overlap, their weighted average is taken with a weight ratio of 3:1; otherwise, the box is used without any change in the ensemble's detection. The ensemble model obtains a recall score of 0.734.
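The box-merging rule of the pneumonia ensemble can be sketched as below. This is a minimal illustration and not the implementation of [61]. Several details are assumptions because the text does not specify them: the weight of 3 is given to the first (RetinaNet) model, overlap means any positive intersection-over-union, each primary box is paired with its best-overlapping secondary box, and boxes found only by the secondary model are left out. Boxes are `(x1, y1, x2, y2)` tuples and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_boxes(primary, secondary, weights=(3.0, 1.0)):
    """Combine detections from two models.

    For each primary box, find the secondary box with the highest IoU.
    If they overlap, return the coordinate-wise weighted average
    (3:1 in favour of the primary model, an assumed assignment);
    otherwise keep the primary box unchanged.
    """
    w_p, w_s = weights
    merged = []
    for p in primary:
        best = max(secondary, key=lambda s: iou(p, s), default=None)
        if best is not None and iou(p, best) > 0.0:
            merged.append(tuple((w_p * pc + w_s * sc) / (w_p + w_s)
                                for pc, sc in zip(p, best)))
        else:
            merged.append(p)
    return merged
```

For example, merging `(0, 0, 4, 4)` with an overlapping `(2, 2, 6, 6)` pulls the result one quarter of the way toward the secondary box, giving `(0.5, 0.5, 4.5, 4.5)`.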
A