Detection of Pneumothorax with Deep Learning Models: Learning From Radiologist Labels vs Natural Language Processing Model Generated Labels

Published: October 12, 2021

      Rationale and Objectives

      To compare the performance of pneumothorax deep learning detection models trained with radiologist versus natural language processing (NLP) labels on the NIH ChestX-ray14 dataset.

      Materials and Methods

      The ChestX-ray14 dataset consisted of 112,120 frontal chest radiographs, with 5,302 positive and 106,818 negative labels for pneumothorax generated by NLP (dataset A). All 112,120 radiographs were also inspected by 4 radiologists, leaving a visually confirmed set of 5,138 positive and 104,751 negative for pneumothorax (dataset B). Datasets A and B were used independently to train 3 convolutional neural network (CNN) architectures (ResNet-50, DenseNet-121, and EfficientNetB3). The area under the receiver operating characteristic curve (AUC) of each model was evaluated on the official NIH test set and on an external test set of 525 chest radiographs from our emergency department.
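
      The label counts above imply a few figures that the abstract does not spell out: the size of each dataset, the pneumothorax prevalence in each, and the number of radiographs absent from the visually confirmed set. The sketch below derives them from the reported counts only. The variable and function names are illustrative, and the abstract does not say why some radiographs are not part of dataset B.

```python
# Counts reported in the abstract; everything else is derived arithmetic.
TOTAL_RADIOGRAPHS = 112_120

dataset_a = {"positive": 5_302, "negative": 106_818}  # NLP labels
dataset_b = {"positive": 5_138, "negative": 104_751}  # radiologist-confirmed labels


def summarize(labels):
    """Return (size, pneumothorax prevalence) for one label set."""
    size = labels["positive"] + labels["negative"]
    return size, labels["positive"] / size


size_a, prevalence_a = summarize(dataset_a)
size_b, prevalence_b = summarize(dataset_b)

# Radiographs in the full set that do not appear in the confirmed set.
not_in_b = TOTAL_RADIOGRAPHS - size_b

print(f"Dataset A: {size_a} images, prevalence {prevalence_a:.2%}")
print(f"Dataset B: {size_b} images, prevalence {prevalence_b:.2%}")
print(f"Radiographs not in dataset B: {not_in_b}")
```

      Both label sets are heavily imbalanced, with fewer than 5% positive cases, which is one reason a threshold-free metric such as AUC is a natural choice for comparing the models.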


      Results

      On the NIH internal test set, CNN models trained with radiologist labels achieved significantly higher AUCs than models trained with NLP labels across all architectures. AUCs for the NLP/radiologist-label models were 0.838 (95% CI: 0.830, 0.846)/0.881 (95% CI: 0.873, 0.887) for ResNet-50 (p = 0.034), 0.839 (95% CI: 0.831, 0.847)/0.880 (95% CI: 0.873, 0.887) for DenseNet-121, and 0.869 (95% CI: 0.863, 0.876)/0.943 (95% CI: 0.939, 0.946) for EfficientNetB3 (p ≤ 0.001). On the external test set, AUCs were also higher (p < 0.001) for the CNN models trained with radiologist labels than for those trained with NLP labels across all architectures. The AUCs for the NLP/radiologist-label models were 0.686 (95% CI: 0.632, 0.740)/0.806 (95% CI: 0.758, 0.854) for ResNet-50, 0.736 (95% CI: 0.686, 0.787)/0.871 (95% CI: 0.830, 0.912) for DenseNet-121, and 0.822 (95% CI: 0.775, 0.868)/0.915 (95% CI: 0.882, 0.948) for EfficientNetB3.
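
      To make the reported quantities concrete, here is a minimal sketch of how an AUC and a 95% confidence interval can be computed from model scores and binary labels. The abstract does not state how its intervals were obtained, so the percentile bootstrap used here is an assumption chosen for simplicity, not the authors' method. The function names and the toy scores are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive case outscores
    a random negative case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # skip resamples that contain only one class
        estimates.append(auc([scores[i] for i in idx], sample_labels))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy example: hypothetical scores, 1 = pneumothorax, 0 = no pneumothorax.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_ci = bootstrap_ci(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}, 95% CI = ({toy_ci[0]:.3f}, {toy_ci[1]:.3f})")
```

      The pairwise AUC is quadratic in the number of cases, which is fine for a test set of 525 radiographs but slow for very large ones, where a rank-based formulation is preferable. Comparing two models on the same test set, as the p-values above do, requires a paired test and is not covered by this sketch.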


      Conclusion

      We demonstrated improved performance and generalizability of pneumothorax detection deep learning models trained with radiologist labels compared to models trained with NLP labels.

      Key Words


      CNN (Convolutional neural network), NLP (Natural language processing), NIH (National Institutes of Health), PACS (Picture archiving and communications system), DICOM (Digital imaging and communications in medicine)

