Comparison of Feature Selection Methods and Machine Learning Classifiers for Predicting Chronic Obstructive Pulmonary Disease Using Texture-Based CT Lung Radiomic Features

Published: August 11, 2022. DOI: https://doi.org/10.1016/j.acra.2022.07.016

      Rationale

      Texture-based radiomics analysis of lung computed tomography (CT) images has been shown to predict chronic obstructive pulmonary disease (COPD) status using machine learning models. However, various approaches are used and it is unclear which provides the best performance.

      Objectives

      To compare the most commonly used feature selection and classification methods and determine the optimal models for classifying COPD status in a mild, population-based COPD cohort.

      Materials and Methods

      CT images from the multi-center Canadian Cohort Obstructive Lung Disease (CanCOLD) study were pre-processed by resampling each image to 1 mm isotropic voxels, segmenting the lungs and removing the airways (VIDA Diagnostics Inc.), and applying a threshold of -1000 HU to 0 HU. A total of 95 texture features were then extracted from each CT image. Combinations of 17 feature selection methods and 9 classifiers were tested and evaluated. In addition, the role of data cleaning (removal of outliers and of highly correlated features) was evaluated. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance.
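The abstract does not say how highly correlated features were identified or which member of a correlated pair was dropped. As a minimal sketch only, the following assumes a Pearson |r| cutoff of 0.9 and a greedy rule that keeps the first feature encountered; the threshold, the rule, and the feature names are all hypothetical illustrations, not the authors' procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(features, threshold=0.9):
    """Greedy filter over a dict of feature name -> list of values.

    A feature is kept only if its |r| with every already-kept feature
    is below `threshold` (assumed cutoff, not taken from the study).
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data with hypothetical feature names (one value per participant).
demo = {
    "glcm_contrast": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glcm_contrast_x2": [2.0, 4.0, 6.0, 8.0, 10.1],  # near-duplicate
    "glrlm_sre": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(demo)
print(kept)
```

In this toy data the near-duplicate column is removed and the other two features are retained. The result of a greedy filter depends on feature order, so a real analysis would need to fix that order or use a different tie-breaking rule.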

      Results

      A total of 1204 participants were evaluated (n = 602 no COPD, n = 602 COPD). There were no significant differences between the groups in the proportion of female participants (no COPD = 46.3%; COPD = 38.5%; p = 0.77) or in body mass index (no COPD = 27.7 kg/m²; COPD = 27.4 kg/m²; p = 0.21). The highest AUC value for predicting COPD status (AUC = 0.78 [0.73, 0.84]) was obtained following data cleaning and feature selection using Elastic Net with the Linear-SVM classifier.
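The abstract reports the AUC with an interval but does not state how the interval was computed. The sketch below shows one common approach, assumed here for illustration: the AUC is computed as the probability that a COPD case scores higher than a non-COPD case (ties count one half), and a percentile bootstrap gives a 95% interval. The labels, scores, resample count, and seed are made up.

```python
import random

def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up classifier outputs: 1 = COPD, 0 = no COPD.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9]
point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f} [{low:.2f}, {high:.2f}]")
```

With only eight toy observations the interval is very wide; the narrower interval in the abstract reflects the much larger cohort.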

      Conclusion

      In a population-based cohort, the optimal combination for radiomics-based prediction of COPD status was Elastic Net as the feature selection method and Linear-SVM as the classifier.

      Key Words

      Abbreviations:

      COPD (chronic obstructive pulmonary disease), CT (computed tomography), QCT (quantitative CT), HU (Hounsfield units), LAA950 (low attenuation areas below -950 HU), CanCOLD (Canadian Cohort Obstructive Lung Disease), GOLD (Global Initiative for Chronic Obstructive Lung Disease), ATS (American Thoracic Society), FEV1 (forced expiratory volume in one second), FVC (forced vital capacity), HU15 (HU value corresponding to the 15th percentile on the frequency distribution curve), LAC (low attenuation cluster), TAC (total airway count), Pi-10 (estimated airway wall thickness for an idealized airway with an internal perimeter of 10 mm), NJC (normalized join count), SERA (Standardized Environment for Radiomics Analysis), IBSI (Image Biomarker Standardization Initiative), GLCM (gray level co-occurrence matrix), GLRLM (gray level run length matrix), GLSZM (gray level size zone matrix), GLDZM (gray level distance zone matrix), NGTDM (neighborhood gray tone difference matrix), NGLDM (neighboring gray level dependence matrix), ROC (receiver operating characteristic), AUC (area under the curve), CI (confidence interval), BMI (body mass index), PFT (pulmonary function test), SVM (support vector machine)