Division of Cardiovascular Imaging, Department of Radiology and Radiological Science, Medical University of South Carolina, 25 Courtenay Drive Room 2221 ART, Charleston, SC 29425
Rationale and Objectives
The burden of coronavirus disease 2019 (COVID-19) airspace opacities is time-consuming and challenging to quantify on computed tomography. The purpose of this study was to evaluate the ability of a deep convolutional neural network (dCNN) to predict inpatient outcomes associated with COVID-19 pneumonia.
Materials and Methods
A previously trained dCNN was tested on an external validation cohort of 241 patients who presented to the emergency department and received a chest computed tomography scan, 93 with COVID-19 and 148 without. Airspace opacity scores were defined by the extent of airspace opacity in each lobe, totaled across both lungs. Expert and dCNN scores were evaluated for interobserver agreement, and both dCNN-derived airspace opacity scores and raw opacity values were used to predict COVID-19 diagnosis and inpatient outcomes.
Results
Interobserver agreement for airspace opacity scoring was 0.892 (95% CI 0.834-0.930). Probability of each outcome behaved as a logistic function of the opacity scoring (25% intensive care unit admission at score of 13/25, 25% intubation at 17/25, and 25% mortality at 20/25). Length of hospitalization, intensive care unit stay, and intubation were associated with larger airspace opacity score (p = 0.032, 0.039, 0.036, respectively).
Conclusion
The tested dCNN was highly predictive of inpatient outcomes, performed at a near-expert level, and provided added value for clinicians in terms of prognostication and assessment of disease severity.
INTRODUCTION
The coronavirus disease 2019 (COVID-19) pandemic has created a unique challenge for medical personnel worldwide by quickly becoming pervasive. Many studies have described the findings and usefulness of chest computed tomography (CT) imaging (or even lung base analysis on abdominopelvic CT) for triage of patients with potential COVID-19 pneumonia, particularly for identifying diagnostic and prognostic factors (Radiological Society of North America Expert Consensus Statement on Reporting Chest CT Findings Related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA - Secondary Publication). Therefore, the use of artificial intelligence (AI) deep learning models to prognosticate from CT images has been identified since the beginning of the pandemic as a potential way to expedite the triage process, improve prognostication, and guide utilization of resources. The use of AI to prognosticate the clinical course of patients with COVID-19 pneumonia from subjective imaging features is challenging. One solution is the use of scoring systems, such as severity scoring, because standardized protocols increase consistency and efficiency, resulting in higher-quality, evidence-based decision making by clinicians. However, manual segment severity scoring is a time-consuming task that is not currently standard of care. Thus, AI severity scoring may help meet the challenge of practical, reproducible triage of COVID-19 patients by identifying patients at high risk for morbidity and mortality (Automated assessment of COVID-19 reporting and data system and Chest CT severity scores in patients suspected of having COVID-19 using artificial intelligence).
While there is a relative paucity of studies utilizing severity scoring for COVID-19 CT image interpretation, several studies have demonstrated the efficacy of scoring images with severity scoring methods (Automated assessment of COVID-19 reporting and data system and Chest CT severity scores in patients suspected of having COVID-19 using artificial intelligence). Lessmann et al. reported moderate agreement between AI-determined scores and expert radiologists’ interpretation, with high area under the curve (AUC), sensitivity, and specificity (internal set: 0.95, 85.7%, and 89.8%; external set: 0.88, 82.0%, and 80.5%, respectively) (Automated assessment of COVID-19 reporting and data system and Chest CT severity scores in patients suspected of having COVID-19 using artificial intelligence). Lassau et al. calculated severity scores based on clinical factors and then recalculated the scores based on the combination of clinical factors and imaging interpretation by AI. The AI-assisted method of score calculation outperformed the previously determined score in terms of prognostic ability (Quantification of COVID-19 opacities on Chest CT - evaluation of a fully automatic AI-approach to noninvasively differentiate critical versus noncritical patients).
However, methods and image sources vary between studies and may be prone to bias and overfitting from the use of identical or poorly annotated images from publicly available datasets. Furthermore, few studies test their methods against a real-world contiguous patient cohort with well-defined outcomes. Therefore, the purpose of this study is to analyze the efficacy of a novel deep learning model in determining the prognostic value of an AI severity scoring algorithm.
METHODS
Study Population, Clinical Information, and Imaging Data Acquisition
The protocol of this retrospective study was approved by the local Institutional Review Board and the need for informed consent was waived. Patient data were collected and anonymized in compliance with HIPAA and institutional protocols to protect patient privacy. A total of 241 patients who underwent a chest CT with or without contrast from March 2020 to February 2021 were enrolled in this study: 93 were severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) polymerase chain reaction (PCR) positive and 148 were SARS-CoV-2 PCR negative. Data collected included demographics, clinical comorbidities, and outcome variables, which included hospitalization, intensive care unit (ICU) admission, intubation, and mortality. A preliminary patient list was collected through a billing code search using COVID-19 testing and chest CT identifiers. Data collection was performed by chart review and compiled in a de-identified encrypted document. Imaging data from chest CT scans with 1 mm slice thickness, including non-contrast and iodinated contrast-enhanced studies (mAs and kVp selected according to patients’ body mass index), were acquired on Somatom Force and Naeotom Alpha CT scanners (Siemens Healthineers, Forchheim, Germany). Archived data were then exported from the picture archiving and communication system and uploaded to the AI interface (AI-Rad Companion, Siemens Healthineers), where the algorithm was executed and the results recorded.
Study Design
A single-institution retrospective case-control study was performed. Inclusion criteria included patients >18 years old who presented to the emergency department, received both a COVID-19 test and a chest CT within 14 days, and had sufficient same-institution follow-up for outcomes analysis (1 month post discharge from emergency department (ED) or inpatient hospitalization). Controls were selected based on an eligible CT scan with a negative SARS-CoV-2 PCR in the stated timeframe. These controls were neither age nor sex matched. Exclusion criteria included prior pulmonary surgical history, viral pneumonia other than COVID-19, and excessive artifact on chest CT.
The gold standard used was an expert-derived airspace opacity score. Three cardiothoracic-trained radiologists provided the expert determination of airspace opacities using the scoring system described by Bernheim et al. For each lobe, the disease extent was judged to be one of the following categories: (0) the lobe is not affected; (1) 1%-25%; (2) 26%-50%; (3) 51%-75%; and (4) 76%-100%. The scores for each of the five lobes were summed to calculate the total severity score, resulting in a total score range from 0 to 20. A score of 0 indicates that none of the lobes are involved and a score of 20 indicates that all five lobes are severely affected.
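The per-lobe scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the study's software: the function names are hypothetical, each category's upper bound is assumed to be inclusive, and any nonzero involvement is assumed to count as at least category 1.

```python
from typing import Sequence


def lobe_category(percent_involved: float) -> int:
    """Map the percentage of a lobe affected (0-100) to a 0-4 category.

    Assumption: upper bounds are inclusive (25 -> 1, 50 -> 2, 75 -> 3).
    """
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be between 0 and 100")
    if percent_involved == 0:
        return 0
    if percent_involved <= 25:
        return 1
    if percent_involved <= 50:
        return 2
    if percent_involved <= 75:
        return 3
    return 4


def total_severity_score(lobe_percents: Sequence[float]) -> int:
    """Sum the five per-lobe categories into a total score from 0 to 20."""
    if len(lobe_percents) != 5:
        raise ValueError("expected one value for each of the five lobes")
    return sum(lobe_category(p) for p in lobe_percents)


# Hypothetical patient: one unaffected lobe and four increasingly involved lobes.
example_score = total_severity_score([0, 10, 30, 60, 90])
```

In this hypothetical example the five lobes fall into categories 0 through 4, so the total is 10 out of a possible 20.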
The primary endpoints were interobserver agreement between AI and the radiologists for the determination of COVID-19 extent as well as the predictive capability of airspace opacity scoring and other AI measurements for the diagnosis of COVID-19 pneumonia.
Convolutional Neural Network Architecture and Outputs
The deep convolutional neural network (dCNN) algorithm has been previously described by Chaganti et al. Briefly, the original dCNN was trained on 901 chest CT scans (431 COVID-19, 174 viral pneumonia, and 296 interstitial lung disease) with a validation cohort of 200 patients (100 COVID-19 and 100 control). The general architecture comprised a preprocessing step with deep image-to-image lung segmentation, using the carina as a landmark for alignment, followed by a DenseUNet architecture for feature extraction (ground-glass opacity, etc.), and then segmentation and global classification. Please see appendix E1 in Chaganti et al. for a detailed description of the neural network architecture, training, and measures such as the loss function.
Statistical Analysis
A power calculation optimized for outcomes assuming at least a 10% prevalence of each event required >150 patients for a standardized power of 0.8. Post-hoc, 241 patients conferred a power of 0.965 for simple logistic regression analysis (Fig S1). Aggregate demographics and clinical risk factors analysis was performed using SARS-CoV-2 PCR positivity as the stratifying variable. Continuous variables were assessed for normality and reported as medians plus interquartile ranges. Categorical variables were reported with count and frequency as percent.
Primarily, interobserver agreement for quantitative scoring was assessed using intraclass correlation coefficients (ICC) with 2-way mixed effects, single rater (k), and absolute agreement. Adjusted linear model R2 and p-values were also reported for assessment of linearity of results. Cohen's kappa was reported with confidence interval as a secondary measure of categorical agreement. For categorical agreement, any airspace opacity was counted as a positive result, and no airspace opacities were defined as the only negative result and only used in the context of COVID-19 positive patients to focus on specific performance on COVID-19 patients. Diagnostic parameters were reported using confidence intervals constructed using the Clopper-Pearson method.
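As an illustration of the Clopper-Pearson method mentioned above, the following minimal sketch computes an exact binomial confidence interval using only the Python standard library. The study's analysis was performed in R, so this is not the authors' code; the function names are hypothetical, and the bounds are found by bisection on the binomial tail probabilities rather than through a beta quantile function.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Find p in [0, 1] where the monotone function f equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # Lower bound: p such that P(X >= k | p) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: p such that P(X <= k | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


# Example: 9 true positives among 20 events, as in a sensitivity of 0.450.
ci_low, ci_high = clopper_pearson(9, 20)
```

Because the interval is exact rather than based on a normal approximation, it is generally asymmetric around the point estimate, which matters most for small counts such as the number of intubations or deaths.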
Multivariate modelling for COVID-19 diagnosis was performed using multiple logistic regression. Briefly, backwards stepwise logistic regression was performed on all AI-generated measurements until all retained model elements were significant (p < 0.05) in the model. The model with the lowest Akaike information criterion was selected among the models with significant elements. Multivariate modelling of outcomes was performed using the variables deemed diagnostic for COVID-19 in the previous analysis. Optimal airspace opacity score cutoffs were empirically selected using a bootstrapping approach: 200 repetitions of 1:1 COVID-19 positive/negative stratified sampling were performed, and the cutoff that maximized the bootstrapped accuracy metric was chosen. Figure S2 demonstrates the empiric selection process. Time-derived outcome variables were analyzed by binning into quintiles to improve reader interpretability. Differences between quintiles were assessed using one-way ANOVA. Means and standard errors were reported for continuous variables. All statistical analysis was performed in R v 3.6.3.
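The bootstrapped cutoff selection described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the authors' R implementation: the record layout (score, COVID-19 status, outcome), the function names, the sampling of equal numbers from each stratum with replacement, and the rule that a score at or above the cutoff predicts the outcome are all hypothetical choices, and the data are synthetic.

```python
import random


def accuracy_at(threshold: int, sample) -> float:
    """Fraction of (score, outcome) pairs where score >= threshold matches outcome."""
    correct = sum((score >= threshold) == bool(outcome) for score, outcome in sample)
    return correct / len(sample)


def bootstrap_cutoff(records, n_boot: int = 200, seed: int = 0):
    """Pick the score cutoff with the highest mean bootstrapped accuracy.

    records: iterable of (score, covid_positive, outcome) tuples.
    Each repetition draws the same number of COVID-19 positive and negative
    patients with replacement (1:1 stratification, assumed here).
    """
    rng = random.Random(seed)
    pos = [(s, o) for s, c, o in records if c]
    neg = [(s, o) for s, c, o in records if not c]
    n = min(len(pos), len(neg))
    thresholds = range(0, 21)  # the total score ranges from 0 to 20
    totals = {t: 0.0 for t in thresholds}
    for _ in range(n_boot):
        sample = rng.choices(pos, k=n) + rng.choices(neg, k=n)
        for t in thresholds:
            totals[t] += accuracy_at(t, sample)
    mean_acc = {t: totals[t] / n_boot for t in thresholds}
    best = max(mean_acc, key=mean_acc.get)
    return best, mean_acc[best]


# Synthetic data: the outcome occurs exactly when the score is 10 or higher.
synthetic = [(s, covid, s >= 10)
             for s in range(21) for covid in (True, False) for _ in range(5)]
best_cutoff, best_accuracy = bootstrap_cutoff(synthetic)
```

Averaging accuracy over resamples rather than reading it off a single table makes the chosen cutoff less sensitive to the imbalance between COVID-19 positive and negative patients in the cohort.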
RESULTS
In this study, 93 patients (38.5%) were positive for SARS-CoV-2. The median age of those with and without COVID-19 was 59 (IQR 45-71) and 62 (IQR 47-69) years, respectively. A greater proportion of those with COVID-19 were male in comparison to controls (61.5% vs 51.9%). The median time between nasopharyngeal swab and imaging was 3 days for SARS-CoV-2 positive patients and 0 days for SARS-CoV-2 negative patients. Patients positive for SARS-CoV-2 were more likely to be Black or Hispanic (57.0%, 2.3%) than SARS-CoV-2 negative patients (37.7%, 0%). In comparison to control patients, SARS-CoV-2 positive patients were more frequently smokers (93.4% vs 48.4%), more likely to have hypertension (65.9% vs 49.2%), and more likely to be diabetic (37.6% vs 23.8%) (Table 1).
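Table 1. Demographics and Clinical Comorbidities of Patients Enrolled in this Study Stratified by SARS-CoV-2 Nasopharyngeal Swab PCR Results (N = 241)

| Continuous variable | SARS-CoV-2 Positive (N = 93), Median | SARS-CoV-2 Positive, IQR | SARS-CoV-2 Negative (N = 148), Median | SARS-CoV-2 Negative, IQR |
|---|---|---|---|---|
| Age (years) | 59 | 45-71 | 62 | 47-69 |
| BMI (kg/m2) | 29.3 | 25.8-36.1 | 26.5 | 21.9-33.0 |
| Symptom days | 6 | 2-9 | 4 | 1-9 |
| PCR-Imaging Δ (days) | 3 | 0-8 | 0 | 1-9 |

| Categorical variable | SARS-CoV-2 Positive, Count | SARS-CoV-2 Positive, Frequency (%) | SARS-CoV-2 Negative, Count | SARS-CoV-2 Negative, Frequency (%) |
|---|---|---|---|---|
| Sex: Female | 35 | 38.5 | 62 | 48.1 |
| Sex: Male | 56 | 61.5 | 67 | 51.9 |
| Ethnicity: Black | 49 | 57.0 | 40 | 37.7 |
| Ethnicity: Hispanic | 2 | 2.3 | 0 | 0 |
| Ethnicity: Other | 5 | 5.8 | 2 | 1.9 |
| Ethnicity: White | 30 | 34.9 | 64 | 60.4 |
| Prior structural lung disease | 32 | 34.8 | 37 | 28.5 |
| History of cancer | 9 | 11.9 | 46 | 33.6 |
| Smoking history | 85 | 93.4 | 44 | 48.4 |
| Hypertension | 60 | 65.9 | 64 | 49.2 |
| Diabetes | 34 | 37.6 | 31 | 23.8 |
| CHF | 16 | 17.8 | 27 | 20.8 |
| CKD | 14 | 15.4 | 19 | 14.6 |
| Autoimmune disease | 14 | 15.4 | 17 | 13.1 |
| HIV | 0 | 0 | 6 | 4.6 |

BMI, body mass index; CHF, congestive heart failure; CKD, chronic kidney disease; HIV, human immunodeficiency virus; IQR, interquartile range; PCR, polymerase chain reaction; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.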
The AI dashboard, the provided summary of the algorithm output, demonstrates highlighted airspace opacities in the axial view with the possibility to reconstruct the affected tissue in three dimensions. The results dashboard provides readers with information regarding the extent of the airspace opacities as broken down by lobe. Results include opacity scores, lung volumes, mean and standard deviation of the Hounsfield units for affected lungs, volumes of affected lung tissue and high opacity measurements (Fig 1).
Figure 1. Artificial intelligence dashboard for automated evaluation of chest computed tomography for COVID-19. (a) Axial view of lung fields with highlighted opacities segmented by the neural network algorithm. (b) Three-dimensional image reconstruction of the lungs with rendering of involved airspace opacities. (c) Parameters involved in the diagnosis of COVID-19 by AI. AI, Artificial intelligence; COVID-19, Coronavirus disease 2019. (Color version of figure is available online.)
The overall correlation between observer estimates of severity score was 0.827 (95% CI 0.751-0.891). The expert and AI had a high rate of agreement, with an ICC of 0.892 (95% CI 0.834-0.930), p < 0.001. The adjusted R2 for explanation of model variance was 0.69, p < 0.001 (Fig 2a). Overall, the accuracy of the dCNN was 0.828 (95% CI 0.751-0.905) and sensitivity was 0.914 (95% CI 0.830-0.965) (Fig 2b).
Figure 2. Interobserver agreement between expert and AI opacity scores in patients who were positive for SARS-CoV-2 by PCR. (a) Quantitative comparison of opacity score. (b) Qualitative assessment for detection of any airspace opacities. AI, Artificial intelligence; NPV, negative predictive value; PCR, polymerase chain reaction; PPV, positive predictive value; SARS-CoV-2, Severe acute respiratory syndrome coronavirus 2. (Color version of figure is available online.)
Using the measurements given in the AI dashboard, a best-fit multivariate model consisting of total opacity volume (cm3), high opacity volume (cm3), standard deviation of opacity Hounsfield units, and total standard deviation of all Hounsfield units gives an AUC of 0.805 (95% CI 0.745-0.862) for the diagnosis of COVID-19. The AUCs for the individual predictors of the model range from 0.561 to 0.728. All variable coefficients were significant in the multiple logistic regression model (p < 0.05) (Fig 3). The same combination of variables predicts the need for inpatient hospitalization (AUC = 0.810) and ICU admission, intubation, and mortality at AUCs ranging from 0.666 to 0.683. Accuracy tended to be highest for events occurring earliest in each patient's time course (Fig 4).
Figure 3. AI-segmented imaging features for use in prediction of COVID-19 status. (a) Multivariate imaging model consisting of total opacity volume (cm3), high opacity volume (cm3), standard deviation of opacity Hounsfield units, and total standard deviation of all Hounsfield units. All variables were significant in the multiple logistic regression model (p < 0.05). (b) Individual features used in the multivariate model and their individual diagnostic performance for COVID-19 diagnosis. Opacity volume, followed by high opacity volume, had the largest predictive power for COVID-19 diagnosis. AI, Artificial intelligence; AUC, area under curve; COVID-19, Coronavirus disease 2019; HU, Hounsfield units; SD, standard deviation. (Color version of figure is available online.)
Figure 4. Multivariate logistic regression modelling of outcomes used in this study. A combination of variables (Opacity Volume, High Opacity Volume, SD Opacity HU, SD Total HU) derived from significant predictors of COVID-19 status predicts hospitalization, ICU admission, intubation, and death. AUC, area under curve; COVID-19, Coronavirus disease 2019; HU, Hounsfield units; ICU, intensive care unit. (Color version of figure is available online.)
With regard to threshold determination, an AI airspace opacity score ≥13 was accurate (0.777; 95% CI 0.724-0.829) and specific (0.873; 95% CI 0.822-0.913) for mortality. An AI airspace opacity score <13 had a high NPV for death (0.946; 95% CI 0.915-0.977). Accuracy of an AI airspace opacity score ≥8 for hospitalization was 0.777 (95% CI 0.724-0.829). Accuracy of a score ≥9 for ICU admission was 0.744 (95% CI 0.689-0.799). Accuracy of a score ≥12 for intubation was 0.839 (95% CI 0.793-0.885) (Table 2). Using the logistic model probabilities, there is a 25% risk of the respective outcomes at an airspace opacity score of 6 (hospitalization), 13 (ICU admission), 16 (intubation), and ≥20 (mortality). Significant increases in the probability of mortality do not occur until the AI airspace opacity score exceeds 10. A maximum score of 20 conferred an 87.5% probability of hospitalization, a 50% probability of ICU admission, a 37.5% probability of intubation, and about a 25% probability of mortality. The points of greatest uncertainty came at scores >15, suggesting other risk factors are increasingly important at these upper ranges (Fig 5).
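The 25% risk points quoted above follow from inverting a fitted logistic curve. As a sketch, assuming a univariate logistic model with intercept \beta_0 and slope \beta_1 (hypothetical symbols; the fitted coefficients are not reported in the text), the outcome probability at score s and the score s_q at which the probability reaches a target level q are:

```latex
p(s) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 s)}},
\qquad
s_q = \frac{\ln\!\left(\frac{q}{1-q}\right) - \beta_0}{\beta_1},
\qquad
s_{0.25} = \frac{-\ln 3 - \beta_0}{\beta_1}.
```

Under this assumed form, outcomes with a less negative intercept or a steeper slope reach the 25% level at a lower score, which is consistent with hospitalization crossing that level earlier than intubation or mortality.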
Table 2. Diagnostic Parameters of the Most Accurate Thresholds for Inpatient Outcomes. AI Airspace Score Thresholds Have a High Specificity and NPV for Identifying Patients at Risk of Morbidity and Mortality

AI Airspace Opacity Score ≥ 8 and Hospitalization (N = 241)

| | Hospitalization | No Hospitalization | % Hospitalized |
|---|---|---|---|
| Opacity Score ≥8 | 45 | 28 | 61.6% |
| Opacity Score <8 | 26 | 143 | 15.4% |

| Parameter | Value (95% CI) | Parameter | Value (95% CI) |
|---|---|---|---|
| Accuracy | 0.777 (0.724-0.829) | Odds Ratio | 8.8 (4.7-16.5) |
| Sensitivity | 0.634 (0.511-0.745) | PPV | 0.616 (0.505-0.728) |
| Specificity | 0.836 (0.772-0.888) | NPV | 0.846 (0.792-0.901) |

AI Airspace Opacity Score ≥ 9 and ICU Admission (N = 241)

| | ICU Admission | No ICU Admission | % ICU |
|---|---|---|---|
| Opacity Score ≥9 | 18 | 49 | 26.9% |
| Opacity Score <9 | 13 | 162 | 7.4% |

| Parameter | Value (95% CI) | Parameter | Value (95% CI) |
|---|---|---|---|
| Accuracy | 0.744 (0.689-0.799) | Odds Ratio | 4.58 (2.1-9.8) |
| Sensitivity | 0.581 (0.391-0.755) | PPV | 0.269 (0.163-0.375) |
| Specificity | 0.768 (0.705-0.823) | NPV | 0.926 (0.887-0.965) |

AI Airspace Opacity Score ≥ 12 and Intubation (N = 241)

| | Intubation | No Intubation | % Intubation |
|---|---|---|---|
| Opacity Score ≥12 | 9 | 28 | 24.3% |
| Opacity Score <12 | 11 | 194 | 5.4% |

| Parameter | Value (95% CI) | Parameter | Value (95% CI) |
|---|---|---|---|
| Accuracy | 0.839 (0.793-0.885) | Odds Ratio | 5.67 (2.07-9.95) |
| Sensitivity | 0.450 (0.231-0.685) | PPV | 0.243 (0.105-0.381) |
| Specificity | 0.874 (0.823-0.915) | NPV | 0.946 (0.915-0.977) |

AI Airspace Opacity Score ≥ 13 and Mortality (N = 241)
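The point estimates in the hospitalization block of Table 2 can be reproduced directly from its four cell counts. The short Python sketch below is illustrative only (the function and variable names are hypothetical, and it does not compute the confidence intervals); it treats a score at or above the threshold as a positive test.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates from a 2x2 table of test result versus outcome."""
    return {
        "sensitivity": tp / (tp + fn),   # positives detected among those with the outcome
        "specificity": tn / (tn + fp),   # negatives among those without the outcome
        "ppv": tp / (tp + fp),           # outcome rate among test positives
        "npv": tn / (tn + fn),           # outcome-free rate among test negatives
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Counts from Table 2, opacity score >= 8 versus hospitalization:
# 45 hospitalized and 28 not hospitalized at or above the threshold,
# 26 hospitalized and 143 not hospitalized below it.
hospitalization = diagnostic_metrics(tp=45, fp=28, fn=26, tn=143)
rounded = {name: round(value, 3) for name, value in hospitalization.items()}
```

Rounded to three decimals, these counts give a sensitivity of 0.634, specificity of 0.836, PPV of 0.616, NPV of 0.846, and accuracy of 0.777, with an odds ratio of about 8.8, matching the tabulated values.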
Figure 5. Probability of inpatient outcomes as a logistic function of AI opacity score. The probability of an inpatient event follows an exponential function. AI, Artificial intelligence. (Color version of figure is available online.)
The AI airspace opacity scores predict time-to-event and inpatient durations, with the mean hospitalization duration, ICU duration, and intubation duration being associated with increased AI airspace opacity scores (p = 0.032, 0.039, 0.036, respectively). The time from hospital admission to ICU admission was not significantly associated with AI airspace opacity scores (p = 0.159) (Fig 6).
Figure 6. Time to event and inpatient duration analysis among hospitalized COVID-19 patients using airspace opacity score quintiles with reported means and standard errors. Mean hospitalization duration (a), ICU duration (b), and intubation duration (d) were associated with increased AI airspace opacity scores. Time from hospital admission to ICU admission (c) was not significantly associated with AI airspace opacity scores. AI, Artificial intelligence; ANOVA, analysis of variance; COVID-19, Coronavirus disease 2019; Hosp, hospitalization; ICU, intensive care unit. (Color version of figure is available online.)
DISCUSSION
The purpose of this study was to test a previously trained deep convolutional neural network for diagnostic and prognostic purposes in patients with COVID-19 pneumonia as seen on chest CT. A total of 241 patients (93 COVID-19 positive) were evaluated by the dCNN in this external testing cohort design. The AI algorithm was highly accurate compared to attending radiologists, with ICCs approaching human-level agreement. Several key interpretable outputs were derived, including opacity volumes, parenchymal-opacity ratios, and other second-order statistics. When these were put together into a standardized scoring system, several cutoffs were identified that progress in a stepwise fashion in terms of severity. Lastly, both the probabilities of inpatient outcomes and time-to-events behaved as a function of the airspace opacity scoring system, establishing expected prognostic gradients that may influence patient care.
It is critical to understand the accuracy of expert observers in the diagnosis of COVID-19 pneumonia from chest CT, as the gold standard used in this study was the expert quantification of airspace severity. Baseline expert accuracy in comparison to PCR surpasses 90% for the diagnosis of COVID-19 pneumonia. The ICC for expert-AI quantitative severity scoring represented “excellent” agreement. Overall, AI accuracy for identifying airspace opacities related to COVID-19 lesions was high in patients with COVID-19 confirmed by positive PCR. However, the correlation coefficient in this external validation cohort was slightly lower than that previously published for the training data of the neural network.
While the focus of this study is on AI severity scoring, multivariate modelling of AI-segmented measurements has an advantage over a scoring heuristic for the diagnosis of COVID-19 pneumonia. A multivariate model consisting of opacity volume, “high opacity” volume, and the standard deviation of both opacity Hounsfield units and total Hounsfield units provides an AUC of 0.805, greater than that of any individual component or the opacity scoring system. Expert measurement of opacity volumes and standard deviations is not feasible, reflecting a possible advantage of using AI systems in the prediction of COVID-19 pneumonia. Indeed, some radiomic studies suggest that quantitative parenchymal involvement is an important indicator of severe outcomes (A novel machine learning-derived radiomic signature of the whole lung differentiates stable from progressive COVID-19 infection: a retrospective cohort study).
Further clinical utility can be derived from the prediction of outcomes from airspace severity scoring. Quantitative AI airspace values readily predict inpatient hospitalization with reasonable accuracy, providing immediate clinical utility from the emergency department. Predictions for more advanced outcomes (ICU admission, intubation, and mortality) were weaker, likely because each of these outcomes has multifactorial risk factors. Certainly, already verified risk factors such as age, immunosuppression, BMI, and sex contribute to a large degree, alongside the imaging factors, to the overall predictive value for late inpatient clinical outcomes.
The presence of “large” or “extensive” airspace opacities on chest imaging often evokes an expectation of poor prognosis among physicians caring for COVID-19 patients. However, the actual relationship between the quantitative extent of the airspace opacities and inpatient outcomes is poorly understood. Certainly, radiologists may be able to segment airspace opacities by hand to provide extra clinical value, but this is a time-intensive and laborious process that is difficult to sustain in the setting of increased chest imaging volumes during the COVID-19 pandemic. Therefore, the introduction of an AI algorithm that automatically segments the airspace opacities and provides a numeric, interpretable score could add value to the prognostication of COVID-19 pneumonia and change clinical management as patients progress down the COVID-19 treatment protocol. Still, a main challenge with expert-derived approaches is interobserver variation, which is partially rectified using a standardized AI approach.
Comparison of predictive ability with the literature at large is a challenging task due to the heterogeneity of methods, the preponderance of public dataset usage and transfer learning, and a risk of bias. Fewer studies still have investigated an interpretable AI severity score from chest CT for both diagnosis and prognosis, but among those with similar aims the correlation coefficients between the experts and AI are usually high (0.87-0.97) (Machine learning based on clinical characteristics and chest CT quantitative measurements for prediction of adverse clinical outcomes in hospitalized patients with COVID-19). Univariate severity score AUCs in this range should be expected, as other clinical variables (age, immunosuppression, etc.) contribute to disease progression and mortality in patients with COVID-19. A recent study found AUCs of 0.70-0.77 for inpatient outcomes by use of deep learning, which corroborates our results. It is likely that the univariate prediction strength of current AI methods lies within this range, but we suggest that our study stands out in this cohort due to the use of interpretable AI-derived classification schemes.
Empirically derived opacity score thresholds improve on the accuracy and predictive ability of COVID-19-related inpatient outcomes. Many clinicians and patients are concerned about the next large decision points in COVID-19 clinical care, and airspace opacity scoring prognosticates patient risk with high negative predictive values: 0.846 below 8 for hospitalization, 0.926 below 9 for ICU admission, 0.946 below 12 for intubation, and 0.946 below 13 for death. For a patient with an airspace severity score of 2, 5, or 10, a physician could relate that the probability of death is low, at <10%. Conversely, for a hospitalized patient with an airspace opacity score of 17 who is approaching escalation of care, a physician could quote upwards of a 25% risk of intubation and 20% all-comers mortality when discussing goals of care. Furthermore, AI airspace opacity scoring can inform clinicians, patients, and hospital systems of length of stay and duration of high-level care, including invasive ventilation duration and ICU bed occupancy. Bracketing airspace opacity into quintiles demonstrates a clear upward trend in hospitalization duration, ICU duration, and intubation duration as a function of severity. Physicians could once again counsel a patient with a score of 15 who is approaching intubation to expect an invasive ventilation duration of 10 days on average, albeit with a large degree of variation.
CT scans are obtained at other points in admission besides the initial encounter and in the setting of other possible viral pathologies. While the strategy employed in this study utilizes a cross-sectional time point (emergency department admission predicated around SARS-CoV-2 PCR testing), there is a lack of information on whether follow-up scores would predict morbidity and mortality as anticipated. The authors find it likely that a change in clinical situation would result in a differential rate of outcomes, but there is a dearth of follow-up CT scans during hospital admission. At the present time we are unable to conclude if and how the severity score predictions would change over the course of the admission, and instead recommend interpretation of prognostics in the setting of early workup of disease. Regarding other causes of atypical pneumonia, CT has been well described in the evaluation of other viral pneumonias. Various deep learning algorithms have attempted to differentiate between COVID-19 and other viral pneumonias; however, the authors argue that with widespread SARS-CoV-2 testing availability this is less of a concern. Future study should investigate patients who had subsequent cross-sectional imaging during the hospital course and assess for changes in prognostic value.
LIMITATIONS
Limitations of this study include its single-institution, retrospective nature and a study period spanning multiple iterations of COVID-19 waves, vaccines, strains, and best practices. There is no current data to suggest how clinicians might approach this potential confounding aspect (i.e., radiologic findings of the Delta vs Omicron variant, vaccinated vs non-vaccinated vs booster received, etc). Additionally, this study was not powered to evaluate concurrent demographics and comorbidities as risk factors or effect modifiers, as those were considered secondary endpoints. Further study is needed to develop more accurate risk modelling in the context of previously identified demographic and clinical variables. The patients enrolled in this study are also subject to selection bias by the criterion of having received a chest CT upon presentation. Patients who receive a CT scan in the ED more likely represent a population with more severe presenting illness, which may inflate the average airspace opacity score among COVID-19 positive patients. Severe outcomes in the COVID-19 group were sparse. A larger multi-institutional cohort with more outcomes is needed, for which this study will serve as the basis for a second power analysis.
Importantly, interpreting airspace opacities in the context of COVID-19 patients is challenging, as the AI program does not discriminate between types of airspace opacity. For instance, ground-glass opacities, a “tree-in-bud” pattern, and patchy consolidation are found in many patients without pulmonary disease or with non-COVID-19 viral pneumonia, but would count as a positive airspace opacity in patients with or without COVID-19 in this study (Radiological Society of North America Expert Consensus Statement on Reporting Chest CT Findings Related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA - Secondary Publication). Certain systems have been developed to classify COVID vs non-COVID pneumonia, but at the time of this article this is still an evolving science. A best practice would include performing a COVID-19 test before cross-sectional imaging to clarify pre-test probability.
CONCLUSION
AI-segmented quantitative airspace severity scoring is an accurate diagnostic and prognostic tool for COVID-19. The AI algorithm adequately quantifies the burden of disease in COVID-19 patients and can provide a service that would otherwise be too time-consuming for radiologists and clinicians. The AI scoring output is also easily interpretable, explaining the outputs of a convolutional neural network with relatively little previous knowledge required. Extra value is also provided to clinicians regarding their patients' risk of disease progression, which may change management and influence goals of care discussions. Further study will focus on multivariate predictive outcomes analysis with less emphasis on interobserver agreement.
Funding
Funding for this study was provided by Siemens Healthineers.