Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, China
Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences
Department of Radiology, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, Shandong, China
Address correspondence to: J.X. Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, 250021, China.
To develop, validate, and test a comprehensive radiomics prediction model that distinguishes parotid polymorphic adenomas (PAs) from Warthin tumors (WTs) using clinical data and contrast-enhanced computed tomography (CT) from a multicenter cohort.
Materials and Methods
A total of 267 patients with PAs (n = 172) or WTs (n = 95) from two hospitals were randomly divided into training (n = 188) and validation (n = 79) datasets. Radiomics features were extracted from contrast-enhanced CT (arterial phase) and then subjected to dimensionality reduction. Clinical and CT features were combined to establish a prediction model, and a radiomics nomogram was constructed by combining the RadScore with clinical factors. An independent dataset of 31 patients from a third hospital was used to test the model. The performance of the nomogram, radiomics signature, and clinical models was evaluated on the training, validation, and independent testing datasets. Receiver operating characteristic (ROC) curves were used to compare performance, and decision curve analysis (DCA) was used to evaluate the clinical usefulness of the models.
Results
A total of 15 radiomics features were selected from the CT data as imaging markers to generate RadScores. Clinical factors (age, sex, and smoking status) combined with RadScores were used to distinguish PAs from WTs based on multivariate logistic regression analysis. Radiomics nomograms combining clinical factors and RadScores showed satisfactory predictive value for distinguishing PAs from WTs, with areas under the ROC curve (AUCs) of 0.979, 0.922, and 0.903 for the training, validation, and independent testing datasets, respectively. Decision curve analysis showed that the radiomics nomogram provided a higher net benefit than the clinical factor model.
Conclusion
CT-based radiomics nomograms combining RadScores and clinical factors can be used to distinguish PAs from WTs and may help clinicians manage these tumors.
Salivary gland tumors account for 2%–6.5% of all head and neck tumors. Parotid gland tumors are the major subtype, and 75%–80% of them are benign (
). Polymorphic adenomas (PAs), which originate from epithelial tissue, are the most common parotid gland tumors, followed by Warthin tumors (WTs), which originate from both epithelial and lymphoid tissue. Other subtypes are relatively rare (
). Although PAs and WTs are mostly benign, their biological behavior differs: PAs are relatively aggressive, have a tendency toward malignant transformation, and can recur after resection (
). PAs and WTs also differ in prognosis and treatment. Accurate preoperative differentiation of PA from WT is therefore very important.
Ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI) are commonly used to diagnose parotid tumors. MRI is relatively slow and more costly than the other modalities, and it is contraindicated in some patients, for example those with MR-incompatible devices such as pacemakers (
Parotid gland tumors: can addition of diffusion-weighted MR imaging to dynamic contrast-enhanced MR imaging improve diagnostic accuracy in characterization.
); however, it may cause complications such as facial nerve injury. It is therefore desirable to distinguish PAs from WTs using clinical symptoms, demographics, and imaging information such as CT, to minimize unnecessary biopsies and optimize treatment plans. The challenge is that most benign parotid tumors lack specific clinical manifestations (
). Intensity, shape, texture, and higher-order features can be extracted from images and tumor segmentations, providing rich information that cannot be perceived by the naked eye. Using these features, tumor characteristics can be evaluated precisely from preoperative imaging data (
). Radiomics analysis of parotid gland tumors has attracted increasing attention. In head and neck oncology, radiomics has been used, for example, to distinguish human papillomavirus (HPV)-positive from HPV-negative primary squamous cell carcinoma (
). Fruehwald-Pallamar and Zheng used texture analysis of T1- and T2-weighted MR images to differentiate benign from malignant parotid gland tumors (
This study therefore aimed to distinguish PA from WT using contrast-enhanced CT images (arterial phase) combined with clinical information from multicenter datasets. By splitting data from two hospitals into training and validation sets and testing on data from a third hospital, we showed that CT-based radiomics nomograms combining RadScores and clinical factors can distinguish parotid PA from WT.
MATERIALS AND METHODS
Patients
This retrospective study was approved by the institutional review boards of all participating institutions, and the requirement for informed consent was waived. All data were anonymized, and detailed patient information could not be retrieved from the data used in the study. The results of the study did not alter any treatment plans for the enrolled patients.
We searched the pathology databases of hospital #1 (December 2014 to April 2021) and hospitals #2 and #3 (December 2014 to January 2021). The inclusion criteria were as follows: (1) a pathological diagnosis of parotid tumor confirmed by biopsy or surgery, and (2) contrast-enhanced CT performed within two weeks before treatment or biopsy. The exclusion criteria were as follows: (1) incomplete clinical data or no contrast-enhanced CT; (2) images with obvious metal or beam-hardening artifacts; and (3) a maximum tumor diameter less than 1.0 cm. A total of 411 inpatient records were initially collected. Finally, 298 patients with pathologically confirmed PAs (n = 191; mean age, 42.19 ± 14.70 years; 76 males and 115 females) or WTs (n = 107; mean age, 62.35 ± 10.15 years; 96 males and 11 females) were included. Patients from hospitals #1 and #2 were randomly assigned to the training and validation sets at a ratio of 7:3, and patients from hospital #3 were used as an independent dataset to test the effectiveness and feasibility of the model in distinguishing PA from WT. Figure 1 lists the details.
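The center-based split described above (hospitals #1 and #2 pooled and split 7:3, hospital #3 held out) can be sketched in Python, the language the study reports using. This is a minimal illustration, not the authors' code: the patient records, the `id`/`center` keys, the arbitrary assignment of synthetic patients to hospitals #1 and #2, and the fixed seed are all hypothetical, and a plain (unstratified) random split is assumed.

```python
import random


def split_by_center(patients, test_center=3, train_frac=0.7, seed=42):
    """Hold out one center as the independent test set and randomly split the rest.

    `patients` is a list of dicts with hypothetical keys "id" and "center".
    """
    test = [p for p in patients if p["center"] == test_center]
    pool = [p for p in patients if p["center"] != test_center]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = pool[:]
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:], test


# Synthetic cohort with the study's sizes: 267 patients from hospitals #1/#2, 31 from #3.
patients = [{"id": i, "center": 1 if i < 150 else 2} for i in range(267)]
patients += [{"id": 267 + i, "center": 3} for i in range(31)]
train, val, test = split_by_center(patients)
print(len(train), len(val), len(test))  # 187 80 31
```

With round-to-nearest, 70% of 267 gives 187/80, whereas the study reports 188/79; the exact allocation rule (rounding, or stratification by diagnosis) is not stated, so it may differ from this sketch.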
Figure 1. Flowchart for patient selection. (Color version of figure is available online.)
The CT devices used are listed in the Supplementary Table. The scanning parameters were as follows: tube voltage, 120 kVp with automatic tube current modulation; slice thickness and slice interval, 5 mm. Scans covered the region from the skull base to the aortic arch. A contrast agent (iopamidol injection, 37 g/100 mL; Shanghai Bracco Sine Pharmaceutical Industry, Shanghai, China) was administered intravenously at 1.5 mL/kg (60–120 mL) at an injection rate of 3.0 mL/s. Arterial-phase CT images were acquired 30 s after contrast injection.
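The weight-based contrast dosing can be made concrete with a small calculation. This sketch assumes that the 1.5 mL/kg dose is clamped to the quoted 60–120 mL range; the text does not spell out this rule, so it is an interpretation, and the function name is hypothetical.

```python
def contrast_protocol(weight_kg, dose_ml_per_kg=1.5, min_ml=60.0, max_ml=120.0,
                      rate_ml_per_s=3.0, arterial_delay_s=30.0):
    """Return (volume in mL, injection duration in s, arterial-phase delay in s).

    Assumption: the weight-based volume is clamped to the 60-120 mL range.
    """
    volume = min(max(dose_ml_per_kg * weight_kg, min_ml), max_ml)
    duration = volume / rate_ml_per_s  # time needed to inject the volume at 3.0 mL/s
    return volume, duration, arterial_delay_s


print(contrast_protocol(70))   # (105.0, 35.0, 30.0)
print(contrast_protocol(30))   # (60.0, 20.0, 30.0)
print(contrast_protocol(100))  # (120.0, 40.0, 30.0)
```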
Assessments of Clinical and Radiological Features
Clinical parameters, including age, sex, and smoking status, were collected from the hospitals’ medical record systems. All original CT images were reviewed by two experienced head and neck radiologists who were blinded to clinical information and pathological outcomes. The recorded features included tumor location (deep or superficial lobe), distribution (unilateral or bilateral), multiplicity (yes or no), maximum diameter (of the largest lesion when multiple lesions were present), cystic change (yes or no), margin (clear or unclear), and calcification (yes or no). Disagreements between observers were resolved by consensus.
Image Segmentation and Radiomics Feature Extraction
The flowchart of the radiomics analysis is shown in Figure 2. Radiomics analysis was performed using the uAI Research Portal version 1.0 (Shanghai United Imaging Intelligence, Shanghai, China), a clinical research platform implemented in the Python programming language (version 3.7.3, https://www.python.org), together with the widely used PyRadiomics package (https://pyradiomics.readthedocs.io/en/latest/index.html) (
). The tumor regions on CT were used as regions of interest (ROIs) for training and evaluation. Each ROI was manually drawn on the axial images by two independent head and neck radiologists (Radiologist 1 with 8 years of experience and Radiologist 2 with 3 years of experience) who were blinded to the clinical information, avoiding normal tissues such as blood vessels and bone. An example of manual segmentation is shown in Figure 3.
Figure 2. Flowchart of the radiomics analysis in this study. (Color version of figure is available online.)
Figure 3. Manual segmentation of the tumor. (a) Case 1: polymorphic adenoma in a 35-year-old man. (b) Case 2: Warthin tumor in a 75-year-old man. (Color version of figure is available online.)
Before feature extraction, images were preprocessed by intensity normalization and resampling. A total of 2264 radiomics features were extracted from the ROIs and normalized across the study population. The features were divided into four groups: 18 first-order features, 14 shape features, and 72 texture features extracted from the original image, and 2160 higher-order features (432 first-order and 1728 texture features) extracted from images processed with 24 different filters. The texture features were further subdivided into five classes: gray-level co-occurrence matrix (GLCM, n = 21), gray-level run-length matrix (GLRLM, n = 14), gray-level size zone matrix (GLSZM, n = 16), gray-level dependence matrix (GLDM, n = 16), and neighboring gray tone difference matrix (NGTDM, n = 5).
Intra- and interobserver agreement was assessed with the intraclass correlation coefficient (ICC) to quantify the reproducibility of the radiomics features. ICCs were calculated on a subset of the training data (40 CT images: 22 PAs and 18 WTs) segmented by two blinded radiologists. Reader 1 and Reader 2 performed the segmentation independently during the same period to assess interobserver agreement, and Reader 1 repeated the segmentation of the same cases 4 weeks later to assess intraobserver agreement. An ICC greater than 0.75 was considered to indicate satisfactory agreement.
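The reported feature counts are internally consistent, which the following arithmetic check makes explicit. It assumes, as the numbers imply, that shape features are computed only once on the original image while first-order and texture features are recomputed for each of the 24 filtered images; the filter types themselves are not specified in the text.

```python
# Feature counts as reported in the text.
texture_classes = {"GLCM": 21, "GLRLM": 14, "GLSZM": 16, "GLDM": 16, "NGTDM": 5}
n_first_order, n_shape, n_filters = 18, 14, 24
n_texture = sum(texture_classes.values())  # 72 texture features per image

# Original image: first-order + shape + texture.
n_original = n_first_order + n_shape + n_texture  # 104

# Filtered images: shape is not recomputed, so only first-order and texture repeat.
n_filtered_first_order = n_filters * n_first_order  # 432
n_filtered_texture = n_filters * n_texture          # 1728
n_filtered = n_filtered_first_order + n_filtered_texture  # 2160

n_total = n_original + n_filtered
print(n_texture, n_original, n_filtered, n_total)  # 72 104 2160 2264
```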
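The text does not state which ICC form was used. As one common choice for agreement between readers, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in Shrout and Fleiss's notation) for one feature, and applies the 0.75 cutoff to both the inter- and intraobserver ICCs. The choice of ICC form and the function names are assumptions.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` has one row per case and one column per reader (or per reading
    session when assessing intraobserver agreement).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-case variation
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # systematic reader offsets
    ss_err = ss_total - ss_rows - ss_cols                   # residual disagreement
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def is_reproducible(inter_icc, intra_icc, threshold=0.75):
    """Keep a feature only if both inter- and intraobserver ICCs exceed the threshold."""
    return inter_icc > threshold and intra_icc > threshold


# One feature measured on 3 cases by two readers (toy values).
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))            # 1.0: identical readings
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 4))  # 0.6667: constant offset between readers
```

Because ICC(2,1) measures absolute agreement, a constant offset between readers lowers the ICC even though the readings are perfectly correlated; a consistency-type ICC would not penalize it.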
Development of the Radiomics Signature
First, the variance threshold method was used to retain features with a variance greater than 0.80. Next, univariate feature selection by analysis of variance (ANOVA) was used to retain features with significant differences (p < 0.05). Finally, the remaining features were entered into a least absolute shrinkage and selection operator (LASSO) regression model to select the most valuable features in the training cohort. The LASSO objective function to be minimized was:
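$$\min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - \mathbf{x}_i^{\top}\beta\right)^2 + \alpha\lVert\beta\rVert_1 \qquad (1)$$

where n is the number of training samples, $\mathbf{x}_i$ and $y_i$ are the feature vector and class label of sample i, $\beta_0$ is the intercept, $\beta$ is the coefficient vector, and $\alpha \ge 0$ is the regularization parameter selected by 10-fold cross-validation (Figure 4). Features with nonzero coefficients were retained.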
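The three selection steps (variance threshold, ANOVA filter, LASSO) can be sketched in plain Python. Assumptions, all labeled in the code: population variance is used for the 0.80 threshold; the two-class ANOVA p-value uses a large-sample chi-square(1) approximation instead of the exact F(1, n − 2) tail; LASSO is fit by cyclic coordinate descent on objective (1) with 0/1 labels as the response; and α is fixed rather than chosen by 10-fold cross-validation as in Figure 4. In practice a library such as scikit-learn would be used; this sketch only shows the mechanics.

```python
import math
from statistics import mean, pvariance


def variance_filter(columns, threshold=0.80):
    """Indices of features whose population variance exceeds the threshold."""
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


def anova_p(col, labels):
    """Two-class one-way ANOVA p-value.

    Assumption: P(F > f) is approximated by the chi-square(1) tail erfc(sqrt(f / 2)),
    which is accurate only for large samples.
    """
    groups = [[x for x, y in zip(col, labels) if y == c] for c in (0, 1)]
    grand = mean(col)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        return 0.0 if ss_between > 0 else 1.0
    f = ss_between / (ss_within / (len(col) - 2))
    return math.erfc(math.sqrt(f / 2))


def soft_threshold(z, gamma):
    return z - gamma if z > gamma else z + gamma if z < -gamma else 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/(2n))||y - b0 - Xw||^2 + alpha*||w||_1 (intercept via centering)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xw, starting from w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
            w[j] = new_wj
    return w


def select_features(X, y, var_thr=0.80, p_thr=0.05, alpha=0.05):
    """Variance threshold -> ANOVA (p < p_thr) -> LASSO; returns indices with nonzero coefficients."""
    columns = [list(col) for col in zip(*X)]
    keep = [j for j in variance_filter(columns, var_thr) if anova_p(columns[j], y) < p_thr]
    if not keep:
        return []
    w = lasso_cd([[row[j] for j in keep] for row in X], y, alpha)
    return [j for j, wj in zip(keep, w) if wj != 0.0]


# Toy data: feature 0 separates the classes, feature 1 has high variance but no
# class difference, feature 2 has low variance.
X = [[0.0, 2, 0], [0.2, -2, 0], [-0.2, 0, 0], [2.0, 2, 1], [2.2, -2, 1], [1.8, 0, 1]]
y = [0, 0, 0, 1, 1, 1]
print(select_features(X, y))  # [0]
```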
Using the features selected by LASSO, a logistic regression (LR) model was constructed on the training set to classify PAs and WTs. The value predicted by this model was denoted as the radiomics signature, or RadScore. The model's ability to distinguish PAs from WTs was then evaluated on the validation and independent testing datasets.
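The step from LASSO-selected features to a RadScore can be illustrated with a plain logistic regression fitted by batch gradient descent. This is an assumed, simplified fitting procedure (the text does not state the solver), and the small L2 penalty only keeps the toy example bounded. The sketch reports the linear predictor (log-odds) as the score and its sigmoid as the predicted probability; which of the two the study calls the RadScore is not specified.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Batch gradient descent on mean log-loss plus a small L2 penalty; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def rad_score(b, w, x):
    """Linear predictor (log-odds) for one patient's selected features."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


# Toy usage with one selected feature; the 0/1 class coding is arbitrary.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
b, w = fit_logistic(X, y)
print(sigmoid(rad_score(b, w, [2.0])) > 0.5, sigmoid(rad_score(b, w, [-2.0])) < 0.5)  # True True
```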
Clinical Model and Radiomics Nomogram Construction
Univariate LR was used to select clinical factors, and multivariate LR was then used to develop the classification model. Each independent factor was quantified by its odds ratio (OR), an estimate of relative risk, with a 95% confidence interval (CI). The radiomics nomogram combined the significant clinical factors with the RadScore. A calibration curve was generated to evaluate the radiomics nomogram, and decision curve analysis (DCA) of the clinical factor model, radiomics signature, and radiomics nomogram was performed to calculate the net benefit for the entire cohort across a range of threshold probabilities, thereby assessing clinical usefulness. Model performance was tested on the validation and independent testing datasets.
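The ORs and 95% CIs reported for the clinical factors follow from the logistic-regression coefficients β and their standard errors (SEs) via OR = exp(β) and CI = exp(β ± 1.96·SE), assuming Wald intervals, the usual default in statistical software. The sketch below shows this and the inverse calculation; the function names are illustrative.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """OR and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate the log-odds SE from a reported Wald 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Age result reported in the Results (OR, 19.186; 95% CI, 5.919-62.186).
print(round(se_from_ci(5.919, 62.186), 2))  # 0.6
```

A quick consistency check follows from the same formula: for a Wald interval, the OR equals the geometric mean of the CI bounds, and √(5.919 × 62.186) ≈ 19.19 matches the reported OR.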
Statistical Analysis
Statistical analysis was performed using SPSS statistical software for Windows, version 22.0 (SPSS, Chicago, IL, USA) and Python (version 3.7.3, https://www.python.org). Differences in clinical variables among the three groups were compared by univariate analysis using independent-samples t-tests and chi-square tests.
The final radiomics nomogram was established by logistic regression using the statistically significant clinical factors and the RadScore. ROC analysis was used to evaluate the performance of each model, and the AUCs of different models were compared using the DeLong test. Decision curves were used to evaluate the radiomics nomogram. A two-tailed p < 0.05 indicated a statistically significant difference. The Hosmer-Lemeshow test was used to evaluate the model's goodness of fit, and the integrated discrimination improvement (IDI) was used to test the improvement of the combined model.
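For the categorical comparisons (e.g., sex or smoking status by diagnosis), a 2 × 2 Pearson chi-square test is the simplest case. The sketch below assumes no continuity correction (SPSS reports both the uncorrected and the Yates-corrected statistic); with one degree of freedom the p-value has the closed form erfc(√(χ²/2)), so no statistics library is needed. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    With df = 1, P(chi2 > x) = erfc(sqrt(x / 2)) exactly.
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


print(chi_square_2x2(10, 10, 10, 10))  # (0.0, 1.0): no association
```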
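Decision curve analysis plots the net benefit of acting on a model's prediction at each threshold probability pt, NB = TP/N − (FP/N)·pt/(1 − pt), against the "treat all" and "treat none" (NB = 0) reference strategies. The sketch below implements this standard definition; which tumor type is treated as the positive class is an assumption left to the caller, and the function names are illustrative.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of labeling cases with predicted probability >= threshold as positive."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all(y_true, threshold):
    """Net benefit of labeling every case as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """(threshold, model net benefit, treat-all net benefit) at each threshold."""
    return [(t, net_benefit(y_true, y_prob, t), treat_all(y_true, t)) for t in thresholds]


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
print(net_benefit(y, p, 0.5), treat_all(y, 0.5))  # 0.5 0.0
```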
RESULTS
Clinical Feature-Based Predictions
Table 1 summarizes the clinical factors and imaging features of the patients in the training, validation, and independent testing sets. In the training set, age, sex, and smoking status differed significantly between the PA and WT groups (p < 0.05). Multivariate LR analysis revealed that age (OR, 19.186; 95% CI, 5.919–62.186), sex (OR, 15.077; 95% CI, 2.580–88.119), and smoking (OR, 4.664; 95% CI, 1.314–16.557) were independent predictors in the clinical factor model (Clinical-Model).
Table 1. Clinical Factors of the Training, Validation, and Independent Testing Sets
Radiomics Feature Extraction, Selection, and RadScore Establishment
A total of 2264 features were extracted from each ROI on the CT images. Of these, 1862 showed good inter- and intraobserver agreement, with ICCs ranging from 0.7506 to 0.9999. Then, 366 radiomics features that differed significantly between PAs and WTs were entered into LASSO feature selection, which identified the 15 most valuable features (Figure 4). Finally, these 15 features were used to construct the radiomics signature (RadScore) by logistic regression (Table 2).
Figure 4. Radiomics signature selection using the least absolute shrinkage and selection operator (LASSO) regression model. (a) Mean squared error path; the x-axis is log(alpha), and the 10 dashed lines of different colors correspond to the folds of 10-fold cross-validation. (b) LASSO coefficient path; the 15 colored solid lines represent the 15 selected radiomics features. (Color version of figure is available online.)
We then evaluated the performance of the RadScore in distinguishing PAs from WTs. For the training, validation, and independent testing sets, respectively, the AUCs were 0.950, 0.834, and 0.861; the specificities were 0.851, 0.776, and 0.765; and the sensitivities were 0.925, 0.769, and 0.818 (Table 3).
Table 3. Diagnostic Performance of the Clinical Factor Model, the Radiomics Signature, and the Radiomics Nomogram
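Sensitivity, specificity, and AUC of the kind reported in Table 3 can be computed as follows. The AUC uses the rank (Mann–Whitney) formulation, which equals the area under the empirical ROC curve. The cutoff used for sensitivity and specificity is not stated in the text, so choosing it by the Youden index here is an assumption, as are the function names.

```python
def sens_spec(y_true, y_score, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, y_score):
    """Observed score maximizing sensitivity + specificity - 1 (first one on ties)."""
    return max(sorted(set(y_score)), key=lambda c: sum(sens_spec(y_true, y_score, c)))


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
cut = youden_cutoff(y, s)
print(auc(y, s), cut, sens_spec(y, s, cut))  # 0.75 0.35 (1.0, 0.5)
```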
The Radiomics Nomogram Development and Models Assessment
Age, sex, smoking, and RadScore were incorporated into a radiomics nomogram (Figure 5A). The nomogram shows that the RadScore and age contributed more than the other clinical factors, making them the two most important factors in distinguishing PAs from WTs. The calibration curves showed that the predicted values in the validation and testing sets were very close to the observed values (Figure 5B, C). For the training, validation, and independent testing sets, respectively, the AUCs of the nomogram were 0.979, 0.922, and 0.903; the specificities were 0.95, 0.878, and 0.824; and the sensitivities were 0.955, 0.807, and 0.909. The diagnostic performance of the clinical factor model, RadScore, and radiomics nomogram is summarized in Table 3, and the ROC curves of the three models in the training, validation, and independent testing sets are shown in Figure 6.
Figure 5. Radiomics nomogram and calibration curves. (a) The radiomics nomogram combining age, sex, smoking, and RadScore, developed in the training set. (b, c) Calibration curves of the radiomics nomogram in the validation (b) and test (c) sets, indicating the goodness of fit of the nomogram. The 45° line represents perfect agreement; the closer the calibration curve lies to this line, the more accurate the predictions. (Color version of figure is available online.)
Figure 6. Receiver operating characteristic curves of the radiomics signature, the clinical factor model, and the radiomics nomogram in the training (a), validation (b), and independent testing (c) sets. (Color version of figure is available online.)
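A nomogram is a graphical form of the fitted logistic model: each predictor's contribution β·x is rescaled to a points axis, conventionally so that the predictor with the widest contribution range spans 0–100 points, and the total points map back to the linear predictor. This is why predictors with large coefficients or wide ranges, such as the RadScore and age here, occupy longer axes. The sketch below implements that convention; the coefficients and ranges in the usage example are hypothetical, not the study's fitted values.

```python
def nomogram_points(coefs, ranges):
    """Conventional nomogram scaling (assumed): the predictor whose contribution
    beta * x spans the widest range maps to 0-100 points; others scale proportionally.

    coefs:  {name: logistic coefficient}
    ranges: {name: (min_value, max_value)} observed in the training set
    """
    spans = {k: abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs}
    max_span = max(spans.values())

    def points(values):
        out = {}
        for k, beta in coefs.items():
            lo, hi = ranges[k]
            ref = lo if beta >= 0 else hi  # end of the range that receives 0 points
            out[k] = 100.0 * beta * (values[k] - ref) / max_span
        return out

    return points


# Hypothetical coefficients and training-set ranges (not the study's values).
to_points = nomogram_points({"radscore": 2.0, "age": 0.05, "smoking": 0.5},
                            {"radscore": (-3.0, 3.0), "age": (20.0, 80.0), "smoking": (0.0, 1.0)})
pts = to_points({"radscore": 3.0, "age": 80.0, "smoking": 1.0})
print({k: round(v, 1) for k, v in pts.items()})  # {'radscore': 100.0, 'age': 25.0, 'smoking': 4.2}
```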
In the training set, the AUC of the radiomics nomogram was higher than those of the radiomics signature and the clinical factor model (both p < 0.05). In the validation set, the AUC of the radiomics nomogram was higher than that of the radiomics signature (p < 0.05). However, no significant differences in AUC were found in the independent testing set.
The Hosmer-Lemeshow test indicated good model fit (p > 0.05), and the IDI confirmed that the combined model offered a positive improvement (Table 4). DCA showed that the radiomics nomogram had a higher overall net benefit than the clinical model in differentiating PAs from WTs across most of the range of reasonable threshold probabilities (Figure 7).
Table 4. Integrated Discrimination Improvement to Test the Improvement of the Combined Model
Figure 7. Decision curve analysis for the three models in the validation (a) and independent testing (b) sets. The y-axis shows the net benefit, and the x-axis shows the threshold probability. The red, blue, and green lines represent the net benefits of the radiomics nomogram, the clinical factor model, and the radiomics signature, respectively. The gray line represents the assumption that all patients had polymorphic adenomas (PAs), and the black line represents the assumption that all patients had Warthin tumors (WTs). The combined model had the greatest net benefit in both datasets. (Color version of figure is available online.)
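The pairwise AUC comparisons reported here rely on DeLong's test, which accounts for the correlation between two AUCs measured on the same patients. A compact sketch of the standard DeLong variance estimate (structural components per positive and per negative case) is shown below; it uses a normal approximation for the two-sided p-value and is illustrative rather than the study's implementation.

```python
import math


def _components(pos, neg):
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    def psi(p, q):
        return 1.0 if p > q else 0.5 if p == q else 0.0
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, p)."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    comps = []
    for s in (score_a, score_b):
        v10, v01 = _components([s[i] for i in pos_idx], [s[i] for i in neg_idx])
        comps.append((sum(v10) / m, v10, v01))  # AUC is the mean of V10
    (auc_a, v10a, v01a), (auc_b, v10b, v01b) = comps
    var_a = _cov(v10a, v10a) / m + _cov(v01a, v01a) / n
    var_b = _cov(v10b, v10b) / m + _cov(v01b, v01b) / n
    cov_ab = _cov(v10a, v10b) / m + _cov(v01a, v01b) / n
    var_diff = var_a + var_b - 2 * cov_ab
    if var_diff <= 0:
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))


y = [0, 0, 0, 1, 1, 1]
a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
b = [0.6, 0.2, 0.3, 0.7, 0.1, 0.9]
auc_a, auc_b, z, p = delong_test(y, a, b)
print(round(auc_a, 3), round(auc_b, 3), round(z, 3), round(p, 3))  # 1.0 0.667 1.0 0.317
```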
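The IDI summarizes how much better the combined model separates the two classes than a reference model: the gain in mean predicted probability among events minus the gain among non-events (equivalently, the change in discrimination slope). The sketch below computes only the point estimate under that standard definition; its standard error and p-value, and which tumor type counts as the event, are outside this illustration, and the function name is illustrative.

```python
def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old."""
    def mean_where(p, cls):
        vals = [pi for pi, yi in zip(p, y_true) if yi == cls]
        return sum(vals) / len(vals)
    gain_events = mean_where(p_new, 1) - mean_where(p_old, 1)
    gain_nonevents = mean_where(p_new, 0) - mean_where(p_old, 0)
    return gain_events - gain_nonevents


y = [1, 1, 0, 0]
old = [0.6, 0.6, 0.4, 0.4]
new = [0.8, 0.7, 0.3, 0.2]
print(round(idi(y, old, new), 2))  # 0.3
```

A positive IDI means the new model raises predicted probabilities for events and/or lowers them for non-events relative to the reference model.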
In this study, we developed and validated a noninvasive radiomics nomogram based on clinical factors and CT radiomics features to distinguish PAs from WTs. The combined model achieved higher AUCs than the clinical model and the radiomics signature in all three cohorts, although not all differences were statistically significant, with AUCs of 0.979, 0.922, and 0.903 for the training, validation, and independent testing datasets, respectively. The combined predictive model may therefore be useful for the preoperative noninvasive evaluation of PAs in clinical practice.
Although PA and WT are both benign tumors, their biological behavior differs. Compared with WTs, PAs have a higher tendency toward malignant transformation and a higher recurrence rate (
); it is therefore essential to distinguish them during imaging diagnosis and preoperative planning. Currently, fine-needle biopsy is used for pathological diagnosis (
), but to reduce unnecessary interventions and possible nerve damage, it is valuable to differentiate PA from WT based on clinical information and imaging features before biopsy.
This study showed that clinical and imaging features provide essential information for distinguishing PAs from WTs. Patients with WT were older than those with PA, and sex and smoking status also differed significantly between the two groups. This explains why the clinical factor model alone achieved relatively high AUCs (training set, 0.94; validation set, 0.903; independent testing set, 0.877). These results are consistent with previous findings; for example, PAs have been reported to occur most often between 30 and 50 years of age, to be more common in women, and to be located frequently in the superficial lobe of the parotid gland (
PAs have a heterogeneous internal composition and often contain a mucoid or chondroid matrix, which gives them a higher signal intensity than WTs, especially on T2-weighted images (
). WTs have the highest microvascular density, and hence show stronger enhancement on CT. On contrast-enhanced CT, PAs show mild, progressive delayed enhancement, whereas WTs enhance markedly in the arterial phase and less in the venous phase, a rapid “washout” pattern (
). Although these differences are difficult for radiologists to assess visually, radiomics analysis provided clear evidence for distinguishing PAs from WTs.
On MRI, including diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI, tumor appearance can also make it difficult to distinguish PAs from WTs. Because PAs contain varying proportions of epithelial cells and mucoid elements within the same tumor, their histological characteristics are highly heterogeneous (
). Therefore, similar techniques may be extendable to MRI; texture analysis of T1- and T2-weighted images has already been used to distinguish PAs from WTs (
Previous radiomics studies of head and neck tumors have analyzed CT and MR images to characterize radiation-induced changes in parotid morphology and secretory function in patients with head and neck cancer (
), but that study used single-center data with a small sample size and did not test its models on external data.
In this study, we developed a radiomics-based model using 15 features selected from contrast-enhanced CT (arterial phase). Among these, GLSZM-Zone Entropy, GLRLM-Run Variance, and firstorder-90 Percentile were the most significant and robust features associated with PAs. Zone entropy measures the uncertainty, or randomness, in the distribution of zone sizes and gray levels; a higher value indicates more heterogeneous texture patterns. Run variance measures the variance of run lengths. Differences in the 90th percentile partly reflect differences in tissue composition (tumor heterogeneity) and in the internal composition or distribution of the tumor (internal heterogeneity) (
). Many texture features are defined by relationships between neighboring pixels and may therefore reflect intratumoral heterogeneity. PAs contain varying proportions of epithelial cells and mucoid matrix and can show sebaceous, mucinous, chondroid, squamous, and acidophilic differentiation, among others (
); thus, PAs are highly heterogeneous. In our results, several GLCM features were selected and contributed to the prediction models. GLCM features describe the relationship between pairs of adjacent pixels, which may reflect local intratumoral heterogeneity as well as the richer vascular distribution of WTs compared with PAs. This is consistent with the conclusion of Song et al. (
The present study had some limitations. First, the data were collected from three centers that used different devices and contrast agents, which may have introduced inconsistencies into the analyzed data; the performance and robustness of the models therefore require further verification. Second, the contrast phases acquired differed among the participating institutions, so we constructed radiomics features using arterial-phase CT only. Future work should focus on multiphase CT or MRI to improve diagnostic performance, and on image normalization based on domain transformation.
CONCLUSION
We developed, validated, and tested a CT-based radiomics nomogram to distinguish PAs from WTs. As a noninvasive tool, the proposed radiomics nomogram may effectively assist clinical decision-making.
Funding sources
This work was supported by the Academic Promotion Program of the Shandong First Medical University (grant no. 2019QL023).
Parotid gland tumors: can addition of diffusion-weighted MR imaging to dynamic contrast-enhanced MR imaging improve diagnostic accuracy in characterization.