3D DCE-MRI Radiomic Analysis for Malignant Lesion Prediction in Breast Cancer Patients

Rationale and Objectives: To develop and validate a radiomic model, with radiomic features extracted from breast Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) from a 1.5T scanner, for predicting the malignancy of masses with enhancement. Images were acquired using an 8-channel breast coil in the axial plane. The rationale behind this study is to show the feasibility of a radiomics-powered model that could be integrated into the clinical practice by exploiting only standard-of-care DCE-MRI with the goal of reducing the required image pre-processing (ie, normalization and quantitative imaging map generation).
Materials and Methods: 107 radiomic features were extracted from a manually annotated dataset of 111 patients, which was split into discovery and test sets. A feature calibration and pre-processing step was performed to find only robust non-redundant features. An in-depth discovery analysis was performed to define a predictive model: for this purpose, a Support Vector Machine (SVM) was trained in a nested 5-fold cross-validation scheme, by exploiting several unsupervised feature selection methods. The predictive model performance was evaluated in terms of Area Under the Receiver Operating Characteristic (AUROC), specificity, sensitivity, PPV and NPV. The test was performed on unseen held-out data.
Results: The model combining Unsupervised Discriminative Feature Selection (UDFS) and SVMs on average achieved the best performance on the blinded test set: AUROC = 0.725 ± 0.091, sensitivity = 0.709 ± 0.176, specificity = 0.741 ± 0.114, PPV = 0.72 ± 0.093, and NPV = 0.75 ± 0.114.
Conclusion: In this study, we built a radiomic predictive model based on breast DCE-MRI, using only the strongest enhancement phase, with promising results in terms of accuracy and specificity in the differentiation of malignant from benign breast lesions.
Abbreviations: 3DS 3D Shape, ADH Atypical Ductal Hyperplasia, AUROC Area Under the Receiver Operating Characteristic, CV Cross-Validation, DCE-MRI Dynamic Contrast-Enhanced Magnetic Resonance Imaging, DGUFS Dependence Guided Unsupervised Feature Selection, FS Feature Selection, GLCM Gray Level Co-occurrence Matrix features, GLDM Gray Level Dependence Matrix, GLRLM Gray Level Run Length Matrix, GLSZM Gray Level Size Zone Matrix, ICC Intraclass Correlation Coefficient, LRLGLE Long Run Low Gray Level Emphasis, NGTDM Neighboring Gray Tone Difference Matrix, NIfTI Neuroimaging Informatics Technology Initiative, NPV Negative Predictive Value, PPV Positive Predictive Value, SOR Standard-of-Reference, SRE Short Run Emphasis, SVM Support Vector Machine, TE Echo Time, TR Repetition Time, UDFS Unsupervised Discriminative Feature Selection, UFSOL Unsupervised Feature Selection with Ordinal Locality


INTRODUCTION
Breast cancer currently represents the most common non-skin cancer in women and men, accounting for 11.7% of all new cancer diagnoses in 2020 (1). Because of its incidence and clinical impact, early and accurate cancer detection and characterization are of utmost importance. Diagnosis of early invasive breast cancer relies on clinical evaluation, radiological imaging and image-guided biopsy. Breast MRI is commonly used as a screening and problem-solving tool (2) and for the local staging of breast cancer (3). Its high sensitivity for the detection of breast lesions makes MRI a valuable screening tool, especially in patients at high risk for developing breast cancer. However, several studies have shown limited specificity when it comes to lesion characterization, leading to high recall rates and the need for invasive and costly biopsies (4). Radiomics studies in breast imaging have shown promising results for lesion characterization, prediction of nodal metastases, tumor subtype, response predictions and prognostication (5,6).
Among imaging techniques, Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) provides both morphological (eg, size, margins, shape) (7) and hemodynamic information, assessing tumor vascularity and defining enhancement kinetic curves, with a reported sensitivity higher than 90% for breast cancer diagnosis (8). However, the specificity of DCE-MRI remains considerably lower (about 72%), and it is often necessary to continue the diagnostic work-up with a biopsy (8).
In breast imaging, radiomics was applied to several imaging modalities, including MRI, mammography, ultrasound, and digital breast tomosynthesis (9). In the recent review by (10), 63% of the reviewed radiomics studies were based on MRI, thus demonstrating the high relevance for the scientific community. Particularly, the application of radiomics in breast DCE-MRI was assessed by addressing various issues in the evaluation of breast lesions from the diagnosis (eg, characterization of breast lesions, prediction of breast cancer histological types and correlation with receptor status) to the prognosis (eg, lymph node metastases, tumor response to neoadjuvant systemic therapy, recurrence risks) (11).
Radiomics extracts a large number of quantitative imaging features from medical images, conveying more information than the visual and qualitative patterns observable by the radiologists' naked eye (12). In particular, radiomic biomarkers, which have attracted the attention of the scientific community, especially in oncological imaging, can be associated with clinical outcomes (13). In this regard, quantitative imaging biomarkers have shown great potential in prediction, prognosis, and treatment response assessment (14,15). To increase the robustness of these biomarkers, systematic studies on radiomic feature reliability have been presented (16,17). Moreover, initiatives and roadmaps have been launched in recent years to address the lack of reproducibility and validation of radiomics studies (18–20). Finally, to facilitate the clinical translation of the developed predictive and prognostic models, biological meaning and validation have been recently investigated (21,22).
The development of robust biomarkers could accelerate their incorporation into personalized medicine (23). Thus, standardization initiatives have been carried out by the scientific community to deal with the lack of reproducibility and validation of radiomics studies (20). An accurate and careful preliminary analysis of the robustness of the radiomic features is mandatory to define clinically relevant biomarkers able to provide generalization abilities on external datasets.
The primary objective of this study is to develop and validate a radiomic model capable of predicting malignant breast masses by using 3D radiomic features extracted from DCE-MRI, while the rationale is to show the feasibility of a radiomics-powered model that could be integrated into the clinical practice by exploiting only standard-of-care DCE-MRI with the goal of reducing the required image pre-processing (ie, normalization and quantitative imaging map generation).

Patients and Imaging Data Selection
Retrospective data collection was approved by the local Ethics Committee. The requirement for evidence of informed consent was waived because of the retrospective nature of our study.
A total of 194 breast DCE-MRI exams, performed from November 2019 to October 2020 at the Breast Unit of the Fondazione Istituto "G. Giglio" in Cefalù (Palermo, Italy), were collected for this study. The exclusion criteria applied during population enrolment are illustrated in Fig. 1.
The resulting dataset consisted of 111 lesions (size range: 5–60 mm; mean size ± SD: 16.67 ± 11.18 mm), depicted at DCE-MRI as masses with enhancement in 111 patients (110 women and 1 man; age range: 23–72 years; mean age ± SD: 47.83 ± 9.16 years), and classified by a breast radiologist (with more than 30 years of experience) according to BI-RADS criteria (24).
In patients with multi-focal lesions only the largest one was considered. As standard-of-reference (SOR), we included a definitive histological diagnosis for all breast lesions classified as BI-RADS 4-5 and a definitive histological diagnosis or a complete follow-up at 24 months for BI-RADS 3 lesions.
MR images were acquired using a 1.5T scanner (Signa HDxt; GE Healthcare, Barrington, IL, USA) with an 8-channel breast coil and a standard protocol including T2w FSE with and without fat saturation, Diffusion-Weighted Imaging (DWI), and DCE-MRI (T1-weighted three-dimensional spoiled gradient echo) sequences in the axial plane. The DCE-MRI sequence was analyzed for radiomics purposes, and its imaging parameters are described in Table 1.
The whole dataset of 111 breast DCE-MRI sequences was divided into two groups via an 80%–20% hold-out split: a discovery set, used to define the best predictive model in terms of feature selection (FS) method and classifier, and a test set, used in the test phase on unseen data.
The discovery set was used, by means of a nested k-fold cross-validation, to find the best predictive models, while the test set was adopted to validate the final model. This partitioning was performed 50 times by stratified sampling.
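The repeated stratified hold-out described above can be sketched as follows. This is only an illustration in Python (the original analysis was carried out in MatLab): the function name `stratified_holdout`, the use of one random seed per repetition, and the rounding of the class-wise test sizes are assumptions. The class counts (57 benign versus 54 malignant or high-risk lesions) are those reported in the Study Population section.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (discovery_idx, test_idx), preserving the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    discovery, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Assumption: the test share of each class is rounded to the nearest integer.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        discovery.extend(members[n_test:])
    return sorted(discovery), sorted(test)


# 0 = benign (57 lesions), 1 = malignant or high-risk (54 lesions).
labels = [0] * 57 + [1] * 54

# 50 independent repetitions, one (hypothetical) seed per repetition.
splits = [stratified_holdout(labels, seed=rep) for rep in range(50)]
discovery_idx, test_idx = splits[0]
```

Under these assumptions, each repetition yields 89 discovery cases and 22 test cases, which agrees with the discovery set size of 89 patients used in the discovery phase.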

Image Processing
The overall workflow of this study is depicted in Fig. 2. Each step in the image processing and analysis pipeline is described in the following sections.

Lesion Segmentation
For each mass with enhancement included in our study, the tumor volume of interest was determined by manual slice-by-slice segmentation on DCE-MRI images, performed by a breast radiologist with more than 5 years of experience in breast MRI, in consensus with a consultant breast radiologist (with more than 30 years of experience in breast imaging). All the breast masses were manually segmented using a MatLab-coded custom tool (25). The ROIs were delineated on the whole tumor on the DCE-MRI images with the strongest enhancement phase (26). In particular, among the phases provided by the VIBRANT sequences (6 or 7, across different patients), the observer selected the one in which the breast mass was most evident against the background: on average, phase 3.66 ± 1.13 was chosen for segmenting the analyzed lesions. The acquisition of DCE-MRI involves the administration of a contrast medium, which better depicts the morphological/physiological characteristics of the tissues, and the examination includes multiple acquisitions at well-defined time intervals. Considering a specific position within the acquired volume, each voxel has a Time Intensity Curve, TIC(t), reflecting the signal intensity variations due to the absorption/release of the contrast medium. The time course of the TIC(t) curves can help clinicians to infer the type of lesion (eg, benign versus malignant).
In Zhang et al. (27), only 1 time-point of the DCE-MRI series was chosen. According to the literature, the strongest enhancement phase can better reflect the tumor heterogeneity and invasiveness by relying upon the subtracted DCE-MRI images (27,28). Therefore, our choice to analyze a single phase is consistent with the literature. In particular, relying upon the enhancement curves (with the goal of increasing the reproducibility), only the strongest enhancement phase (26,27) was selected and analyzed.
Lesion segmentation was performed without including peritumoral tissues. The corresponding data were stored in the Neuroimaging Informatics Technology Initiative (NIfTI) format (9). Fig. 3 shows two examples of breast lesion segmentation.

Radiomic Feature Extraction
The features were extracted from the 3D ROIs delineated in the previous step using PyRadiomics, an open-source Python package developed for the standardization of radiomic feature extraction (29). We used PyRadiomics version 2.0 and Python 3.7.5. Along with shape-based features, 6 feature classes were extracted: (1) first-order intensity histogram statistics, (2) Gray Level Co-occurrence Matrix (GLCM) features (30,31), (3) Gray Level Dependence Matrix (GLDM) features, (4) Gray Level Run Length Matrix (GLRLM) features, (5) Gray Level Size Zone Matrix (GLSZM) features, and (6) Neighboring Gray Tone Difference Matrix (NGTDM) features. A well-established, practical rule of thumb states that at least 10 samples (ie, patients) are needed for each feature in a model based on binary classifiers (12). Therefore, due to the small sample size, we preferred to use only original features, avoiding the processing of additional features extracted using convolutional image filters on the input medical images (eg, Laplacian of Gaussian, logarithmic, exponential, gradient, wavelets). Moreover, clear guidelines for filtered versions of the images are not yet available, with Chapter 2 of the Image Biomarker Standardisation Initiative (IBSI) (36) still being under preparation (https://theibsi.github.io/ibsi2/).

Calibration and Pre-Processing
Calibration and pre-processing steps were performed to identify a subset of features that are independent from the MRI acquisition parameters, informative, and non-redundant, by following the guidelines outlined in (13).
Supplementary Section S1 'Calibration and Pre-processing Details' provides an in-depth description of the implemented pre-processing steps, while Section S2 'Radiomic Features' describes the final set of analyzed features.

Feature Selection and Predictive Modeling
The aim of FS is to reduce the data dimensionality by selecting only a subset of features to create a model (37). The predictive modeling was performed by a Support Vector Machine (SVM) (44,45), trained and tested using a nested 5-fold cross-validation (CV) procedure (Fig. 4).
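The structure of a nested k-fold CV can be illustrated with the following Python sketch, which only generates the index partitions. The helper names `k_fold_indices` and `nested_cv_splits` are hypothetical, the folds are not stratified for brevity, and what is tuned in the inner loop (eg, the number of ranked features or the SVM hyperparameters) is an assumption here; the procedure actually adopted is detailed in Supplementary Section S4.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k=5, seed=0):
    """Yield (inner_splits, outer_train, outer_test) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs built only from
    outer_train, so outer_test never influences model selection.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [s for j, fold in enumerate(outer_folds) if j != i for s in fold]
        inner_folds = k_fold_indices(outer_train, k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [s for n, fold in enumerate(inner_folds) if n != m for s in fold]
            inner_splits.append((inner_train, inner_val))
        yield inner_splits, outer_train, outer_test


# Example with a discovery set of 89 cases.
splits = list(nested_cv_splits(n_samples=89, k=5, seed=0))
```

The inner pairs would be used to choose the model configuration, and the outer test fold to estimate the performance of the chosen configuration on data not seen during that choice.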
This discovery phase aimed to find the FS methods with the best performance in combination with the SVM-based predictive modeling. The discovery set consisting of 89 patients was used, and 50 repetitions of the nested 5-fold CV training were performed to obtain average model results (see Supplementary Section S4 'Nested k-fold Cross-Validation'). The evaluation metrics used were the Area Under the Receiver Operating Characteristic (AUROC), sensitivity and specificity, along with Positive Predictive Value (PPV) and Negative Predictive Value (NPV) to better investigate true positive and true negative results, respectively (46).
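For reference, the evaluation metrics listed above can be computed as in the following Python sketch. The function names and the toy labels and scores are illustrative assumptions and do not come from the study data; the AUROC is computed through its rank interpretation, ie, the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUROC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```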

Relevant Feature Analysis and Testing on the Held-out Set
After the discovery phase, the most relevant features were selected according to the best three FS methods. With the goal of assessing the radiomic signature performance in clinical practice, the predictive model was evaluated on a held-out test set (composed of unseen data after the hold-out splitting). In particular, the best three models identified in the discovery phase were retrained on the entire discovery set and tested on the test set. Similar to the discovery phase, 50 repetitions of the hold-out with stratified sampling were considered. See Supplementary Section S5 'Relevant Features and Radiomic Signatures'.

Statistical and Computational Analysis
All the statistical and computational analyses were performed using the MatLab R2019b (64-bit version) environment (The MathWorks, Natick, MA, USA).
For feature robustness analyses, the two-way random-effects model (or mixed-effects), consistency, single rater/measurement, ICC(3,1) was used (47) (for the definition, see Supplementary Section S1 'Calibration and Pre-processing Details'). The intrinsic dependency analysis made use of the Spearman correlation coefficient (p < 0.0001 as a cutoff), while the redundant feature analysis used the Spearman correlation coefficient for pairwise feature comparison (r > 0.90). No multiple-comparison correction was used to keep a reasonable number of features.
For distribution comparisons, the non-parametric Wilcoxon signed-rank test on paired samples was used, with a significance level of 0.05.
For feature selection, the MatLab-coded 'Feature Selection Library v6' (https://arxiv.org/abs/1607.01327) was used (22–24). The SVM models were based on the Statistics and Machine Learning Toolbox provided by MatLab.
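As an illustration of the robustness measure, the following Python sketch computes ICC(3,1) from the mean squares of a two-way ANOVA without replication, using the standard expression (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error). The function name `icc_3_1` and the layout of the input (one row per lesion, one column per repeated measurement of the same feature) are assumptions; which repeated measurements were compared in this study is described in Supplementary Section S1.

```python
def icc_3_1(measurements):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `measurements` is a list of rows (one per lesion); each row holds the same
    feature measured under k conditions.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    col_means = [sum(row[j] for row in measurements) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in measurements for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# A constant offset between conditions does not reduce a consistency-type ICC.
offset_only = icc_3_1([[1, 2], [2, 3], [3, 4], [4, 5]])
# Rank swaps between conditions lower the agreement.
with_swaps = icc_3_1([[1, 2], [2, 1], [3, 4], [4, 3]])
```

Because ICC(3,1) measures consistency rather than absolute agreement, a systematic shift between conditions (first example) still yields a value of 1, whereas inconsistent ordering of the lesions (second example) lowers it.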

RESULTS
The experiments aimed to quantify the capabilities of the predictive models (FS method + classifier) in breast lesion characterization tasks. This section reports details about i) the models discovered and tested, ii) the radiomic signature obtained by each model, and iii) the classification performance.

Study Population
Malignant lesions had a significantly (Mann-Whitney U test) higher volume at baseline compared to benign lesions. According to the SOR, 103/111 masses with enhancement were histologically characterized, resulting in 57/111 benign (51.35%), 2/111 (1.80%) high-risk (with uncertain malignant potential, ie, atypical ductal hyperplasia, ADH), and 52/111 malignant (46.85%) breast lesions. Predicting the upgrade from ADH to malignant lesions is still an open question in DCE-MRI radiomic studies (48). In literature studies with long-term follow-up, atypical hyperplasia has been shown to have a 4× relative risk factor for future breast cancer (49). Therefore, with the goal of obtaining a highly sensitive predictive model able to discriminate the patients into two classes for recall visits and diagnosis, we made a conservative choice for clinical purposes: lesions of uncertain malignant potential (ie, ADH) were included in the class of malignant lesions since they are considered as high-risk lesions. Furthermore, 8 benign breast lesions, with imaging characteristics suggesting the diagnosis of fibroadenoma, were stable in a follow-up of at least two years.
The final diagnosis of these lesions is described in Tables 2 and 3. Molecular subtypes were determined according to the St. Gallen International Expert Consensus 2013 (50).

DCE-MRI Radiomics Predicts Lesion Malignancy
Following the steps in Fig. 2 (more details are provided in Section S1 'Calibration and Pre-processing Details'), the highest number of robust features was obtained using 16 bins in the gray level quantization (Supplementary Table S1). Therefore, out of the initial 107 features, the number of highly robust features was 84. The intrinsic dependence analysis between these radiomic features and MRI acquisition parameters, based on a Spearman correlation analysis, showed that 72 features were independent of the acquisition parameters. Afterwards, the near-zero variance analysis did not exclude any feature, while 19 non-redundant features were found. These steps are summarized in Supplementary Table S2. These remaining 19 features were considered as the set to be fed to the FS methods to rank them and, successively, train/test the SVM. Supplementary Section S3 'Feature Selection Methods' provides the list of all FS methods, while Table S3 reports the final set of 19 features selected after the calibration and pre-processing phase.
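The redundancy step can be sketched in Python as a greedy filter based on the pairwise Spearman correlation with the 0.90 cutoff reported in the Statistical and Computational Analysis section. The function names, the toy feature values, and the rule for choosing which feature of a correlated pair is kept (here, the one encountered first) are assumptions; the criterion actually applied is described in Supplementary Section S1.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, threshold=0.90):
    """Keep a feature only if |rho| <= threshold with every feature already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[other])) <= threshold for other in kept):
            kept.append(name)
    return kept


# Toy values for three hypothetical features measured on six lesions.
toy = {
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [1.1, 2.3, 2.9, 4.2, 5.1, 6.4],  # monotonic with feature_a
    "feature_c": [0.5, 0.1, 0.6, 0.2, 0.4, 0.3],
}
selected = drop_redundant(toy)
```

In this toy example, `feature_b` is discarded because it is perfectly rank-correlated with `feature_a`, while `feature_c` is retained.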
Among the investigated FS methods, the ones obtaining better performance were: UDFS, DGUFS, and UFSOL. Table 4 provides the classification metrics obtained in the discovery phase by the best three predictive models (which used an SVM and the three FS methods previously described).
After the choice of the best FS and fitted model, the relevant features were used to retrain the SVM model and perform hold-out testing. The final signatures for each of the three best FS methods, which were used in the hold-out test, were composed of the relevant radiomic features listed in Fig. 5. Table 5 provides the classification metrics obtained by the best three predictive models in the hold-out testing phase (see Supplementary Material S5 'Relevant Features and Radiomic Signatures').

DISCUSSION
Various breast cancer predictive radiomic models have been built by using different quantitative radiomic features extracted from MRI sequences, showing promising results, with the goal of predicting the lesion malignancy in a non-invasive way (11). Nevertheless, different breast MRI protocols, as well as a wide spectrum of lesion segmentation and feature extraction methods, have been proposed so far. DCE-MRI is the most common MRI technique used to characterize breast lesions relying upon both morphologic and hemodynamic features (51–53). Although circumscribed masses are generally suggestive of benign lesions and non-circumscribed masses are suspicious for carcinoma, margin analysis is highly dependent on the spatial resolution; thus, low spatial resolution scans could affect margin evaluation, in particular for small masses. Furthermore, although benign lesions generally follow persistent curves and many malignant lesions follow "wash-out" curves, a "plateau" curve can be observed with both benign and malignant lesions; therefore, there is often an overlap between the kinetic curves that depict malignant and benign lesions (24).
In this study, we developed a predictive model based on 3D radiomic features extracted from DCE-MRI sequences (using only the strongest enhancement phase), focusing on breast masses with enhancement, aiming at predicting breast lesion malignancy. After careful pre-processing steps (including feature robustness and redundancy analyses), we used the remaining features to build radiomic signatures, investigating three feature-ranking methods (UDFS, DGUFS and UFSOL) and one classifier (SVM). Good performance for both the ranking methods and the classifier used in our work was reported in previous studies (54,55). Three radiomic signatures were defined, with the best performance in terms of AUROC and specificity achieved by the FS method UDFS coupled with the SVM classifier (0.725 ± 0.091 and 0.741 ± 0.114, respectively), while the highest sensitivity and NPV values were obtained by UFSOL+SVM (0.796 ± 0.128 and 0.777 ± 0.125, respectively). All MRI series involved in our study were acquired at the same institution with the same imaging protocol, with the result of reducing the variability in image acquisitions. We analyzed a homogeneous dataset that allowed us to carefully process and assess the extracted radiomic features, thus increasing the result reliability, as well as the model generalization on external patient cohorts. Despite the limited number of clinical cases, an in-depth and careful analysis was proposed to build radiomic models that predict malignant lesions in patients with breast cancer. Our study was conducted in accordance with (56), where the generalization abilities of radiomic models in multi-centric MRI datasets have been recently addressed. It is worth noting that the use of breast lesion classification based on the ACR BI-RADS lexicon also facilitated the sample standardization of the enrolled cases (8).
Considering that we analyzed only the strongest DCE-MRI phase, thus enabling an efficient and less protocol-dependent approach compared to DCE pharmacokinetic modeling, our results are consistent with recent literature studies. In particular, Whitney et al. (57) achieved an average AUROC of 0.846 (95% confidence interval: 0.808–0.875) by using FS from all the features and exploiting pharmacokinetic modeling (58). Nevertheless, the multiparametric MRI-based radiomic model developed by Zhang et al. (59) demonstrated higher diagnostic ability for differentiating benign and malignant breast lesions (AUROC = 0.921), increasing the discriminating power of radiomic features (60), which also included radiomic feature mapping of T2w, T1w, DWI, and ADC map images. These two works (ie, (59) and (60)) analyzed different types of MR sequences, thus introducing further processing and dependence on the MRI acquisition protocols. Unlike these literature works, where multiple sequences were used, our main objective was to develop a radiomics-powered model to be integrated into the clinical practice by exploiting standard-of-care MRI and reducing the required image pre-processing (ie, normalization and quantitative imaging map generation). Therefore, considering standard-of-care imaging only, the dependency on the scanner/protocols is minimized and the full potential of the strongest enhancement phase DCE-MRI (as suggested by Teruel et al. (28) for clinical/pathological response prediction) was drawn by relying upon robust radiomics analyses. By doing so, our model might be suitably deployed at other institutions without requiring specific pre-processing. As a matter of fact, our radiomic model achieved an AUROC of 0.725 ± 0.091, comparable to the results presented in Zhang et al. (59) for the T1-weighted imaging input (AUROC = 0.730) in the differentiation of benign and malignant breast lesions using SVMs.
It is worth noting that a direct comparison between the two contributions is not fair since the analyzed datasets are different; moreover, we used a conservative training/testing strategy aimed at future multicenter studies (ie, a nested cross-validation, as recently shown in (56), and several feature selection methods) to avoid over-optimistic results despite the relatively limited sample size. The combination of radiomic features extracted from different types of sequences could increase the predictive power of the model developed for the problem at hand. Our work primarily aims at clinical feasibility and, for this reason, we analyzed only the subtracted images (obtained from two DCE-MRI time-points), which currently represent the routine examination in breast cancer patients (61,62).
Other recent studies demonstrated that peritumoral tissue inclusion during segmentation led to higher accuracy compared to tumor alone (63). Our analysis focused on extracting radiomic features from a single post-contrast phase. Furthermore, providing the classifier with supplementary information derived from texture features of the first and later post-contrast phases, in a dynamic manner, could potentially improve the performance of lesion characterization, as determined in other studies (64). Analyzing the features selected in the three optimal radiomic signatures built in our study, 3DS features (least axis length, flatness and elongation) were prevalent and occurred in all signatures, joint energy was the only GLCM feature selected by 2 out of 3 radiomic models, whereas Short Run Emphasis (SRE) and Long Run Low Gray Level Emphasis (LRLGLE) were the two GLRLM features selected by UDFS and DGUFS, respectively. Finally, Size-Zone Non-Uniformity, a GLSZM feature, was employed only in the UFSOL-derived signature.
In line with previous research, our results confirm that morphological features, reflecting information about the whole lesion shape, may be used for differential diagnosis in radiomic models (65–67). On the contrary, in the best performing radiomic models built in other studies, such as in (59), shape features accounted for a small proportion of the features in all models, and they were not selected in some models.
Joint Energy, a measure of homogeneous patterns in the image, is a GLCM feature selected by two of our optimal radiomic models. Early work utilized parameters calculated from the GLCM to discriminate between benign and malignant lesions (68,69). The benign or malignant nature of a lesion can be also inferred via its homogeneous or heterogeneous enhancement appearance, and prior studies have shown that such GLCM-derived features can be used to characterize breast lesions with high diagnostic accuracy (69–71).
In our study, among the GLRLM features, LRLGLE and SRE were included in two radiomic signatures. By evaluating the robustness of radiomic features in MRI, Cattell et al. (72) showed that the GLRLM features had moderate robustness (0.5–0.9). Moreover, in the study conducted by Gibbs et al. (68), both these GLRLM features, LRLGLE and SRE, demonstrated significant differences in the differentiation between benign and malignant lesions.
According to a very recent meta-analysis conducted by Zhu et al. (73), the characterization of breast masses on DCE-MRI alone showed high sensitivity and AUROC (0.95 and 0.92, respectively), whereas specificity remained lower (0.71). Radiomic models could be capable of overcoming this issue: two of our radiomic signatures showed slightly better performance in terms of specificity, although further efforts are needed to optimize these results.
Considering also the recent increased evidence of deposition of gadolinium in the brain (74), the development of contrast-free examinations, even of comparable accuracy, appears to be highly attractive. The combination of breast multiparametric MRI radiomics with unenhanced breast MRI protocols is promising, as it may develop into a tool to decrease user-dependence of interpretation. In particular, quantitative DWI showed a higher specificity to differentiate between benign and malignant breast lesions compared to DCE-MRI (75,76).
The limitations of our study are as follows: i) the analyzed dataset was collected from a single center, and an additional external validation of the proposed model is required; ii) the lesion ROIs were manually annotated by a single observer and then checked by an experienced reader, which helped to increase the reliability of the delineation process; iii) this was a patient-driven rather than a lesion-driven study: in patients with multi-focal breast lesions, only the largest lesion was considered. We assumed that enrolling multiple lesions of the same patient as separate lesions, a practice often used to increase the available samples, would have introduced redundant data (belonging to the same person) into the dataset from which the predictive model is built, affecting the capabilities of the model.
As future directions, we are planning to extend our single-center study to a larger patient cohort and also to non-mass enhancement breast lesions. Moreover, a multicentric study would allow us to assess the generalization abilities of the developed predictive model. Lastly, prospective studies are required to explore how these radiomics-powered predictive models could be deployed into the clinical practice for the characterization of breast masses in DCE-MRI. Complementing studies on BI-RADS 4 and 5 characterization (77,78), analyzing BI-RADS 3 lesions by an ad-hoc method would be clinically relevant to avoid unnecessary biopsies and to optimize the current patient management.