An Exploratory Multi-reader, Multi-case Study Comparing Transmission Ultrasound to Mammography on Recall Rates and Detection Rates for Breast Cancer Lesions

Open Access. Published: December 03, 2020. DOI: https://doi.org/10.1016/j.acra.2020.11.011

      Background

      Three-dimensional Quantitative Transmission (QT) ultrasound imaging is an emerging modality for improving the detection and diagnosis of breast cancer. QT ultrasound has high resolution and a high contrast-to-noise ratio, making it effective in evaluating breast tissue. This study compares radiologists’ performance, in terms of noncancer recall rates and lesion detection rates, using QT ultrasound versus full-field digital mammography (FFDM) in a cross section of female subjects.

      Materials and Methods

      In this multi-reader multi-case (MRMC) study, we examined retrospective data from two clinical trials conducted at five sites. All subjects received FFDM and QT scans within 90 days. Data were analyzed in a reader study with full factorial design involving 22 radiologists and 108 breast cases (42 normal, 39 pathology-confirmed benign, and 27 pathology-confirmed cancer cases). The main results used a random-reader random-case analysis adjusted for location bias performed after a primary predefined random-reader fixed-case analysis.

      Results

      The readers’ mean rate of detecting lesions of any type was 4% higher (p-value > 0.05) with QT imaging. The mean non-cancer recall rate improved significantly, showing a decrease of 16% with QT (p-value = 0.03), at the expense of a 2% decrease in the mean cancer recall rate (p-value > 0.05) in comparison to FFDM. Combining performance on cancer and noncancer recall rates, the mean area under the receiver operating characteristic curve of confidence scores improved significantly by 10% with QT (p-value = 0.01).

      Conclusion

      This MRMC study indicates that QT improves non-cancer recall rates without substantially affecting cancer recall rates. The main limitation is the small number of cases from retrospective data. A larger prospective MRMC study is warranted for further assessment.

      INTRODUCTION

      Although two-dimensional (2D) full-field digital mammography (FFDM) is used to screen for breast cancer, the diagnostic accuracy is limited by superimposition effects inherent to projection imaging. Overlying fibroglandular tissue can mask a cancer, decreasing the sensitivity for breast cancer detection particularly for women with dense breasts (
      • Destounis S
      • Johnston L
      • Highnam R
      • et al.
      Using volumetric breast density to quantify the potential masking risk of mammographic density.
      ,
      • Holland K
      • van Gils CH
      • Mann RM
      • et al.
      Quantification of masking risk in screening mammography with volumetric breast density maps.
      ). Tissue superimposition can also cause summation artifacts, resulting in false-positive findings and more recalls, leading to additional studies and/or procedures (
      • Cohen EO
      • Tso HH
      • Phalak KA
      • et al.
      Screening mammography findings from one standard projection only in the era of full-field digital mammography and digital breast tomosynthesis.
      ).
      Digital breast tomosynthesis (DBT), a form of limited-angle tomography, was developed to reduce the issues caused by superimposed breast tissue observed in conventional mammography. The ability of DBT to view the breast in slices reduces breast tissue overlap, thus potentially revealing lesions that would have otherwise been missed. Previous studies have demonstrated an increase in cancer detection rates and reduction in recall rates using DBT as an adjunct to FFDM compared to FFDM alone, for both screening and diagnostic purposes, even in women with dense breasts (
      • Skaane P.
      Breast cancer screening with digital breast tomosynthesis.
      • Lang K
      • Andersson I
      • Rosso A
      • et al.
      Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmo Breast Tomosynthesis Screening Trial, a population-based study.
      ). The limitations of DBT include relatively longer interpretation times, higher costs for patients and for medical facilities needing larger imaging storage capacities, and increased radiation dose compared to FFDM (
      • Bernardi D
      • Macaskill P
      • Pellegrini M
      • et al.
      Breast cancer screening with tomosynthesis (3D mammography) with acquired or synthetic 2D mammography compared with 2D mammography alone (STORM-2): a population-based prospective study.
      ,
      • Skaane P
      • Bandos AI
      • Gullien R
      • et al.
      Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program.
      ,
      • Lang K
      • Andersson I
      • Rosso A
      • et al.
      Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmo Breast Tomosynthesis Screening Trial, a population-based study.
      ,
      • Destounis SV
      • Morgan R
      • Arieno A
      Screening for dense breasts: digital breast tomosynthesis.
      ). Many of these issues have been mitigated over the years, so that DBT is now common practice; however, recent studies have found that DBT does not improve the sensitivity for detecting cancers in women with dense breasts (
      • Lowry KP
      • Coley RY
      • Miglioretti DL
      • et al.
      Screening performance of digital breast tomosynthesis vs digital mammography in community practice by patient age, screening round, and breast density.
      ).
      To offset the limitations of mammography, other modalities such as Magnetic Resonance Imaging (MRI) and Hand-Held Ultrasound (HHUS) have been used as secondary or adjunctive screening modalities. MRI is recommended for screening women with a 20% or greater lifetime risk of breast cancer due to known risk factors, such as a family history of breast cancer, certain inherited genetic alterations, or high breast density (
      • Monticciolo DL
      • Newell MS
      • Moy L
      Breast Cancer screening in women at higher-than-average risk: recommendations from the ACR.
      ). Studies have shown MRI to be a very sensitive method for detecting invasive breast cancer and comparable to FFDM for detecting ductal carcinoma in situ, but MRI has also demonstrated lower specificity with relatively higher false positive and biopsy rates than screening mammography (
      • Heywang-Kobrunner SH
      • Hacker A
      • Sedlacek S
      Magnetic resonance imaging: the evolution of breast imaging.
      ). In addition, MRI requires administration of intravenous contrast agents, which can be associated with the deposition of heavy metal elements in the brain and elsewhere in the body. MRI is also expensive, is not suitable for patients with implanted devices or claustrophobia, and is not available to all women, particularly those in low-income areas. Thus, it is not considered appropriate for screening a general population (
      • Heywang-Kobrunner SH
      • Hacker A
      • Sedlacek S
      Magnetic resonance imaging: the evolution of breast imaging.
      ,
      • Chetlen A
      • Mack J
      • Chan T
      Breast cancer screening controversies: who, when, why, and how?.
      ).
      In addition to MRI, HHUS and Automated Breast Ultrasound (ABUS) systems have been used to improve screening for breast cancer. HHUS in breast imaging has been shown to improve cancer detection rates, especially in women with dense breast tissue (
      • Hooley RJ
      • Greenberg KL
      • Stackhouse RM
      Screening US in patients with mammographically dense breasts: initial experience with Connecticut Public Act 09-41.
      ,
      • Corsetti V
      • Ferrari A
      • Ghirardi M
      • et al.
      Role of ultrasonography in detecting mammographically occult breast carcinoma in women with dense breasts.
      ,
      • Corsetti V
      • Houssami N
      • Ferrari A
      • et al.
      Breast screening with ultrasound in women with mammography-negative dense breasts: evidence on incremental cancer detection and false positives, and associated cost.
      ,
      • Corsetti V
      • Houssami N
      • Ghirardi M
      • et al.
      Evidence of the effect of adjunct ultrasound screening in women with mammography-negative dense breasts: interval breast cancers at 1 year follow-up.
      ). Studies have shown that additional cancers are detected in adjunctive screening settings; however, the improved cancer detection has come at the expense of an increase in false positives (
      • Berg WA
      • Blume JD
      • Cormack JB
      • et al.
      Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer.
      ). Moreover, HHUS is highly operator dependent, resulting in a lack of standardization and of integration of HHUS into the breast screening paradigm (
      • Berg WA
      • Blume JD
      • Cormack JB
      • et al.
      Training the ACRIN 6666 Investigators and effects of feedback on breast ultrasound interpretive performance and agreement in BI-RADS ultrasound feature analysis.
      ). Automated ultrasound systems like ABUS can be difficult to integrate into practices because the acquired views are often reviewed after the patient has left the clinic, so patients often need to be recalled to evaluate the findings.
      While FFDM, HHUS, and MRI constitute a majority of the imaging performed in breast screening and diagnostic workup, other imaging modalities have also made their way into research and limited clinical use. Dedicated breast CT has been studied over the past few years (
      • Wienbeck S
      • Lotz J
      • Fischer U
      Review of clinical studies and first clinical experiences with a commercially available cone-beam breast CT in Europe.
      ). Additionally, other CT-based multimodality systems (SPECT-CT and PET-CT) have been developed; however, the use of these systems in the clinic remains limited, given the potential radiation safety issues for screening in younger women (
      • Shah JP
      • Mann SD
      • McKinley RL
      • et al.
      Implementation and CT sampling characterization of a third-generation SPECT-CT system for dedicated breast imaging.
      ,
      • Bowen SL
      • Wu Y
      • Chaudhari AJ
      • et al.
      Initial characterization of a dedicated breast PET/CT scanner during human imaging.
      ).
      As a result, there is a clinical need for a breast imaging modality that has high sensitivity and specificity, is safe and affordable, and can be used in young, high-risk women who currently do not receive mammography due to radiation effects. To address this need, an automated, fully tomographic ultrasound modality, quantitative transmission (QT) ultrasound, has been developed. QT is a novel imaging modality shown to have high resolution and a high contrast-to-noise ratio, providing anatomic details as well as information about tissue characteristics from speed-of-sound data, thus improving diagnostic accuracy (

      Klock J, Iuanow E, Smith K, et al. Visual grading assessment of quantitative transmission ultrasound compared to digital X-ray mammography and hand-held ultrasound in identifying ten breast anatomical structures 2017.

      ,

      QT Ultrasound website Available from: www.qtultrasound.com.

      ). A photograph of the QT scanner is shown in Figure 1 and representative QT images and comparison with other modalities are shown in Figure 2. Additionally, QT can use imaging biomarkers to differentiate between benign and malignant masses (
      • Natesan R LS
      • Navarro D
      • Anaje C
      • et al.
      Radiomics in Transmission Ultrasound Improve Differentiation between Benign and Malignant Breast Masses.
      ), which helps improve accuracy and performance. In contrast to HHUS, which is based on reflection B-mode data only and assumes a constant speed of sound throughout the tissue, QT imaging uses ultrasound waves transmitted through the breast alongside reflected ultrasound, improving the accuracy of breast imaging. In addition, the scan is automated, resulting in consistent and reproducible imaging. The projection data acquired by the scanner are used in image reconstruction algorithms that are based on inverse scattering and fully account for the three-dimensional (3D) nature of acoustic wave propagation. Image reconstruction generates coregistered 3D image volumes of reflection and speed-of-sound maps that together can be used to quantitatively identify breast tissue types (
      • Klock JC
      • Iuanow E
      • Malik B
      • et al.
      Anatomy-correlated breast imaging and visual grading analysis using quantitative transmission ultrasound.
      ,
      • Malik B
      • Klock J
      • Wiskin J
      • et al.
      Objective breast tissue image classification using Quantitative Transmission ultrasound tomography.
      ,
      • Malik B
      • Terry R
      • Wiskin J
      • et al.
      Quantitative transmission ultrasound tomography: Imaging and performance characteristics.
      ). While there has been considerable work performed in describing the theory behind imaging reconstruction for transmission ultrasound, and characterizing QT imaging performance, there have been no reader studies reported that compare radiologists’ performance of noncancer recall rates and lesion detection rates for transmission ultrasound and FFDM. In this study, we compared the performance of QT and FFDM for breast cancer detection in a select population of women enrolled for mammography and QT ultrasound.
      Figure 2. (A) Representative speed-of-sound image (right) and the corresponding whole breast H&E section (left) showing correlation of glandular and ductal histology with the speed-of-sound image; (B) Representative image of a mature apocrine cyst showing the cyst membrane and the internal cyst contents; (C) Representative images of fibroadenomas showing internal cellular detail; (D) QT image of an invasive ductal carcinoma showing internal detail; (E) MRI image of the same mass as in D.

      MATERIALS AND METHODS

      Study Subjects

      In this study, we directly compared detection rates for all lesions and recall rates for transmission ultrasound with those for FFDM. We retrospectively examined data collected from two HIPAA-compliant, institutional review board-approved case collection clinical trials conducted between 2006 and 2018 at the following five sites: UC San Diego Health, Mayo Clinic, Long Beach Medical Center, Marin Breast Health Trial Center, and Universitatsklinikum Freiburg University Hospital. Within a 90-day period, participating adult female subjects received both a standard FFDM and a transmission ultrasound scan (QT Ultrasound Breast Scanner, QT Ultrasound, Novato, California). In the case collection study, cases were collected when a mammographic abnormality was seen on at least one view of the mammogram and subsequent QT imaging was performed. A team of three board-certified breast radiologists adjudicated all mammograms and QT images for image quality and colocation of breast masses. Normal mammograms were also enrolled with their corresponding QT ultrasound scans. The transmission ultrasound scan was performed one breast at a time, with the breast positioned in the water bath of the scanner's scan tank. A curvilinear array acquired data over a 360-degree rotation around the breast and moved up 2 mm after each rotation, until the entire breast was imaged from the nipple to the pectoralis muscle of the chest wall. The images were reconstructed and viewed in proprietary software, QTviewer®.

      Image Interpretation and Analyses

      The FFDM and QT breast imaging data were analyzed in an exploratory multi-reader multi-case (MRMC) study with full factorial design involving 22 radiologists and 108 breast cases, including 42 normal, 39 pathology-confirmed benign (18 cyst and 21 benign solid space-occupying lesions), and 27 pathology-confirmed cancer cases. The cases were collected when a mammographic abnormality was seen on mammography and subsequent QT imaging was performed in the diagnostic setting; however, since mammography misses a substantial number of QT-identified masses, cases with negative mammograms were also added to control for bias against QT resulting from “only positive mammography” entry criteria. Breast cases were excluded if the image sets were not properly processed due to image reconstruction algorithm failure, were incomplete, or showed a clip or marker. Cases were also excluded if the one-year follow-up mammograms (normal cases) or pathology reports (benign and cancer cases) were not available to establish the ground truth; if the ground-truth lesions were outside the QT field of view; or if the location of the biopsy results was not concordant with both QT and FFDM. A patient flow chart is provided in Figure 3.
      Figure 3. Patient flow chart depicting the number of patients included and excluded and the distribution between normal cases and cases with benign and cancer findings.
      The 22 participating readers were all board-certified diagnostic radiologists from academic and nonacademic institutions with a variety of experience, ranging from breast-fellowship trained imagers to general radiologists. Out of 22 readers, 12 readers were breast-fellowship trained radiologists; 10 readers had 1–10 years’ experience, 4 readers had 11–20 years’ experience, and 8 readers had greater than 20 years’ experience. Sixteen of the 22 readers worked in private practice, four in academia, and two in community practices. Prior to the study, all readers successfully completed a standard QT and FFDM reader training program, which included instruction on the operation and functionality of the standardized QT and FFDM workstations used to review all QT and FFDM images, respectively.
      The readers were blinded to all diagnostic reports, prior imaging, and contralateral breast imaging. To avoid memory effects, they interpreted all of the QT breast cases in a randomized order on a single day, and similarly interpreted all of the FFDM cases in a randomized order the following day. For each case and imaging modality, the readers were asked to mark up to three findings. Instances of marking multiple findings were relatively rare, and only seven readers commented on multiple findings: five readers marked two findings and two readers marked three findings. An adjudicator assessed whether any of these potential findings correctly located a ground-truth lesion in order to determine the detection rate for lesions. In addition, the readers were asked to recommend recall or no-recall, as well as to provide a confidence score between 0 and 100 on their decision. For this study, a no-recall was recommended for QT for normal cases and benign cysts, whereas the presence of any solid space-occupying lesion required a recall (
      • Iuanow E
      • Smith K
      • Obuchowski NA
      • et al.
      Accuracy of cyst versus solid diagnosis in the breast using Quantitative Transmission (QT) ultrasound.
      ). For FFDM, a no-recall was recommended for normal cases only; the presence of any space-occupying lesion required a recall in agreement with standard of care (

      Sickles E, D'Orsi C, Bassett L, et al. ACR BI-RADS® Mammography. 2013. In: ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System [Internet]. Reston, VA: American College of Radiology.

      ). These recall criteria were guided by the desire to compare the performance of FFDM and QT without having priors to compare.
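The recall criteria described above can be written down as a small lookup. This is only an illustrative sketch: the category labels (`normal`, `cyst`, `benign_solid`, `cancer`) and the function name `reference_recall` are hypothetical and do not come from the study's software.

```python
# Reference recall decision per ground-truth category, following the
# criteria stated in the text. Labels and names are hypothetical.
NO_RECALL_CATEGORIES = {
    "QT": {"normal", "cyst"},   # QT: normal cases and benign cysts need no recall
    "FFDM": {"normal"},         # FFDM: only normal cases need no recall
}


def reference_recall(modality: str, category: str) -> bool:
    """Return True if the stated recall criteria call for a recall."""
    return category not in NO_RECALL_CATEGORIES[modality]


for category in ("normal", "cyst", "benign_solid", "cancer"):
    print(category, reference_recall("QT", category), reference_recall("FFDM", category))
```

The only category on which the two rules differ is the benign cyst, which is recalled under the FFDM rule but not under the QT rule.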

      Statistical Analyses

      The data were analyzed for the entire cohort of 108 breast cases (42 normal, 39 pathology-confirmed benign, and 27 pathology-confirmed cancer cases) using two general approaches: a random-reader fixed-case (RRFC) analysis and a random-reader random-case (RRRC) analysis. The RRFC analysis generalizes to the population of readers but is specific to the particular case set. In comparison, the RRRC analysis generalizes to both the population of cases and the population of readers. The RRRC analysis was expected to provide results more generalizable to new readers reading new cases, but with wider confidence intervals than the RRFC analysis.
      For both approaches, performance comparisons between QT and FFDM were summarized in terms of mean differences between readers and 95% confidence intervals (CI) for these differences, with p-values determining the degree of statistical significance. The performance metrics included noncancer and cancer recall rates and detection rates for all lesions. In addition, we analyzed the mean area under the receiver operating characteristic curve (ROC-AUC) based on the readers’ confidence scores as a statistically efficient approach to evaluating the cancer and noncancer performance metrics combined into a single measurement.
      These analyses were performed according to the method of Obuchowski & Rockette with Hillis adjustment to the degrees of freedom (
      • Obuchowski NA
      • Rockette HE.
      Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations.
      ,
      • Hillis SL
      • Obuchowski NA
      • Schartz KM
      • et al.
      A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data.
      ). The RRRC analysis of ROC-AUC was performed with the software package OR-DBM MRMC 2.5, written by Stephen L. Hillis, Kevin M. Schartz, and Kevin S. Berbaum (

      Hillis SL, Schartz KM, Berbaum KS. OR-DBM MRMC 2.5 User Guide. University of Iowa; 2014;4–14.

      ). The trapezoidal/Wilcoxon method for curve fitting and jackknifing for the covariance estimation were used in the analysis. All other statistical analyses were performed in the statistical computing environment R version 3.4.0 or higher (
      • R Core Team
      R: A language and environment for statistical computing.
      ). No statistical adjustments were made for multiple analyses.
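As an illustration of the trapezoidal/Wilcoxon estimate mentioned above, the empirical ROC-AUC for one reader and one modality equals the probability that a randomly chosen cancer case receives a higher score than a randomly chosen noncancer case, with ties counted as one half. The sketch below is not the OR-DBM MRMC implementation; the function name and the scores are hypothetical, and it assumes the recall decision and confidence have already been mapped onto a single scale where higher means more suspicious.

```python
def empirical_auc(noncancer_scores, cancer_scores):
    """Trapezoidal (Wilcoxon/Mann-Whitney) AUC:
    P(cancer score > noncancer score) + 0.5 * P(tie)."""
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Hypothetical scores (0-100) for one reader on one modality.
noncancer = [10, 20, 35, 50, 60]
cancer = [40, 50, 80, 90]
print(round(empirical_auc(noncancer, cancer), 3))  # 0.825
```

In an MRMC analysis, one such value per reader and modality would then feed into the reader-averaged comparison.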
      The ground truth was established by one-year follow-up mammogram results for the normal cases and pathology results for the benign and cancer cases. All RRFC and RRRC results were adjusted post-hoc for location bias, considering recalls as correct only when the decisions were based on the correct ground-truth lesions. This adjustment is indicated because the severity of location bias is dissimilar for the two imaging modalities (
      • McGowan LDA
      • Bullen JA
      • Obuchowski NA
      Location bias in ROC studies.
      ). Therefore, we adjusted for location bias to avoid favoring the modality with higher false-positive rates.
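A minimal sketch of the idea behind the location adjustment, restricted to cancer cases for one reader: a recall counts as correct only if the reader's mark also located the ground-truth lesion. The data structure and function name are hypothetical, and the study's actual adjustment procedure may differ in detail.

```python
def cancer_recall_rates(readings):
    """readings: list of (recalled, located_true_lesion) booleans,
    one tuple per cancer case read by a single reader.

    Returns (unadjusted_rate, location_adjusted_rate).
    """
    n = len(readings)
    unadjusted = sum(1 for recalled, _ in readings if recalled) / n
    adjusted = sum(1 for recalled, located in readings if recalled and located) / n
    return unadjusted, adjusted


# Hypothetical reader: 5 cancer cases, 4 recalled, one recall based on the wrong location.
readings = [(True, True), (True, True), (True, False), (True, True), (False, False)]
print(cancer_recall_rates(readings))  # (0.8, 0.6)
```

Without the adjustment, a recall prompted by a false-positive finding elsewhere in the breast would be credited as a correct cancer recall, which favors the modality that produces more false positives.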

      RESULTS

      In the reader study comparing QT imaging and FFDM, overall improvement was noted with QT imaging in the noncancer recall rate, ROC-AUC, and rate of detecting lesions of any type. QT imaging showed a slight but statistically nonsignificant decrease in cancer recall rate in comparison to FFDM, as discussed below.
      After appropriately adjusting for location bias, the 22 readers’ mean non-cancer recall rate was 16% lower with QT than FFDM with a 95% confidence interval of (−0.20, −0.12), p-value <0.01 in the RRFC analysis and (−0.24, −0.08), p-value = 0.03 in the RRRC analysis for all 108 breast cases, as shown in Table 1 and Figure 4. The readers’ mean cancer recall rate was 2% lower with QT than FFDM with a 95% confidence interval of (−0.07, 0.04), p-value >0.05 in the RRFC analysis and (−0.16, 0.13), p-value >0.05 in the RRRC analysis, as shown in Table 1.
      Table 1. Comparison of Recall Rates With QT and FFDM for Different Subgroups of Pathologies
      | Subgroup | Count | QT recall rate | FFDM recall rate | Difference in recall rates with [95% CI] | p-value | Relative Difference: QT-FFDM |
      |---|---|---|---|---|---|---|
      | Non-cancer | 81 | 0.31 | 0.47 | -0.16 [-0.20, -0.12] | <0.01* | -34.0% |
      | Benign mass | 21 | 0.62 | 0.68 | -0.06 [-0.10, -0.02] | <0.01* | -8.8% |
      | Cyst | 18 | 0.21 | 0.66 | -0.44 [-0.51, -0.38] | <0.01* | -68.2% |
      | Benign lesions | 39 | 0.43 | 0.67 | -0.24 [-0.28, -0.20] | <0.01* | -35.8% |
      | Normal | 42 | 0.19 | 0.27 | -0.08 [-0.15, -0.01] | 0.02* | -29.6% |
      | Cancer | 27 | 0.6 | 0.61 | -0.02 [-0.07, 0.04] | >0.05 | -1.6% |
      The count represents the sample size for that given subgroup. QT: transmission ultrasound; FFDM: full-field digital mammography. The relative difference is calculated as percentage relative change between the QT and FFDM recall rates. Negative value of the relative difference indicates reduction in recall rates with QT. * denotes statistically significant difference. Non-cancer includes benign mass, benign lesions and normal cases. Benign lesions group includes benign mass and cyst.
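The relative difference defined in the table note can be checked with a few lines of arithmetic. The sketch below uses the rounded recall rates reported in Table 1; the function name is hypothetical.

```python
def relative_difference_pct(qt_rate, ffdm_rate):
    """Percentage relative change of the QT rate with respect to the FFDM rate."""
    return 100.0 * (qt_rate - ffdm_rate) / ffdm_rate


# (QT recall rate, FFDM recall rate) as reported in Table 1.
table1 = {
    "Non-cancer": (0.31, 0.47),
    "Benign mass": (0.62, 0.68),
    "Cyst": (0.21, 0.66),
    "Benign lesions": (0.43, 0.67),
    "Normal": (0.19, 0.27),
    "Cancer": (0.60, 0.61),
}

for name, (qt, ffdm) in table1.items():
    print(f"{name}: {relative_difference_pct(qt, ffdm):.1f}%")
```

For example, the non-cancer row gives 100 × (0.31 − 0.47) / 0.47 ≈ −34.0%, which is why the 16-point absolute decrease corresponds to a relative reduction of about one third.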
      Figure 4. Overall performance of the radiologists as measured in terms of recall rates for QT and FFDM. The horizontal axis marks the difference between the recall rates of QT and FFDM. A lower than zero value indicates QT shows improvement over FFDM; a greater than zero value indicates FFDM shows improvement over QT. Square (▪) markers correspond to RRRC results; circle (•) markers correspond to RRFC results.
      The overall accuracy of these recall decisions was also analyzed using the combined metric of ROC-AUC based on the readers’ confidence scores. The readers’ mean ROC-AUC was 10% higher with QT than FFDM with a 95% confidence interval of (0.07, 0.13), p-value <0.01 in the RRFC analysis and (0.02, 0.18), p-value = 0.01 in the RRRC analysis for all 108 breast cases, as shown in Table 2 and Figure 5.
      Table 2. Comparison of AUC of QT and FFDM for Different Subgroups of Pathologies
      | Subgroup comparison | Count | QT AUC | FFDM AUC | Difference in AUC values with [95% CI] | p-value | Relative Difference: QT-FFDM |
      |---|---|---|---|---|---|---|
      | All vs Cancer | 108 | 0.69 | 0.59 | 0.10 [0.07, 0.13] | <0.01* | 16.9% |
      | Benign Mass vs Cancer | 48 | 0.5 | 0.45 | 0.04 [0.02, 0.07] | <0.01* | 11.1% |
      | Cyst vs Cancer | 45 | 0.74 | 0.48 | 0.26 [0.22, 0.31] | <0.01* | 54.2% |
      | Normal vs Cancer | 69 | 0.76 | 0.7 | 0.06 [0.02, 0.10] | <0.01* | 8.6% |
      The count represents the sample size for that given subgroup. QT: transmission ultrasound; FFDM: full-field digital mammography; AUC: area under the curve. The relative difference is calculated as percentage relative change between the QT and FFDM AUC values. Positive value of the relative difference indicates increase in the AUC with QT. * denotes statistically significant difference. Non-cancer includes benign mass, benign lesions and normal cases. Benign lesions group includes benign mass and cyst.
      Figure 5. Overall performance of the radiologists as measured in terms of ROC-AUC for QT and FFDM. The horizontal axis marks the difference between the AUC of QT and FFDM. A greater than zero value indicates QT shows improvement over FFDM; a lower than zero value indicates FFDM shows improvement over QT. Square (▪) markers correspond to RRRC results; circle (•) markers correspond to RRFC results.
      Finally, the readers’ mean rate of detecting lesions of any type was 4% higher with QT than FFDM with a 95% confidence interval of (0.01, 0.08), p-value = 0.02 in the RRFC analysis and (−0.06, 0.15), p-value > 0.05 in the RRRC analysis for the entire cohort of 108 breast cases, as shown in Table 3 and Figure 6.
      Table 3. Comparison of Detection Rates for Any Type of Lesions With QT and FFDM for Different Subgroups of Pathologies
      | Subgroup | Count | QT detection rate | FFDM detection rate | Difference in detection rates with [95% CI] | p-value | Relative Difference: QT-FFDM |
      |---|---|---|---|---|---|---|
      | All lesions | 69 | 0.71 | 0.66 | 0.04 [0.01, 0.06] | <0.01* | 7.6% |
      | Benign mass | 21 | 0.74 | 0.71 | 0.03 [-0.00, 0.07] | >0.05 | 4.2% |
      | Cyst | 20 | 0.77 | 0.66 | 0.10 [0.07, 0.14] | <0.01* | 16.7% |
      | Benign lesions | 41 | 0.75 | 0.69 | 0.07 [0.04, 0.09] | <0.01* | 8.7% |
      | Cancer | 28 | 0.64 | 0.63 | 0.01 [-0.04, 0.06] | >0.05 | 1.6% |
      The count represents the sample size for that given subgroup. QT: transmission ultrasound; FFDM: full-field digital mammography. The relative difference is calculated as percentage relative change between the QT and FFDM detection rate values. Positive value of the relative difference indicates increase in the detection rate with QT. * denotes statistically significant difference. Non-cancer includes benign mass, benign lesions and normal cases. Benign lesions group includes benign mass and cyst.
      Figure 6. Overall performance of the radiologists as measured in terms of detection rates for any type of lesions with QT and FFDM. The horizontal axis marks the difference between the detection rates with QT and FFDM. A greater than zero value indicates QT shows improvement over FFDM; a lower than zero value indicates FFDM shows improvement over QT. Square (▪) markers correspond to RRRC results; circle (•) markers correspond to RRFC results.

      DISCUSSION

      Because the scope and cost of conventional prospective screening studies in breast cancer are substantial, and because the incidence of breast cancer in young women is extremely low, making population-based screening impossible, we have discussed with the FDA how to design smaller, creatively designed trials that could overcome the obstacles imposed by large-scale population screening. In support of this approach, owing to its unique safety profile, high image quality, and ability to image dense breasts, the technology is being pursued under the FDA's Breakthrough Device Designation of the 21st Century Cures Act as potentially useful for screening younger women who do not qualify for x-ray mammography. This work is the first study assessing the performance of transmission ultrasound for primary breast cancer screening in comparison with mammography. The goal of this research is to gather evidence of its effectiveness for this novel indication for use and to assess the detection rates for benign and cancerous lesions and the corresponding recall rates.
      In the current study, the majority of the metrics used to quantify readers’ performance showed overall improvement with QT imaging compared to FFDM. The non-cancer recall rate decreased by 16%, the ROC-AUC improved by 10%, and the rate of detecting lesions of any type improved by 4%. QT imaging showed a slight but statistically nonsignificant decrease of 2% in the cancer recall rate compared to FFDM. These results suggest that QT imaging improves the accuracy of recall decisions and lesion detection compared to FFDM, albeit in a small subset of women imaged.
      For the detection of lesions overall, QT was slightly (+4%) but not significantly better (95% CI: [−0.06, 0.15]) than FFDM. A clear improvement was seen for the mean noncancer recall rate: −16% with QT (95% CI: [−0.24, −0.08]). As the RRRC 95% confidence interval shows, the improvement persists when accounting for case and reader variability. On the other hand, the difference in mean cancer recall rate was −2% (95% CI: [−0.16, 0.13]), which is not statistically significant and is considerably smaller than the variability from cases and readers, as seen in the range between the upper and lower limits of the 95% CI. This raises the question of what is seen when cancer and noncancer rates are meaningfully combined, which we investigated using the mean ROC-AUC. In our study, the ROC-AUC of QT was 10% higher, with confidence intervals having a positive lower limit for both analyses, indicating at the 95% confidence level that QT had higher recall accuracy than FFDM even when accounting for case variability. Similar trends for the ROC-AUC were seen for both dense and nondense subpopulations. These reported comparisons of QT imaging and FFDM are first-in-class as well as first-in-human results demonstrating the capabilities of transmission ultrasound in breast screening, with no direct and relevant published literature with which to compare.
      In an outside meta-analysis of 20 studies, ROC-AUC was shown to be an efficient way to simultaneously capture the performance of a device on cancer and non-cancer cases in MRMC studies across laboratories and a better method for predicting performance in clinical studies than cancer and noncancer recall rates (
      • Samuelson FW
      • Abbey CK.
      The reproducibility of changes in diagnostic figures of merit across laboratory and clinical imaging reader studies.
      ). In line with this, it is the statistical accuracy metric preferred by the U.S. Food and Drug Administration for comparisons (
      • Abdolell M
      • Tsuruda KM
      • Brown P
      • et al.
      Breast density scales: the metric matters.
      ). Taken together, these results indicate that QT improves non-cancer recall rates without substantially affecting cancer recall rates.
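      To illustrate how ROC-AUC combines performance on cancer and noncancer cases into one number, the sketch below computes the empirical (Mann-Whitney) AUC from confidence scores. The scores are hypothetical and the function name `empirical_auc` is an assumption; the study itself used dedicated MRMC software with a location bias adjustment, which this sketch does not reproduce.

```python
def empirical_auc(cancer_scores, noncancer_scores):
    """Empirical ROC-AUC: the fraction of (cancer, noncancer) pairs in which
    the cancer case received the higher confidence score; ties count as 0.5."""
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Hypothetical confidence-of-cancer scores (0-100) from a single reader.
cancer = [90, 70, 60, 40]
noncancer = [50, 30, 20, 10, 70]

auc = empirical_auc(cancer, noncancer)
print(f"empirical AUC = {auc:.3f}")
```

      A lower noncancer recall rate and a higher cancer recall rate both move the AUC upward, which is why a single AUC comparison can summarize the trade-off between the two rates.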
      The limitations of FFDM in detecting breast masses are mainly due to breast structure: because the image is inherently a 2D view, findings are difficult to discern owing to superposition of fibroglandular tissue and summation artifacts (
      • Sickles EA.
      Findings at mammographic screening on only one standard projection: outcomes analysis.
      ,
      • Ng KH
      • Lau S.
      Vision 20/20: Mammographic breast density and its clinical applications.
      ). In comparison, the breast structure and the volumetric nature of the fibroglandular tissue are fully resolved in QT imaging, and the interpretation is similar in nature to that of other volumetric imaging modalities such as breast MRI and dedicated breast CT (
      • Ng KH
      • Lau S.
      Vision 20/20: Mammographic breast density and its clinical applications.
      ,
      • Boone JM
      • Nelson TR
      • Lindfors KK
      • et al.
      Dedicated breast CT: radiation dose and image quality evaluation.
      ), which is an important advantage. In addition, both the QT image acquisition and the image reconstruction are inherently 3D, so instead of the generation of concatenated 2D slices in FFDM, QT image synthesis directly results in image volumes and is able to account and compensate for full 3D wave physics, resulting in high-contrast and highly reproducible images (
      • Malik B
      • Terry R
      • Wiskin J
      • et al.
      Quantitative transmission ultrasound tomography: Imaging and performance characteristics.
      ,
      • Wiskin JW
      • Borup DT
      • Iuanow E
      • et al.
      3-D nonlinear acoustic inverse scattering: algorithm and quantitative results.
      ). Image acquisition for QT ultrasound takes 4 to 10 minutes depending on the size of the breast, and these times can be halved with software modifications. While this acquisition time is longer than that for mammography, it is considerably less than the time required for breast MRI, which can take up to an hour for both breasts. QT image interpretation takes less than 2 minutes for an experienced reader, with increases in reading time for complicated cases comparable to those of other volumetric breast imaging modalities. This is comparable to the time required to interpret mammograms and is significantly less than the average interpretation time of 6 minutes for breast MRI (
      • Ko ES
      • Morris EA.
      Abbreviated magnetic resonance imaging for breast cancer screening: concept, early results, and considerations.
      ).
      Another advantage of QT over conventional ultrasound is that imaging is performed with minimal operator dependence, which allows for standardization of the procedure. In clinical practice, HHUS is performed by a technologist, sometimes requiring the radiologist to interrupt their work to re-scan the patient to confirm the technologist's findings. QT imaging and ABUS both have automated image acquisition, and hence image review and interpretation can be performed by the radiologist on a complete image set. QT differs from ABUS in that the images are inherently volumetric and cross-sectional, eliminating the need for re-scanning once proper patient positioning is achieved during scanning. This is critically important in that QT would eliminate the need for patients to return to the clinic for re-evaluation of findings seen on ABUS image sets when the scan is read after the patient has left the clinic. We will study the average time required to interpret the QT images in future studies.
      HHUS imaging has proven important in differentiating solid masses from cysts in the breast (
      • Sickles EA
      • Filly RA
      • Callen PW
      Benign breast lesions: ultrasound detection and diagnosis.
      ). Transmission ultrasound has also been shown to aid in accurately identifying cysts vs solid masses and characterizing cystic components based on speed of sound (
      • Iuanow E
      • Smith K
      • Obuchowski NA
      • et al.
      Accuracy of cyst versus solid diagnosis in the breast using Quantitative Transmission (QT) ultrasound.
      ,
      • Malik BH
      • Klock JC.
      Breast cyst fluid analysis correlations with speed of sound using transmission ultrasound.
      ). In our results, QT imaging also showed a higher cyst detection rate (17% relative improvement over mammography) and reduced recall rates for cysts (68% relative improvement over mammography) in comparison to other subgroups, indicating improved radiologist performance when identifying cysts. In addition, the ROC-AUC relative improvement of 54% was the highest among all subgroup comparisons, further highlighting the performance improvement in identifying cysts with QT imaging and the potential to eliminate the need for recalls in this subgroup of subjects.
      The clinical implications of the current research are that QT is a candidate for addition to the current breast imaging paradigm, with particular application to younger women who do not qualify for X-ray mammography screening but who need to be screened for medical reasons, women with dense breasts, women with breast implants, and women who refuse X-ray mammograms because of the radiation risk, for example, women who have had chest wall radiation or for whom X-ray exposure is contraindicated.
      Limitations of our study were the small sample size, the retrospective design, and the case selection criteria. In particular, breast cases were excluded if either of the imaging datasets was improperly processed due to algorithm failures that resulted in nondiagnostic images, which may have introduced a bias. Since these cases could not be evaluated using both modalities, it is difficult to determine whether this potential bias would favor QT or FFDM. Cases were also excluded if the FFDM findings were outside the QT imaging field of view. Visualization of the breast near the chest wall and axilla was at times limited with the QT scanner used for this study; refinements in the scanner design to increase this field of view are currently underway. Breast tomosynthesis exams were not included in this comparison since many of the cases were collected at a time when tomosynthesis was not widespread. Future studies will include the comparison of QT Ultrasound with 2D mammography and 3D tomosynthesis in a prospective study design.
      Another limitation is that, when the study was planned, the available information on reader and case variability did not allow a sample size calculation for the MRMC RRRC analyses. The fixed-case analysis was therefore used as the predefined primary analysis, and the need for and appropriateness of the location bias correction were judged only after the strength of the bias was seen in the data.
      Studies comparing radiologist performance in a clinical setting with that in a reader study show that performance is significantly better and more consistent in the clinical setting than in a reader study environment (
      • Gur D
      • Bandos AI
      • Cohen CS
      • et al.
      The “Laboratory” Effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations.
      ). These observations can have adverse implications for results assessed on the basis of a reader study. However, such an impact can be reduced by using multiple readers. In the presented study, 22 readers with a broad range of radiology expertise interpreted the same sets of FFDM and QT images, which were randomized in varying order within each modality, potentially minimizing the bias associated with reading FFDM images in a reader study and more effectively simulating the clinical setting.
      In summary, this preliminary study showed that QT imaging demonstrates improved accuracy of recalls and lesion detection compared to FFDM. These results show that additional larger prospective studies are needed, but that this novel ultrasound modality appears to be a valuable asset in the current breast imaging paradigm and deserves further study. There may also be a specific niche for QT in young, high-risk patients with dense breasts, which is currently being evaluated with the FDA under a breakthrough designation, or in patients for whom access to care may be an issue.

      COI STATEMENTS

      JK is an employee of QT Ultrasound LLC. BM and EI are consultants for QT Ultrasound LLC.

      TRIAL OVERSIGHT STATEMENT

      The sponsor, QT Ultrasound LLC (Novato, California), designed the trial in consultation with Cytel (Cambridge, Massachusetts). All authors participated in the writing of the manuscript and approved the draft that was submitted for publication. The first draft of the manuscript was written by the first and last authors. The trial was conducted in accordance with the provisions of the International Conference on Harmonization Guidelines for Good Clinical Practice and the Declaration of Helsinki.

      References

        • Destounis S
        • Johnston L
        • Highnam R
        • et al.
        Using volumetric breast density to quantify the potential masking risk of mammographic density.
        AJR Am J Roentgenol. 2017; 208: 222-227
        • Holland K
        • van Gils CH
        • Mann RM
        • et al.
        Quantification of masking risk in screening mammography with volumetric breast density maps.
        Breast Cancer Res Treat. 2017; 162: 541-548
        • Cohen EO
        • Tso HH
        • Phalak KA
        • et al.
        Screening mammography findings from one standard projection only in the era of full-field digital mammography and digital breast tomosynthesis.
        AJR Am J Roentgenol. 2018; 211: 445-451
        • Skaane P.
        Breast cancer screening with digital breast tomosynthesis.
        Breast Cancer. 2017; 24: 32-41
        • Bernardi D
        • Macaskill P
        • Pellegrini M
        • et al.
        Breast cancer screening with tomosynthesis (3D mammography) with acquired or synthetic 2D mammography compared with 2D mammography alone (STORM-2): a population-based prospective study.
        Lancet Oncol. 2016; 17: 1105-1113
        • Skaane P
        • Bandos AI
        • Gullien R
        • et al.
        Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program.
        Radiology. 2013; 267: 47-56
        • Lang K
        • Andersson I
        • Rosso A
        • et al.
        Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmo Breast Tomosynthesis Screening Trial, a population-based study.
        Eur Radiol. 2016; 26: 184-190
        • Destounis SV
        • Morgan R
        • Arieno A
        Screening for dense breasts: digital breast tomosynthesis.
        AJR Am J Roentgenol. 2015; 204: 261-264
        • Lowry KP
        • Coley RY
        • Miglioretti DL
        • et al.
        Screening performance of digital breast tomosynthesis vs digital mammography in community practice by patient age, screening round, and breast density.
        JAMA Network Open. 2020; 3: e2011792
        • Monticciolo DL
        • Newell MS
        • Moy L
        Breast Cancer screening in women at higher-than-average risk: recommendations from the ACR.
        Journal of the American College of Radiology: JACR. 2018; 15: 408-414
        • Heywang-Kobrunner SH
        • Hacker A
        • Sedlacek S
        Magnetic resonance imaging: the evolution of breast imaging.
        Breast. 2013; 22 (Suppl): S77-S82
        • Chetlen A
        • Mack J
        • Chan T
        Breast cancer screening controversies: who, when, why, and how?
        Clin Imaging. 2016; 40: 279-282
        • Hooley RJ
        • Greenberg KL
        • Stackhouse RM
        Screening US in patients with mammographically dense breasts: initial experience with Connecticut Public Act 09-41.
        Radiology. 2012; 265: 59-69
        • Corsetti V
        • Ferrari A
        • Ghirardi M
        • et al.
        Role of ultrasonography in detecting mammographically occult breast carcinoma in women with dense breasts.
        Radiol Med. 2006; 111: 440-448
        • Corsetti V
        • Houssami N
        • Ferrari A
        • et al.
        Breast screening with ultrasound in women with mammography-negative dense breasts: evidence on incremental cancer detection and false positives, and associated cost.
        Eur J Cancer. 2008; 44: 539-544
        • Corsetti V
        • Houssami N
        • Ghirardi M
        • et al.
        Evidence of the effect of adjunct ultrasound screening in women with mammography-negative dense breasts: interval breast cancers at 1 year follow-up.
        Eur J Cancer. 2011; 47: 1021-1026
        • Berg WA
        • Blume JD
        • Cormack JB
        • et al.
        Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer.
        JAMA. 2008; 299: 2151-2163
        • Berg WA
        • Blume JD
        • Cormack JB
        • et al.
        Training the ACRIN 6666 Investigators and effects of feedback on breast ultrasound interpretive performance and agreement in BI-RADS ultrasound feature analysis.
        AJR Am J Roentgenol. 2012; 199: 224-235
        • Wienbeck S
        • Lotz J
        • Fischer U
        Review of clinical studies and first clinical experiences with a commercially available cone-beam breast CT in Europe.
        Clin Imaging. 2017; 42: 50-59
        • Shah JP
        • Mann SD
        • McKinley RL
        • et al.
        Implementation and CT sampling characterization of a third-generation SPECT-CT system for dedicated breast imaging.
        J Med Imaging (Bellingham). 2017; 4: 033502
        • Bowen SL
        • Wu Y
        • Chaudhari AJ
        • et al.
        Initial characterization of a dedicated breast PET/CT scanner during human imaging.
        J Nucl Med. 2009; 50: 1401-1408
      1. Klock J, Iuanow E, Smith K, et al. Visual grading assessment of quantitative transmission ultrasound compared to digital X-ray mammography and hand-held ultrasound in identifying ten breast anatomical structures 2017.

      2. QT Ultrasound website Available from: www.qtultrasound.com.

        • Natesan R LS
        • Navarro D
        • Anaje C
        • et al.
        Radiomics in Transmission Ultrasound Improve Differentiation between Benign and Malignant Breast Masses.
        in: Radiological Society of North America 2019 Scientific Assembly and Annual Meeting. 2019 (December 1 - December 6 Chicago IL)
        • Klock JC
        • Iuanow E
        • Malik B
        • et al.
        Anatomy-correlated breast imaging and visual grading analysis using quantitative transmission ultrasound.
        Int J Biomed Imaging. 2016; 2016: 7570406
        • Malik B
        • Klock J
        • Wiskin J
        • et al.
        Objective breast tissue image classification using Quantitative Transmission ultrasound tomography.
        Sci Rep. 2016; 6: 38857
        • Malik B
        • Terry R
        • Wiskin J
        • et al.
        Quantitative transmission ultrasound tomography: Imaging and performance characteristics.
        Med Phys. 2018; 45: 3063-3075
        • Iuanow E
        • Smith K
        • Obuchowski NA
        • et al.
        Accuracy of cyst versus solid diagnosis in the breast using Quantitative Transmission (QT) ultrasound.
        Acad Radiol. 2017; 24: 1148-1153
      3. Sickles E, D'Orsi C, Bassett L, et al. ACR BI-RADS® Mammography. 2013. In: ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System [Internet]. Reston, VA: American College of Radiology.

        • Obuchowski NA
        • Rockette HE.
        Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations.
        Communications in Statistics — Simulation and Computation. 1995; 24: 285-308
        • Hillis SL
        • Obuchowski NA
        • Schartz KM
        • et al.
        A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data.
        Stat Med. 2005; 24: 1579-1607
      4. Hillis SL, Schartz KM, Berbaum KS. OR-DBM MRMC 2.5 User Guide. University of Iowa; 2014;4–14.

        • R Core Team
        R: A language and environment for statistical computing.
        R Foundation for Statistical Computing, Vienna, Austria; 2018
        • McGowan LDA
        • Bullen JA
        • Obuchowski NA
        Location bias in ROC studies.
        Statistics in Biopharmaceutical Research. 2016; 8: 258-267
      5. Available from: https://www.qtultrasound.com/fda-grants-qt-ultrasound-breakthrough-device-designation/.

        • Samuelson FW
        • Abbey CK.
        The reproducibility of changes in diagnostic figures of merit across laboratory and clinical imaging reader studies.
        Acad Radiol. 2017; 24: 1436-1446
        • Abdolell M
        • Tsuruda KM
        • Brown P
        • et al.
        Breast density scales: the metric matters.
        Br J Radiol. 2017; 90: 20170307
        • Sickles EA.
        Findings at mammographic screening on only one standard projection: outcomes analysis.
        Radiology. 1998; 208: 471-475
        • Ng KH
        • Lau S.
        Vision 20/20: Mammographic breast density and its clinical applications.
        Med Phys. 2015; 42: 7059-7077
        • Boone JM
        • Nelson TR
        • Lindfors KK
        • et al.
        Dedicated breast CT: radiation dose and image quality evaluation.
        Radiology. 2001; 221: 657-667
        • Wiskin JW
        • Borup DT
        • Iuanow E
        • et al.
        3-D nonlinear acoustic inverse scattering: algorithm and quantitative results.
        IEEE Trans Ultrason Ferroelectr Freq Control. 2017; 64: 1161-1174
        • Ko ES
        • Morris EA.
        Abbreviated magnetic resonance imaging for breast cancer screening: concept, early results, and considerations.
        Korean J Radiol. 2019; 20: 533-541
        • Sickles EA
        • Filly RA
        • Callen PW
        Benign breast lesions: ultrasound detection and diagnosis.
        Radiology. 1984; 151: 467-470
        • Iuanow E
        • Smith K
        • Obuchowski NA
        • et al.
        Accuracy of cyst versus solid diagnosis in the breast using Quantitative Transmission (QT) ultrasound.
        Acad Radiol. 2017; 24: 1148-1153
        • Malik BH
        • Klock JC.
        Breast cyst fluid analysis correlations with speed of sound using transmission ultrasound.
        Acad Radiol. 2019; 26: 76-85
        • Gur D
        • Bandos AI
        • Cohen CS
        • et al.
        The “Laboratory” Effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations.
        Radiology. 2008; 249: 47-53