Impact of Concurrent Use of Artificial Intelligence Tools on Radiologists Reading Time: A Prospective Feasibility Study

Rationale and Objectives: This study investigated how an AI tool impacted radiologists' reading time for non-contrast chest CT exams. Materials and Methods: An AI tool was implemented into the PACS reading workflow of non-contrast chest CT exams between April and May 2020. The reading time was recorded for one consultant radiologist and one radiology resident by an external observer. After each case, radiologists answered questions regarding additional findings and perceived case overview. Reading times were recorded for 25 cases without and 20 cases with AI tool assistance for each reader. Differences in reading time with and without the AI tool were assessed using Welch's t-test for non-inferiority, with non-inferiority limits defined as 100 seconds for the consultant and 200 seconds for the resident. Results: The mean reading time for the radiology resident was not significantly affected by the AI tool (without AI 370 s vs with AI 437 s; +67 s, 95% CI -28 s to +163 s, p = 0.16). The reading time for the radiology consultant was also not significantly affected by the AI tool (without AI 366 s vs with AI 380 s; +13 s, 95% CI -57 s to +84 s, p = 0.70). The AI tool led to additional actionable findings in 5/40 (12.5%) studies and a better overview in 18/20 (90%) of studies for the resident. Conclusion: A PACS-based implementation of an AI tool for concurrent reading of chest CT exams did not significantly increase reading time, while additional actionable findings were made and the radiology resident perceived a better case overview.


INTRODUCTION
The number of commercially available Artificial Intelligence (AI) tools in radiology is rapidly rising. AI tools can assist the radiologist in solving interpretation tasks such as lesion detection, automatic measurement, and decision support. AI tools have also been shown to potentially reduce the reading time for radiologists, especially in the detection of breast, lung, and colon cancer (1)(2)(3)(4)(5). However, AI tools are often investigated retrospectively in research environments, with no interaction between the AI tool and the clinical reading environment. In addition, the increasing number of AI tools will lead to multiple AI tools simultaneously affecting radiologists in clinical practice. We therefore wanted to study the influence of multiple AI functionalities on radiologist reading time in a clinical setting.
In this manuscript the term AI tool refers to programs or algorithms, which automatically and independently assess imaging studies and inform the radiologists about findings. One of these AI tools, recently introduced at our institution, allows simultaneous reporting of multiple findings in non-contrast low-dose CT scans of the chest. At our institution, this type of scan is primarily used for follow-up of pulmonary nodules identified on previous CT scans or radiographs of the chest.
AI tools can reduce reading times for CT scans of the chest, but this depends on the implementation of the algorithm. Matsumoto et al found that when AI tools for the detection of nodules on CT were used concomitantly with normal reading, reading times for both residents and senior radiologists were not substantially affected (6). But when the AI tool was used for second reading (i.e. the radiologist first assesses the scan without an AI tool), the reading time increased. (5-7) A similar effect has been shown for AI tools for pulmonary embolism detection, where second reading increased the reading time for CT angiography studies by 22 seconds, while concurrent reading reduced the reading time by 12%-17%. (8,9) Recently, two smaller studies of concomitant use of AI tools for the detection of coronavirus disease 2019 also found a reduction in average reading times for chest CT studies (10,11). The intended use of the algorithm as described by the vendor might restrict concurrent use of some AI tools.
The clinical environment also affects radiologists' reading time. Some factors are hardly surprising, such as the time of day, the experience of the radiologist, and the number of interruptions; others are more puzzling, such as the size and luminescence of the screen, the use of key images in reporting, and the image order in the hanging protocol. (12)(13)(14)(15)(16)(17)(18) Given this complexity, it seems reasonable to assume that AI affects reading times differently in a clinical environment than in a research environment. This seems supported by studies which find that computer-assisted detection (CAD) findings affect the reading times of individual radiologists differently. (3,9)
The primary aim of this feasibility study was to investigate whether radiologists' reading times when using multiple AI tools concurrently were non-inferior to reading times without the AI tool in non-contrast low-dose chest CT exams. Secondary aims were to assess whether the AI tool led to additional findings or affected the readers' diagnostic confidence.

MATERIALS AND METHODS
This was a prospective interventional feasibility study with commercially available and regulatory approved software. Ethical approval and the need for patient consent were waived due to the nature of the study. The manuscript was prepared using the SQUIRE-2 statement. (19)

AI Tool
The AI tool (AI Chest Companion, Siemens Healthineers, Erlangen, Germany) for the automatic analysis of CT studies of the thorax without contrast was installed at our institution. Non-contrast CT studies of the thorax were automatically and always forwarded to a local DICOM node, which deidentified scans and forwarded them to the cloud-based AI tool. Cases were then assessed by the AI tool and results reidentified and stored in our institutional picture archiving and communications system (PACS) alongside original scan series (Fig 1). Results of the AI tool were not reviewed by investigators or radiographers prior to display to the radiologists in the PACS viewer. The AI tool performed 9 tasks and provided the radiologists with the results and measurements outlined in Table 1.
The AI tool generated an overview image which displayed the assessed organs (lung, aorta, heart, and spine) color coded based on presence of findings. Five additional series were always sent to PACS per case: 1) a single overview image; 2) a single coronal curved planar image of the straightened vertebral column with height and bone marrow density overlay; 3) a transverse lung series with areas below -950 HU, lung lobes, and nodules marked; 4) cross-sectional images of the aorta at nine measurement locations; and 5) a transverse series of the heart with coronary calcification. An additional series was only sent when nodules were found in the lung and consisted of a 3D radial range at eight angles with the detected nodule marked (Fig 2).
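The de-identification round trip described in this section can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name, the two identifying fields, the random pseudonym scheme, and the placeholder record are hypothetical and do not describe the vendor's or the institution's actual implementation, which would have to handle many more DICOM attributes.

```python
import uuid


class PseudonymizingNode:
    """Hypothetical stand-in for the local DICOM node (not the vendor's code)."""

    # Assumed subset of identifying fields; real DICOM data has many more.
    IDENTIFYING_KEYS = ("PatientID", "PatientName")

    def __init__(self):
        # Maps a random pseudonym to the identifying fields it replaced.
        self._lookup = {}

    def deidentify(self, study):
        """Return a copy of the study with identifiers replaced by a pseudonym."""
        pseudonym = uuid.uuid4().hex
        self._lookup[pseudonym] = {k: study[k] for k in self.IDENTIFYING_KEYS}
        cleaned = {k: v for k, v in study.items() if k not in self.IDENTIFYING_KEYS}
        cleaned["PatientID"] = pseudonym
        return cleaned

    def reidentify(self, result):
        """Restore the original identifiers on a result returned by the AI tool."""
        restored = dict(result)
        restored.update(self._lookup[result["PatientID"]])
        return restored


node = PseudonymizingNode()
# Placeholder record, not patient data.
study = {"PatientID": "example-id", "PatientName": "Example^Patient", "Series": "chest CT"}
sent_to_cloud = node.deidentify(study)
# The AI result carries only the pseudonym it was given.
ai_result = {"PatientID": sent_to_cloud["PatientID"], "Series": "AI overview image"}
back_in_pacs = node.reidentify(ai_result)
```

The point of the sketch is that the lookup table never leaves the local node, so the cloud service only ever sees the pseudonym.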

Pre-Study Interview and Baseline Assessment
Prior to the initiation of the study, NA performed structured interviews with two specialists in thoracic radiology and one radiology resident. The purpose of these interviews was to understand radiologists' expectations and assumptions of the AI tool and its presumed influence on their workflow. The pre-study interviews informed the selection of quantitative and qualitative outcome measures for this study.

Radiological Case Observations
We observed readings of 90 consecutive low-dose CT scans of the thorax without intravenous contrast media at the Department of Radiology at Herlev and Gentofte Hospital in Denmark. Cases were chosen between April and May 2020 based on their availability on days when data collection for the study was possible. Cases were not prescreened and no cases were excluded.

Radiologists
We observed two radiologists (a thoracic radiology consultant with 13 years' experience and a second-year radiology resident [PGY-2] doing rotation in thoracic radiology) during their routine clinical case reading. Radiologists read cases, including prior exams, in their normal working environment at their own discretion with use of any software they liked. Radiologists were free to view the results of the AI tool before, concurrent with or after viewing conventional CT reconstructions.

Baseline
For a baseline assessment we recorded the reading time (time from opening the case to submitting the report) of radiologists without deployment of the AI tool in 25 unique cases each. After the AI tool had been installed, radiologists had time to familiarize themselves with the AI tool. At their own discretion they indicated when they felt proficient using the additional series. The reading time of each radiologist was then measured in 20 unique cases.

Figure 1. Overview of AI tool integration into radiologists' workflow. Note: Scan series were sent to a local system, where patient identifying information was removed (pseudonymization) before being processed by the AI tool. The resulting secondary DICOM capture images generated by the AI tool were re-identified and then presented to the reading radiologist.

Data Collection
The reading time for each radiologist was measured at baseline and post intervention using a stopwatch by a single observer (NA). The timer was stopped when the radiologist was interrupted by other tasks, such as answering questions from colleagues or telephone calls, and resumed when reading continued. Immediately after each case the observer posed five questions to the radiologist. Radiologists were asked to elaborate on the categorical answers.
Reader performance was not objectively assessed as part of this study.
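The measurement rule above (time runs from opening the case to submitting the report, with interruptions excluded) can be expressed as a small pausable stopwatch. This is an illustrative sketch only: the study used a manual stopwatch, the class and its injectable clock are hypothetical, and the timestamps in the example are made-up values rather than study data.

```python
import time


class PausableStopwatch:
    """Accumulates time only while running, so interruptions are excluded."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._accumulated = 0.0      # seconds counted so far
        self._running_since = None   # clock value at the last start, if running

    def start(self):
        if self._running_since is None:
            self._running_since = self._clock()

    def pause(self):
        if self._running_since is not None:
            self._accumulated += self._clock() - self._running_since
            self._running_since = None

    def elapsed(self):
        if self._running_since is None:
            return self._accumulated
        return self._accumulated + (self._clock() - self._running_since)


# Scripted clock with made-up timestamps in seconds (not study data).
ticks = iter([0.0, 120.0, 180.0, 400.0])
watch = PausableStopwatch(clock=lambda: next(ticks))
watch.start()   # case opened at t = 0
watch.pause()   # interruption begins at t = 120
watch.start()   # reading resumes at t = 180
watch.pause()   # report submitted at t = 400
reading_time = watch.elapsed()
print(reading_time)  # 340.0: the 60 s interruption is not counted
```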

Statistical Analysis and Power Calculation
We previously measured reading times for non-contrast chest CT at 7.03 min (SD 1.8 min) for radiology consultants and 10.68 min (SD 3.56 min) for radiology residents at our institution. Sample size was calculated with the intent of testing for non-inferiority at a power of 0.9. For this feasibility study, the clinically relevant limit of non-inferiority was set at 100 seconds for the consultant radiologist and 200 seconds for the resident. This resulted in a required sample size of at least 20 cases in each subgroup for each reader.
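As a rough plausibility check of this sample size, the standard normal-approximation formula for a two-group non-inferiority comparison of means, n = 2 (z_(1-alpha) + z_(1-beta))^2 (SD / limit)^2 per group, can be evaluated with the standard deviations and limits given above. The one-sided significance level of 0.05, the equal standard deviations in both groups, and the true difference of zero are assumptions of this sketch and are not stated in the text; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n_per_group(sd_s, margin_s, alpha=0.05, power=0.9):
    """Cases per group for a non-inferiority comparison of two means.

    Normal approximation, assuming equal SDs in both groups and a true
    difference of zero. alpha is one-sided (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd_s / margin_s) ** 2)


# Previously measured SDs, converted from minutes to seconds.
consultant_n = noninferiority_n_per_group(sd_s=1.8 * 60, margin_s=100)
resident_n = noninferiority_n_per_group(sd_s=3.56 * 60, margin_s=200)
print(consultant_n, resident_n)  # 20 20
```

Under these assumptions both readers come out at 20 cases per group, in line with the sample size stated above.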
Reading times with and without the AI tool were compared per reader using Welch's t-test, with p values <0.05 indicating a significant difference. Statistical calculations were done using R (version 4.0.1).
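For reference, the per-reader Welch comparison can be written out as follows. The symbols are introduced here for illustration: \bar{x} is the mean reading time, s the sample standard deviation, and n the number of cases, with subscripts AI and 0 for readings with and without the AI tool, and \Delta is the reader's non-inferiority limit. Judging non-inferiority from the upper bound of the two-sided 95% confidence interval is one common convention and is an assumption here, not a stated detail of the analysis.

```latex
t = \frac{\bar{x}_{\mathrm{AI}} - \bar{x}_{0}}
         {\sqrt{\dfrac{s_{\mathrm{AI}}^{2}}{n_{\mathrm{AI}}} + \dfrac{s_{0}^{2}}{n_{0}}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_{\mathrm{AI}}^{2}}{n_{\mathrm{AI}}} + \dfrac{s_{0}^{2}}{n_{0}}\right)^{2}}
     {\dfrac{\left(s_{\mathrm{AI}}^{2}/n_{\mathrm{AI}}\right)^{2}}{n_{\mathrm{AI}} - 1}
      + \dfrac{\left(s_{0}^{2}/n_{0}\right)^{2}}{n_{0} - 1}}

% Non-inferiority criterion assumed in this sketch:
\left(\bar{x}_{\mathrm{AI}} - \bar{x}_{0}\right)
+ t_{0.975,\,\nu}\,
  \sqrt{\dfrac{s_{\mathrm{AI}}^{2}}{n_{\mathrm{AI}}} + \dfrac{s_{0}^{2}}{n_{0}}}
< \Delta
```

In this study n_0 = 25 and n_AI = 20 for each reader, and Welch's form is the natural choice because it does not require equal variances in the two groups.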

Quantitative Outcomes
The consultant radiologist stated in 5/20 cases that additional findings were made by the AI tool, while the resident noted no additional findings. The AI tool provided a better case overview in 18/20 cases for the resident but only 2/20 cases for the consultant. The perceived reading time was not affected by use of the AI tool in 19/20 cases for the consultant but was unaffected in only 10/20 cases for the resident (Table 2 and Fig 4).

Qualitative Outcomes
The comments indicated that the AI tool found two cases of aortic ectasia, one case of coronary calcification, and two vertebral compression fractures. These were added to the conclusion and were each considered a new actionable finding by the consultant radiologist.
There were comments in 21 cases (resident 17 / consultant 4) where the radiologists indicated that the AI tool increased the perceived diagnostic confidence or gave a better overview. In 15 comments (resident 14 / consultant 1) it was indicated that the tool was used after the reading to confirm that there were no additional findings (i.e. the AI tool was used for second reading). In five cases (resident 3 / consultant 2) the AI tool was used to give an overview before reading the case. Nine comments (resident 6 / consultant 3) indicated that the color-coded overview image was helpful, six comments (resident 4 / consultant 2) indicated that the 3D pulmonary model was helpful, and four comments concerned other topics.
The perceived increase in reading time was attributed to problems with false positive findings in two cases (resident), too many series in one case (consultant), PACS problems not attributable to the AI tool in one case (resident), and other reasons in two cases (resident).

Figure 3. Impact of AI tool for low dose chest CT scans on radiologists reading time. Note: Reading time for each case with and without the use of the AI tool. Dots represent reading times for individual cases, with lower and upper hinges corresponding to the 25% and 75% quantiles. The middle line represents the median reading time.

DISCUSSION
In this prospective feasibility study we assessed the influence of a PACS-integrated AI tool on the workflow of two radiologists. Reading times for each radiologist were not significantly affected by the AI tool. The 95% confidence interval for the difference in mean reading time ranged from -28 s to +163 s for the radiology resident and from -57 s to +84 s for the radiology consultant. These ranges were within the non-inferiority limits defined a priori for this feasibility study. There were additional findings due to the use of the AI tool in 12.5% (5/40) of cases. The AI tool provided a better overview in only 20% (4/20) of cases for the radiology consultant. In contrast, the impact on the radiology resident was more striking: a better overview or perceived increased confidence was noted in 90% (18/20) of cases. Comments suggested that this was primarily due to the presentation of an overview image.
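The comparison of the confidence intervals with the a priori limits, noted at the start of this discussion, can be made explicit with a small check. The sketch uses only the interval bounds and limits reported in this study; the dictionary layout and function name are illustrative, and treating the upper bound of the 95% confidence interval as the decision criterion is an assumption.

```python
# Reported 95% CI of the difference in mean reading time (with AI minus
# without AI) and the a priori non-inferiority limit, all in seconds.
readers = {
    "resident": {"ci_low": -28, "ci_high": 163, "limit": 200},
    "consultant": {"ci_low": -57, "ci_high": 84, "limit": 100},
}


def is_noninferior(ci_high, limit):
    """Non-inferior if even the upper CI bound stays below the limit."""
    return ci_high < limit


results = {
    name: {
        "noninferior": is_noninferior(r["ci_high"], r["limit"]),
        # A CI that contains zero means no significant difference.
        "significant_difference": not (r["ci_low"] <= 0 <= r["ci_high"]),
    }
    for name, r in readers.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

For both readers the upper bound lies below the limit and the interval contains zero, which matches the statements above.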
We compared the results of our study against studies which we identified using the search string "radiology reading time" on MEDLINE on 26 November 2020. We identified a study by Brown et al, which also investigated multiple AI tools on CT scans of the chest and found a reduction in reading time (20). We found no reduction in reading time and even a noticeable, but non-significant, increase in reading time for the radiology resident. However, the study by Brown et al differed fundamentally from ours in design: they used a paired design, in which each case was read twice by the same radiologists, and the task involved systematic measurement of the aortic diameter in all cases. Our radiologists only measure this diameter when they suspect aortic ectasia.
Radiologists commented in multiple cases on a perceived increase in reading time due to false positive findings. This is in line with findings by Silva et al, who concluded that CAD systems for nodule detection complemented radiologist reading but require visual confirmation for the reduction of false positives (21). Baseline reading times did not differ between the radiology consultant and the resident (366 s vs 370 s). This was substantially faster than the median reading time of 731 s for CT scans of the chest without contrast reported by Forsberg et al (22). This difference is probably due to our inclusion of low-dose protocols only, while Forsberg et al might have included reading times for HRCT scans.
Overall, results from this feasibility study suggest that PACS-based integration of AI tools for the assisted reading of chest CT images is feasible without a large impact on reading time, while providing diagnostic benefits to the radiologist.
A strength of the study was the assessment of real-world performance, with no cases excluded. Time measurement by an independent observer allowed the radiologists to maintain focus on their work and allowed a direct interview of the radiologist after each case. This allowed us to compare reading time, the impact of the AI tool on the radiologists' reading tasks, and qualitative comments by the radiologists for each case.
This feasibility study had several limitations. First, the workflow impact of the AI tool was only measured on two radiologists with varying experience. We cannot rule out that the workflow impact might be fundamentally different for other radiologists. This reader-dependent effect of AI on reading time is currently not understood and might limit the inferences which can be drawn from this type of study. However, reading times found in this study corresponded to previously estimated reading times at our institution. Future studies should try to include additional radiologists. It is also likely that other factors, such as PACS familiarity, seniority, and experience with computer-assisted detection systems, affect reading time; these should be individually addressed in future, larger studies. Second, the number of cases assessed was low, with only 45 cases per reader. We previously assessed reading times at our institution for radiology residents and radiology consultants and determined that a reading time increase of more than 3.5 minutes and 2 minutes, respectively, would be an absolutely unacceptable performance decrease that would prevent any further investigation of the AI tool. Mean and variation in reading times for both radiologists fell within this predefined range, but the confidence intervals of this feasibility study are too wide to rule out a negative impact of the algorithm on reading time. A larger sample of cases would be necessary to investigate the impact of this AI tool with higher precision. Third, the consultant radiologist reading the cases was also interviewed prior to the study. His personal preference regarding use of AI tools might have affected the chosen outcome measures. Finally, we only evaluated a single AI tool.
The tool was chosen based on availability at our institution, its ability to send results directly to the PACS without confirmation, and its ability to assess multiple disease entities simultaneously. While results are not necessarily generalizable to other AI tools, they do indicate that PACS-based integration of AI tools is feasible, potentially without negative impact on reading time, while still providing benefits such as additional findings and increased diagnostic confidence. The investigated AI tool can also be implemented such that all findings must be preapproved before they are sent to the PACS, but this option was not investigated.
In conclusion, in this prospective interventional feasibility study we investigated the workflow impact of an AI tool which assisted radiologists in the reading of low-dose non-contrast CT scans of the chest. The PACS-based integration of the AI tool did not significantly increase reading time for the radiology resident or the thoracic radiology consultant, while additional actionable findings were made and the radiology resident perceived a better overview. We recommend that these outcomes be verified in a full-scale study.