“Binary” and “Non-Binary” Detection Tasks: Are Current Performance Measures Optimal?
Rationale and Objectives
We have observed that, for several detection tasks in observer performance studies, a very large fraction of responses fall in the extreme ranges of lower than 11% or higher than 89%, regardless of the actual presence or absence of the abnormality in question or its subjectively rated “subtleness.” This observation raises questions about the validity and appropriateness of using multicategory rating scales for such detection tasks. Monte Carlo simulation of binary and multicategory ratings for these tasks demonstrates that binary ratings often yield a less biased and more precise summary index, and hence may lead to higher statistical power for detecting differences between modalities.
Key Words: Observer performance, ROC, binary decisions
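The kind of Monte Carlo comparison described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' simulation design: the observer model and every parameter in it (`p_correct`, `p_extreme`, the cut point at 50, the case counts, and the number of simulated studies) are hypothetical choices made for this example. It draws 0 to 100 ratings that cluster below 11 or above 89, then computes the empirical area under the ROC curve once from the full ratings and once from the same ratings reduced to binary decisions.

```python
import random
import statistics

def empirical_auc(neg, pos):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def simulate_rating(is_abnormal, rng, p_correct=0.8, p_extreme=0.9):
    """Hypothetical observer: most ratings land in 0-10 or 90-100."""
    # Assumed: the observer leans toward the correct decision with p_correct.
    leans_positive = rng.random() < (p_correct if is_abnormal else 1 - p_correct)
    # Assumed: with probability p_extreme the rating is in an extreme range.
    if rng.random() < p_extreme:
        return rng.randint(90, 100) if leans_positive else rng.randint(0, 10)
    return rng.randint(51, 89) if leans_positive else rng.randint(11, 50)

def one_study(rng, n_neg=50, n_pos=50):
    """Simulate one reader study; return (multicategory AUC, binary AUC)."""
    neg = [simulate_rating(False, rng) for _ in range(n_neg)]
    pos = [simulate_rating(True, rng) for _ in range(n_pos)]
    auc_multi = empirical_auc(neg, pos)
    # Binary decision: the same rating thresholded at 50 (an assumed cut point).
    auc_binary = empirical_auc([int(r > 50) for r in neg],
                               [int(r > 50) for r in pos])
    return auc_multi, auc_binary

rng = random.Random(0)  # fixed seed so the sketch is reproducible
results = [one_study(rng) for _ in range(200)]
for label, values in zip(("multicategory", "binary"), zip(*results)):
    print(label, "mean:", round(statistics.mean(values), 3),
          "sd:", round(statistics.stdev(values), 3))
```

For binary decisions the empirical area reduces to the average of sensitivity and specificity. Comparing the spread of the two indices across simulated studies gives a rough sense of their relative precision under whatever observer model is assumed; the numbers from this toy model say nothing about the results reported in the article.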
Supported in part by Grants EB001694, EB002106, and EB003503 (to the University of Pittsburgh) from the National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health.
PII: S1076-6332(07)00179-1
doi:10.1016/j.acra.2007.03.014
© 2007 AUR. Published by Elsevier Inc. All rights reserved.
