Multiparametric Quantitative Imaging Biomarkers for Phenotype Classification: A Framework for Development and Validation

Published: October 04, 2022
This manuscript is the third in a five-part series on statistical methodology for assessing the technical performance of multiparametric quantitative imaging biomarkers (mp-QIBs). We outline approaches and statistical methodologies for developing and evaluating a phenotype classification model from a set of multiparametric QIBs. We then describe validation studies of the classifier for precision, diagnostic accuracy, and interchangeability with a comparator classifier. We follow with an end-to-end real-world example of the development and validation of a classifier for atherosclerotic plaque phenotypes. We consider diagnostic accuracy and interchangeability to be clinically meaningful claims for a phenotype classification model informed by mp-QIB inputs, and we aim to provide tools for demonstrating agreement between imaging-derived characteristics and clinically established phenotypes. Because the field is still evolving, we close with an acknowledgement of existing challenges and a discussion of where additional work is needed, in particular the challenges of technical performance and analytical validation of mp-QIBs. We intend for this manuscript to further advance the robust and promising science of multiparametric biomarker development.

