Data Analysis of the Lung Imaging Database Consortium and Image Database Resource Initiative

Published: January 16, 2015

      Rationale and Objectives

      The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) data set is the largest publicly available computed tomography (CT) image reference data set of lung nodules. This article presents a comprehensive analysis of the data set and a uniform data model, with the aim of helping researchers understand the data set in depth and use it efficiently in their lung cancer–related investigations.

      Materials and Methods

      A uniform data model was designed to represent and organize the various types of information contained in the different source data files. A software tool was developed for processing and analyzing the database. The tool 1) automatically aligns the nodule outlines marked manually by radiologists with the corresponding CT images and displays them graphically; 2) extracts the diagnostic nodule characteristics annotated by radiologists; 3) calculates a variety of nodule image features from the nodule outlines, including diameter, volume, and degree of roundness; 4) integrates all the extracted nodule information into the uniform data model and stores it in a common, easy-to-access data format; and 5) analyzes and summarizes the feature distributions of nodules in several categories. Using this tool, all 1018 CT scans in the data set were processed and their statistical distributions analyzed.
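      As an illustration of step 3, the following is a minimal sketch of how geometric features might be derived from radiologist outlines. The abstract does not give the formulas used by the tool, so everything here is an assumption: outlines are taken to be closed polygons of (x, y) points in millimeters, roundness is taken as 4πA/P², diameter as the diameter of a circle of equal area, and volume as the sum of slice areas times the slice thickness. All function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a closed polygon [(x, y), ...] in mm^2 (shoelace formula)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Perimeter of a closed polygon in mm."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def roundness(points):
    """Assumed definition: 4*pi*A / P^2, which equals 1.0 for a perfect circle."""
    perimeter = polygon_perimeter(points)
    if perimeter == 0:
        return 0.0
    return 4.0 * math.pi * polygon_area(points) / perimeter ** 2


def equivalent_diameter(points):
    """Assumed definition: diameter of the circle with the same area, in mm."""
    return 2.0 * math.sqrt(polygon_area(points) / math.pi)


def nodule_volume(outlines, slice_thickness_mm):
    """Assumed definition: sum of per-slice areas times slice thickness, in mm^3."""
    return sum(polygon_area(o) for o in outlines) * slice_thickness_mm


if __name__ == "__main__":
    # Hypothetical outline: a 2 mm x 2 mm square on two adjacent slices.
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print("area (mm^2):", polygon_area(square))
    print("roundness:", roundness(square))
    print("volume (mm^3):", nodule_volume([square, square], 1.5))
```

      Other definitions of diameter (for example, the longest in-plane extent) are equally plausible and would give different values for the same outline.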


      Results

      The information contained in source data files of different formats was extracted and integrated into a new, uniform data model. Based on this data model, the statistical distributions of nodules in terms of geometric features and diagnostic characteristics were summarized. The LIDC/IDRI data set contains 2655 nodules ≥3 mm, 5875 nodules <3 mm, and 7411 non-nodules. Among the 2655 nodules ≥3 mm, 1) 775, 488, 481, and 911 were marked by one, two, three, and four radiologists, respectively; 2) most (85.7%) have a diameter <10.0 mm, with a mean diameter of 6.72 mm; and 3) 10.87%, 31.4%, 38.8%, 16.4%, and 2.6% of nodules were assessed with a malignancy score of 1, 2, 3, 4, and 5, respectively.
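      The reader-agreement counts reported above can be checked and turned into percentages with a few lines of arithmetic. This is only a sketch: the counts are copied from the text, while the variable and function names are hypothetical and not part of the authors' tool.

```python
# Nodules >= 3 mm, keyed by how many radiologists marked them (counts from the text).
reader_counts = {1: 775, 2: 488, 3: 481, 4: 911}

total = sum(reader_counts.values())


def percentage_shares(counts):
    """Return each category's share of the total, in percent."""
    overall = sum(counts.values())
    return {key: 100.0 * value / overall for key, value in counts.items()}


shares = percentage_shares(reader_counts)

print("total nodules >= 3 mm:", total)
for readers, share in sorted(shares.items()):
    print(f"marked by {readers} radiologist(s): {reader_counts[readers]} ({share:.1f}%)")
```

      The four counts sum to the 2655 nodules ≥3 mm stated in the text, and roughly a third of those nodules were marked by all four radiologists.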


      Conclusions

      This study demonstrates that the proposed software tool can help potential users gain an in-depth understanding of the LIDC/IDRI data set and is therefore likely to benefit their future investigations. The analysis results also show the diversity of nodule characteristic distributions and can therefore serve as a reference for assessing the performance of new and existing nodule detection and/or segmentation schemes.


