Academic Radiology | Special Review | Volume 27, Issue 1, P6-12, January 2020

Essential Elements of Natural Language Processing: What the Radiologist Should Know

Published: September 17, 2019 | DOI:
      Natural language is ubiquitous in the workflow of medical imaging. Radiologists create and consume free text in their daily work, some of which may be amenable to enhancement through automatic processing. Recent advances in deep learning and “artificial intelligence” have had a significant positive impact on natural language processing (NLP). This article reviews the history of how researchers have extracted data from natural language and encoded it for analytical processing, starting from NLP's humble origins in hand-curated linguistic rules. It also explores the evolution of medical NLP, including vectorization, word embedding, and classification, as well as its use in automated speech recognition. Finally, the article discusses the role of machine learning and neural networks in the context of significant, if incremental, improvements in NLP.
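      The move from hand-curated rules to vector representations of text can be illustrated with a small sketch. This is not the article's method; it is a minimal, standard-library Python (3.9+) example, assuming short free-text report sentences as input. The suffix list in `SUFFIXES` is hypothetical and far cruder than published stemmers such as Porter's. `tfidf_vectors` uses one common TF-IDF weighting (term frequency times the log of inverse document frequency), and `cosine` compares the resulting sparse vectors.

```python
import math
import re
from collections import Counter

# Hypothetical suffix rules, much cruder than the Lovins or Porter stemmers;
# they only illustrate the hand-curated, rule-based style of early NLP.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "s", "ed")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and stem each one."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Encode each document as a sparse TF-IDF vector (term -> weight)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Made-up report sentences for illustration only.
    reports = [
        "No pulmonary nodules identified.",
        "Stable pulmonary nodule in the right upper lobe.",
        "Fracture of the distal femur.",
    ]
    vecs = tfidf_vectors(reports)
    print(stem("nodules"))            # nodule
    print(cosine(vecs[0], vecs[1]))   # positive: shares "pulmonary" and "nodule"
    print(cosine(vecs[0], vecs[2]))   # 0.0: no shared terms
```

      Because TF-IDF treats every distinct term as its own dimension, a pair such as “nodule” and “mass” contributes nothing to the similarity score. Word embeddings such as word2vec address this by learning dense vectors in which related terms tend to lie close together.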
