Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing

PLoS One. 2014 Sep 24;9(9):e107797. doi: 10.1371/journal.pone.0107797. eCollection 2014.

Abstract

Purpose: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be the standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high-throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports for findings supportive of IMDs.

Patients and methods: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12 weeks afterwards, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected control patients identified from 3 tertiary haematology centres. The classifier was trained and tested, using 10-fold cross-validation, on a reference standard of 449 physician-annotated reports, including a development subset (n = 366), drawn from a total of 1880 reports. Binary and probabilistic predictions were compared with the reference standard to generate sensitivity, specificity and area under the receiver operating characteristic (ROC) curve.
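As an illustration of the evaluation step described above, the following minimal sketch computes sensitivity and specificity from binary predictions and ROC area from probabilistic predictions. It is not the authors' code: the function names, the 0.5 threshold and the toy labels and scores are all assumptions made for this example, and the cross-validation loop itself is omitted.

```python
# Illustrative sketch only; names, threshold and data are hypothetical.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC area as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy reference standard (1 = report supportive of IMD) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Binary predictions from an assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)
```

In a 10-fold cross-validation setting, predictions of this kind would be collected from each held-out fold before the metrics are computed.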

Results: For the development subset, sensitivity/specificity was 91% (95% CI 86% to 94%)/79% (95% CI 71% to 84%) and ROC area was 0.92 (95% CI 0.89 to 0.94). Of 25 (5.6%) missed notifications, only 4 (0.9%) reports were regarded as clinically significant.
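The percentages for missed notifications can be checked with a short calculation. The sketch below assumes the denominator is the 449 annotated reports of the reference standard; the abstract does not state the denominator explicitly, but this choice is consistent with the reported figures. The helper name `pct` is hypothetical.

```python
# Assumption: percentages are taken over the 449 annotated reports.
TOTAL_REPORTS = 449
MISSED = 25                  # missed notifications
CLINICALLY_SIGNIFICANT = 4   # missed reports regarded as clinically significant


def pct(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


missed_pct = pct(MISSED, TOTAL_REPORTS)
significant_pct = pct(CLINICALLY_SIGNIFICANT, TOTAL_REPORTS)
print(missed_pct, significant_pct)
```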

Conclusion: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Case-Control Studies
  • Female
  • Hematologic Neoplasms / complications*
  • Humans
  • Lung Diseases / complications
  • Lung Diseases / diagnosis*
  • Lung Diseases / microbiology
  • Male
  • Middle Aged
  • Mycoses / complications
  • Mycoses / diagnosis*
  • Mycoses / microbiology
  • Natural Language Processing*
  • Population Surveillance
  • ROC Curve
  • Retrospective Studies
  • Tomography, X-Ray Computed / methods*
  • Young Adult

Grants and funding

This study was supported by a National Health and Medical Research Council (NHMRC) post-graduate medical scholarship to MAR, and AC is funded by an NHMRC Career Development Fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.