Automated Identification of Patients With Immune-Related Adverse Events From Clinical Notes Using Word Embedding and Machine Learning

JCO Clin Cancer Inform. 2021 May:5:541-549. doi: 10.1200/CCI.20.00109.

Abstract

Purpose: Although immune checkpoint inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies, they are associated with a unique spectrum of side effects termed immune-related adverse events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs. Retrospective analysis of data from electronic health records can provide knowledge to characterize these toxicities. However, such information is not captured in a structured format within the electronic health record and requires manual chart review.

Materials and methods: In this work, we propose a natural language processing pipeline that can automatically annotate clinical notes and determine whether there is evidence that a patient developed an irAE. Seven hundred eighty-one cases were manually reviewed by clinicians and annotated for irAEs at the patient level. A dictionary of irAE keywords was used to perform text reduction on the clinical notes belonging to each patient; only sentences containing relevant expressions were kept. Word embeddings were then used to generate vector representations of the reduced text, which served as input for the machine learning classifiers. The output of the models was the presence or absence of any irAE. Additional models were built to classify skin-related toxicities, endocrine toxicities, and colitis.
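The text-reduction and embedding steps described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the keyword set, the three-dimensional toy vectors, the regular-expression sentence splitter, and the choice to average word vectors into one patient-level feature vector are all hypothetical, since the abstract does not specify the dictionary, the embedding model, or the aggregation method.

```python
import re

# Hypothetical keyword dictionary; the study's actual dictionary is not given in the abstract.
IRAE_KEYWORDS = {"colitis", "rash", "hypothyroidism", "pneumonitis"}

# Toy 3-dimensional word vectors, invented purely for illustration.
TOY_EMBEDDINGS = {
    "rash": [1.0, 0.0, 0.0],
    "colitis": [0.0, 1.0, 0.0],
    "developed": [0.0, 0.0, 1.0],
    "patient": [1.0, 1.0, 1.0],
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def reduce_notes(notes, keywords=IRAE_KEYWORDS):
    """Text reduction: keep only sentences that contain at least one keyword."""
    kept = []
    for note in notes:
        for sentence in split_sentences(note):
            if keywords & set(tokenize(sentence)):
                kept.append(sentence)
    return kept


def embed(sentences, embeddings=TOY_EMBEDDINGS, dim=3):
    """Average the vectors of all in-vocabulary tokens (an assumed aggregation)."""
    total = [0.0] * dim
    count = 0
    for sentence in sentences:
        for token in tokenize(sentence):
            vector = embeddings.get(token)
            if vector is not None:
                total = [t + v for t, v in zip(total, vector)]
                count += 1
    if count == 0:
        return total  # no known tokens: return the zero vector
    return [t / count for t in total]


# All notes belonging to one (fictional) patient.
notes = [
    "Patient seen in clinic today. Patient developed rash on both arms.",
    "No new complaints. Follow up in two weeks.",
]
reduced = reduce_notes(notes)
features = embed(reduced)
print(reduced)
print(features)
```

In this sketch, `features` plays the role of the patient-level input that a binary classifier (irAE present or absent) would receive; a real pipeline would use pretrained embeddings and a clinically curated dictionary.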

Results: The model for any irAE achieved an average F1-score of 0.75 and an area under the receiver operating characteristic curve of 0.85. This outperformed a basic keyword filtering approach. Although the classifier for any irAE achieved good accuracy, individual irAE classification still has room for improvement.
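For readers less familiar with the two reported metrics, the following Python sketch shows how an F1-score and an area under the receiver operating characteristic curve can be computed for a binary classifier. The labels, scores, and 0.5 decision threshold are made-up values for illustration, not data from the study, and the abstract does not state how its average was taken.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = irAE present, 0 = irAE absent.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f1_score(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The F1-score depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two figures are usually reported together.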

Conclusion: We demonstrate that patient-level annotations combined with a machine learning approach using keyword filtering and word embeddings can achieve promising accuracy in classifying irAEs in clinical notes. This model may facilitate annotation and analysis of large irAE data sets.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Electronic Health Records
  • Humans
  • Machine Learning*
  • Natural Language Processing
  • Neoplasms* / therapy
  • Retrospective Studies