Hybrid Deep Learning for Medication-Related Information Extraction From Clinical Texts in French: MedExt Algorithm Development Study

Jordan Jouffroy; Sarah F Feldman; Ivan Lerner; Bastien Rance; Anita Burgun; Antoine Neuraz

doi:10.2196/17934

JMIR Med Inform. 2021 Mar 16;9(3):e17934. doi: 10.2196/17934.

Authors

Jordan Jouffroy¹,², Sarah F Feldman¹,², Ivan Lerner¹,², Bastien Rance²,³, Anita Burgun¹,², Antoine Neuraz¹,²

Affiliations

¹ Department of Biomedical Informatics, Necker-Enfants malades Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France.
² UMRS 1138 team 22, Institut National de la Santé et de la Recherche Médicale, Université de Paris, Paris, France.
³ Department of Biomedical Informatics, Georges Pompidou European Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France.

PMID: 33724196
PMCID: PMC8077811
DOI: 10.2196/17934

Abstract

Background: Information related to patient medication is crucial for health care; however, up to 80% of this information resides solely in unstructured text. Manual extraction is difficult and time-consuming, and little research has addressed natural language processing for extracting medical information from unstructured text in French corpora.

Objective: We aimed to develop a system to extract medication-related information from clinical text written in French.

Methods: We developed a hybrid system combining an expert rule-based system, contextual word embeddings (Embeddings from Language Models, ELMo) trained on clinical notes, and a deep recurrent neural network (bidirectional long short-term memory-conditional random field). The task consisted of extracting drug mentions and their related information (eg, dosage, frequency, duration, route, condition). We manually annotated 320 clinical notes from a French clinical data warehouse to train and evaluate the model. We compared the performance of our approach with that of standard approaches: rule-based or machine learning only and classic word embeddings. We evaluated the models using token-level recall, precision, and F-measure.
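As an illustration of the token-level evaluation mentioned above, the following is a minimal sketch of micro-averaged precision, recall, and F-measure over per-token labels. It is not the authors' evaluation code: the function name `token_level_scores`, the label strings, and the use of "O" for tokens outside any entity are assumptions made for this example.

```python
def token_level_scores(gold, pred, outside="O"):
    """Micro-averaged token-level precision, recall, and F-measure.

    gold and pred are equal-length lists of per-token labels.
    The label names and the "O" outside tag are illustrative assumptions.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1          # entity token predicted with the correct label
        else:
            if p != outside:
                fp += 1      # predicted an entity label that is wrong
            if g != outside:
                fn += 1      # missed (or mislabeled) a gold entity token

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical labels for a six-token sentence.
gold = ["DRUG", "DOSAGE", "DOSAGE", "O", "FREQUENCY", "O"]
pred = ["DRUG", "DOSAGE", "O", "O", "FREQUENCY", "DURATION"]
precision, recall, f_measure = token_level_scores(gold, pred)
```

In this toy case, three entity tokens are labeled correctly, one gold token is missed, and one spurious label is predicted. A token given the wrong entity label counts as both a false positive and a false negative, which is one common convention and may differ from the one used in the study.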

Results: The overall F-measure was 89.9% (precision 90.8%; recall 89.2%) when combining expert rules and contextualized embeddings, compared to 88.1% (precision 89.5%; recall 87.2%) without expert rules or contextualized embeddings. The F-measures for each category were 95.3% for medication name, 64.4% for drug class mentions, 95.3% for dosage, 92.2% for frequency, 78.8% for duration, and 62.2% for condition of the intake.

Conclusions: Combining expert rules, deep contextualized embeddings, and deep neural networks improved medication information extraction. Our results revealed a synergy between expert knowledge and latent knowledge.

Keywords: deep learning; electronic health records; hybrid system; medication information; natural language processing; rule-based system; recurrent neural network.

©Jordan Jouffroy, Sarah F Feldman, Ivan Lerner, Bastien Rance, Anita Burgun, Antoine Neuraz. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 16.03.2021.