Automated Electronic Health Record Data Extraction and Curation Using ExtractEHR

JCO Clin Cancer Inform. 2024 Nov:8:e2400100. doi: 10.1200/CCI.24.00100. Epub 2024 Nov 25.

Abstract

Purpose: Although the potential transformative effect of electronic health record (EHR) data on clinical research in adult patient populations has been extensively discussed, the effect on pediatric oncology research has been limited. Multiple factors contribute to this more limited effect, including the paucity of pediatric cancer cases in commercial EHR-derived cancer data sets and the challenges of phenotypic case identification in pediatric federated EHR data.

Methods: The ExtractEHR software package was initially developed as a tool to improve clinical trial adverse event reporting, but its use cases have since expanded to include the development of multisite EHR data sets and the support of cancer cohorts. ExtractEHR enables customized, automated data extraction from the EHR that, when implemented across multiple hospitals, can create pediatric cancer EHR data sets to address a wide range of research questions in pediatric oncology. After ExtractEHR data acquisition, EHR data can be cleaned and graded using the companion software packages CleanEHR and GradeEHR.

Results: ExtractEHR has been installed at four leading pediatric institutions: Children's Healthcare of Atlanta, Children's Hospital of Philadelphia, Texas Children's Hospital, and Seattle Children's Hospital.

Conclusion: ExtractEHR has supported multiple use cases, including five clinical epidemiology studies, multicenter clinical trials, and cancer cohort assembly. Work is ongoing to develop a Fast Healthcare Interoperability Resources (FHIR) version of ExtractEHR and to implement other sustainability and scalability enhancements.

MeSH terms

  • Child
  • Data Curation / methods
  • Electronic Health Records*
  • Humans
  • Medical Oncology / methods
  • Neoplasms / epidemiology
  • Software*