Objectives: (1) To develop an automated eligibility screening (ES) approach for clinical trials in an urban tertiary care pediatric emergency department (ED); (2) to assess the effectiveness of natural language processing (NLP), information extraction (IE), and machine learning (ML) techniques on real-world clinical data and trials.
Data and methods: We collected eligibility criteria for 13 randomly selected, disease-specific clinical trials actively enrolling patients between January 1, 2010 and August 31, 2012. In parallel, we retrospectively selected data fields including demographics, laboratory data, and clinical notes from the electronic health record (EHR) to represent profiles of all 202795 patients visiting the ED during the same period. Leveraging NLP, IE, and ML technologies, the automated ES algorithms identified patients whose profiles matched the trial criteria to reduce the pool of candidates for staff screening. The performance was validated on both a physician-generated gold standard of trial-patient matches and a reference standard of historical trial-patient enrollment decisions, where workload, mean average precision (MAP), and recall were assessed.
Results: Compared with the case without automation, the workload with automated ES was reduced by 92% on the gold standard set, with a MAP of 62.9%. The automated ES achieved a 450% increase in trial screening efficiency. The findings on the gold standard set were confirmed by large-scale evaluation on the reference set of trial-patient matches.
Discussion and conclusion: By exploiting the text of trial criteria and the content of EHRs, we demonstrated that NLP-, IE-, and ML-based automated ES could successfully identify patients for clinical trials.
Keywords: Automated Clinical Trial Eligibility Screening; Information Extraction; Machine Learning; Natural Language Processing.
© The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.