Generative transfer learning for measuring plausibility of EHR diagnosis records

Hossein Estiri; Sebastien Vasey; Shawn N Murphy

doi:10.1093/jamia/ocaa215

J Am Med Inform Assoc. 2021 Mar 1;28(3):559-568. doi: 10.1093/jamia/ocaa215.

Authors

Hossein Estiri¹ ² ³, Sebastien Vasey⁴, Shawn N Murphy¹ ² ³

Affiliations

¹ Harvard Medical School, Boston, Massachusetts, USA.
² Massachusetts General Hospital, Boston, Massachusetts, USA.
³ Mass General Brigham, Boston, Massachusetts, USA.
⁴ Department of Mathematics, Harvard University, Cambridge, Massachusetts, USA.

Abstract

Objective: Because recording health information in electronic health records (EHRs) involves a complex set of processes, the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease.

Materials and methods: Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we develop generative classifiers on a small set of disease-agnostic features from EHRs that aim to represent Patients, pRoviders, and their Interactions within the healthcare SysteM (PRISM features).

Results: We demonstrate that PRISM features and the generative PRISM classifiers are potent for estimating disease probabilities and exhibit generalizable and transferable distributional characteristics across diseases and patient populations. The joint probabilities learned through the PRISM features by the PRISM generative models are transferable and generalizable to multiple diseases.

Discussion: The Generative Transfer Learning (GTL) approach with PRISM classifiers enables the scalable validation of computable phenotypes in EHRs without the need for domain-specific knowledge about specific disease processes.

Conclusion: Probabilities computed from the generative PRISM classifier can enhance and accelerate applied Machine Learning research and discoveries with EHR data.

Keywords: data quality; diagnosis records; electronic health records; generative models; transfer learning.
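
Illustrative sketch: the abstract describes fitting a generative classifier on disease-agnostic features and reusing the learned joint probabilities for other diseases. The Python sketch below shows that general idea only. It is not the authors' implementation: the abstract does not name the model family or list the PRISM features, so Gaussian naive Bayes stands in as one simple generative classifier, and the two features (number of diagnosis records, number of distinct providers), the function names, and all data values are hypothetical and made up for illustration.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels):
    """Fit class priors and per-class feature means/variances."""
    model = {}
    for c in set(labels):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        cols = list(zip(*class_rows))
        model[c] = {
            "prior": len(class_rows) / len(rows),
            "mean": [mean(col) for col in cols],
            # A small floor keeps every variance strictly positive.
            "var": [pvariance(col) + 1e-6 for col in cols],
        }
    return model


def log_joint(params, x):
    """log p(x, c) = log p(c) + sum_j log N(x_j | mean_j, var_j)."""
    total = math.log(params["prior"])
    for xj, m, v in zip(x, params["mean"], params["var"]):
        total += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
    return total


def prob_true(model, x):
    """Posterior probability of class 1 (record reflects the true disease)."""
    logs = {c: log_joint(p, x) for c, p in model.items()}
    top = max(logs.values())  # subtract the max for numerical stability
    weights = {c: math.exp(v - top) for c, v in logs.items()}
    return weights[1] / sum(weights.values())


# Synthetic records for a hypothetical "source" disease.
# Hypothetical features: [diagnosis record count, distinct provider count].
source_rows = [[1, 1], [2, 1], [1, 2], [8, 3], [10, 4], [7, 3]]
source_labels = [0, 0, 0, 1, 1, 1]  # 1 = confirmed disease, 0 = not confirmed
model = fit_gaussian_nb(source_rows, source_labels)

# Transfer step: score records of a different, hypothetical "target" disease
# with the model fitted on the source disease, using the same features.
p_sparse = prob_true(model, [1, 1])
p_dense = prob_true(model, [9, 4])
print(p_sparse, p_dense)
```

Because the features in this sketch do not refer to any particular disease, the fitted model can be applied unchanged to records of another disease. Whether such transfer holds on real data is an empirical question that the sketch does not address.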

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Delivery of Health Care
Diagnosis*
Disease / classification*
Electronic Health Records*
Humans
Machine Learning*
Probability
Professional-Patient Relations
Supervised Machine Learning

Grants and funding

R01 HG009174/HG/NHGRI NIH HHS/United States