An algorithm to predict data completeness in oncology electronic medical records for comparative effectiveness research

Ann Epidemiol. 2022 Dec:76:143-149. doi: 10.1016/j.annepidem.2022.07.007. Epub 2022 Jul 23.

Abstract

Introduction: Electronic health record (EHR) discontinuity (missing out-of-network encounters) can lead to information bias. We sought to construct an algorithm that identifies oncology patients with high EHR continuity.

Methods: Using a linked Medicare-EHR database and regression, we sought to 1) measure how often Medicare claims for outpatient encounters were substantiated by visits recorded in the EHR, and 2) predict continuity ratio, defined as the yearly proportion of outpatient encounters reported to Medicare that were captured by EHR data. The prediction model's performance was evaluated with the coefficient of determination and Spearman's correlation. We quantified variable misclassification by decile of continuity ratio using standardized difference and sensitivity.
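
The continuity ratio defined above can be illustrated with a minimal sketch. The record layout (patient, year, encounter date), the exact-date matching rule, and the function name `continuity_ratio` are assumptions made for illustration; the abstract does not specify how claims were matched to EHR visits.

```python
from collections import defaultdict


def continuity_ratio(claims_encounters, ehr_encounters):
    """Yearly share of claimed outpatient encounters also found in the EHR.

    Both arguments are iterables of (patient_id, year, encounter_date) tuples.
    Matching on identical tuples is an illustrative assumption, not the
    matching rule used in the study.
    """
    ehr_seen = set(ehr_encounters)
    claimed = defaultdict(int)   # encounters reported to Medicare
    captured = defaultdict(int)  # of those, encounters also present in the EHR
    for record in claims_encounters:
        patient_id, year, _ = record
        claimed[(patient_id, year)] += 1
        if record in ehr_seen:
            captured[(patient_id, year)] += 1
    # One ratio per patient-year that has at least one claimed encounter.
    return {key: captured[key] / claimed[key] for key in claimed}


# Hypothetical data: p1 has 3 of 4 claimed visits in the EHR, p2 has none.
claims = [
    ("p1", 2015, "2015-01-10"),
    ("p1", 2015, "2015-03-02"),
    ("p1", 2015, "2015-06-20"),
    ("p1", 2015, "2015-09-01"),
    ("p2", 2015, "2015-02-14"),
]
ehr = [
    ("p1", 2015, "2015-01-10"),
    ("p1", 2015, "2015-06-20"),
    ("p1", 2015, "2015-09-01"),
]
ratios = continuity_ratio(claims, ehr)
```

In this toy example, patient p1 has a continuity ratio of 0.75 for 2015 and patient p2 has a ratio of 0.0.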

Results: A total of 79,678 subjects met all eligibility criteria. Predicted and observed continuity were highly correlated (ρSpearman=0.86). Averaged across all variables measured, the mean standardized difference (MSD) in the highest decile of continuity ratio was one-seventh of that in the lowest decile, and sensitivity was 35-fold higher.
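
The two misclassification metrics reported here can be sketched for a single binary variable. This is a hedged illustration only: it uses the conventional standardized difference for two proportions and assumes a reference classification (for example, one drawn from claims) against which the EHR-only classification is compared. The abstract does not give the exact formulas or the reference standard, and the function names are hypothetical.

```python
import math


def standardized_difference(p_ehr, p_ref):
    """Conventional standardized difference between two proportions."""
    pooled_var = (p_ehr * (1 - p_ehr) + p_ref * (1 - p_ref)) / 2
    if pooled_var == 0:
        # Both proportions are 0 or 1: identical means no difference,
        # otherwise the difference is unbounded.
        if p_ehr == p_ref:
            return 0.0
        return math.copysign(math.inf, p_ehr - p_ref)
    return (p_ehr - p_ref) / math.sqrt(pooled_var)


def sensitivity(ehr_flags, ref_flags):
    """Share of reference-positive subjects that the EHR also flags."""
    true_pos = sum(1 for e, r in zip(ehr_flags, ref_flags) if e and r)
    ref_pos = sum(1 for r in ref_flags if r)
    return true_pos / ref_pos if ref_pos else float("nan")


# Hypothetical flags for one condition in eight subjects (1 = recorded).
ehr_flags = [1, 0, 0, 1, 0, 0, 0, 0]  # EHR-only classification
ref_flags = [1, 1, 0, 1, 1, 0, 0, 0]  # assumed reference classification

sd = standardized_difference(sum(ehr_flags) / 8, sum(ref_flags) / 8)
sens = sensitivity(ehr_flags, ref_flags)
```

Computing both metrics within each decile of continuity ratio and averaging the absolute standardized differences across variables would give a decile-level summary comparable in spirit to the MSD above.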

Conclusion: In the oncology population, restricting EHR-based study cohorts to subjects with high continuity may reduce misclassification without greatly impacting representativeness. Further work is needed to elucidate the best manner of implementing continuity prediction rules in cohort studies.

Keywords: Comparative effectiveness research; Continuity; Electronic medical records; Information bias.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Algorithms
  • Comparative Effectiveness Research
  • Electronic Health Records*
  • Humans
  • Medical Oncology
  • Medicare
  • Neoplasms* / epidemiology
  • United States