Extracting social determinants of health from inpatient electronic medical records using natural language processing

J Epidemiol Popul Health. 2024 Dec;72(6):202791. doi: 10.1016/j.jeph.2024.202791. Epub 2024 Nov 14.

Abstract

Background: Social determinants of health (SDOH) have been shown to be important predictors of health outcomes. Here we developed methods to extract them from inpatient electronic medical record (EMR) data using techniques compatible with current EMR systems.

Methods: Four social determinants were targeted: patient language barriers, employment status, education, and whether the patient lives alone. Inpatients aged 18 and older with records in the Calgary-wide EMR system were studied. Algorithms were developed on the January 2019 hospital admissions (n=8,999) and validated on the January 2018 hospital admissions (n=8,839). SDOH documented as structured data were compared against those extracted from unstructured free-text notes.
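
The abstract does not describe the extraction algorithms themselves. As an illustration only, the sketch below shows one way a rule-based approach could flag SDOH mentions in free-text notes. Every pattern, label, and the `extract_sdoh` function name is a hypothetical assumption and is not taken from the study.

```python
import re

# Hypothetical keyword patterns, for illustration only; the study's
# actual rules are not given in the abstract.
PATTERNS = {
    "language_barrier": re.compile(
        r"\b(language barrier|interpreter (needed|required|present))\b", re.I
    ),
    "lives_alone": re.compile(r"\b(lives?|living|resides?) alone\b", re.I),
    "unemployed": re.compile(r"\b(unemployed|out of work)\b", re.I),
    "retired": re.compile(r"\bretired\b", re.I),
}

# Crude negation check: a negation word at most two words before the match.
NEGATION = re.compile(r"\b(no|not|denies|without)\s+(\w+\s+){0,2}$", re.I)


def extract_sdoh(note: str) -> set[str]:
    """Return the set of SDOH labels with a non-negated mention in the note."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            # Look at a short window of text immediately before the match.
            preceding = note[max(0, match.start() - 30):match.start()]
            if not NEGATION.search(preceding):
                found.add(label)
                break
    return found
```

For example, `extract_sdoh("Patient lives alone and is retired.")` flags both living alone and retirement, while "Patient does not live alone." is suppressed by the negation check. A production system would need far more careful handling of negation, family-member mentions, and historical context.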

Results: More than twice as many patients had a language barrier documented in free-text EMR notes than in structured data; 12 % of patients indicated by EMR notes to be living alone had a partner noted in their structured marital status. The positive predictive value (PPV) of the elements extracted from notes was high, at 99 % (95 % CI 94.0 %-100.0 %) for language barriers, 98 % (95 % CI 92.6 %-99.9 %) for living alone, 96 % (95 % CI 89.8 %-98.8 %) for unemployment, and 88 % (95 % CI 80.0 %-93.1 %) for retirement.
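
PPV is the share of flagged records confirmed as true positives on review: PPV = TP / (TP + FP). The abstract does not state the review sample sizes or the method used for the confidence intervals, so the sketch below uses a Wilson score interval as one common choice. The function name and any counts passed to it are hypothetical.

```python
import math


def ppv_with_wilson_ci(true_pos: int, false_pos: int, z: float = 1.96):
    """Return (ppv, lower, upper) using a Wilson score interval.

    z = 1.96 corresponds to an approximate 95 % interval. The Wilson method
    is an assumption here; the abstract does not name the method used.
    """
    n = true_pos + false_pos
    if n == 0:
        raise ValueError("at least one flagged record is required")
    p = true_pos / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts, not taken from the study.
ppv, lower, upper = ppv_with_wilson_ci(true_pos=90, false_pos=10)
print(f"PPV = {ppv:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

An exact (Clopper-Pearson) interval is another common choice and gives slightly wider bounds at small sample sizes.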

Conclusions: All SDOH elements were extracted with high PPV. In structured data, SDOH documentation was largely missing and sometimes misleading.

Keywords: Case identification; EMR phenotyping; Electronic medical record data; Social determinants of health.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Alberta
  • Algorithms
  • Communication Barriers
  • Educational Status
  • Electronic Health Records* / statistics & numerical data
  • Female
  • Hospitalization / statistics & numerical data
  • Humans
  • Inpatients* / statistics & numerical data
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Social Determinants of Health*