Purpose: Electronic health records (EHRs) present an opportunity to access large stores of data for research, but mapping raw EHR data to clinical phenotypes is complex. We propose adding patient-reported data to the EHR to improve phenotyping performance and describe a retrospective cohort study demonstrating a test case in depressive disorder.
Methods: We compared four EHR-phenotyping methods, based on International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, medication records, the Patient Health Questionnaire 9 (PHQ-9), and the presence of at least one of these three criteria, with respect to their ability to identify depression cases and the characteristics of the patients they identified. Our sample included 168,884 patients seen at our neurological institute between 2007 and 2013. We assessed diagnostic performance in a subset of 225 patients for whom a reference standard measurement was available.
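As an illustration only, the four phenotyping rules could be sketched as below. The `Patient` fields, the `phenotype_flags` helper, and the PHQ-9 cutoff of 10 are assumptions made for this sketch; the abstract does not state the code lists, medication lists, or threshold actually used.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed cutoff for a positive PHQ-9 screen; the study's threshold is not given here.
PHQ9_CUTOFF = 10


@dataclass
class Patient:
    """Hypothetical per-patient summary derived from the EHR."""
    has_depression_icd9: bool      # any depression-related ICD-9-CM code on record
    on_antidepressant: bool        # any qualifying medication record
    phq9_score: Optional[int]      # highest recorded PHQ-9 total, None if never administered


def phenotype_flags(p: Patient) -> Dict[str, bool]:
    """Return the depression flag produced by each of the four methods."""
    icd = p.has_depression_icd9
    med = p.on_antidepressant
    # A missing PHQ-9 is treated as a negative screen in this sketch.
    phq = p.phq9_score is not None and p.phq9_score >= PHQ9_CUTOFF
    return {
        "icd9": icd,
        "medication": med,
        "phq9": phq,
        "any": icd or med or phq,  # at least one criterion present
    }
```

Under these assumptions, a patient with no diagnosis code and no medication record but a PHQ-9 score of 12 would be missed by the ICD-9-CM and medication methods and captured by the PHQ-9 and "any" methods.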
Results: ICD-9-CM codes identified the fewest patients as depressed (4,658), followed by the PHQ-9 (46,565) and medication data (50,505). The presence of at least one of these criteria identified the most patients (78,322). Compared with ICD-9-CM codes, the PHQ-9 identified a higher proportion of elderly, disabled, Medicaid, and rural patients. ICD-9-CM codes were the least sensitive (6.7% sensitivity), whereas the method using at least one of the criteria identified the most truly depressed patients (93.3% sensitivity); however, specificity dropped from 97.7% to 58.1%.
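For readers less familiar with these metrics, sensitivity and specificity against a reference standard can be computed as in the following minimal sketch. The function name and the toy labels are hypothetical and are not the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Compare a phenotyping method's flags with reference-standard labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = sum(1 for p, r in zip(predicted, reference) if p and r)          # true positives
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)      # false negatives
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)  # true negatives
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)      # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative cases")

    sensitivity = tp / (tp + fn)  # share of truly depressed patients that were flagged
    specificity = tn / (tn + fp)  # share of non-depressed patients left unflagged
    return sensitivity, specificity


# Toy example with made-up labels (not study data).
toy_predicted = [True, False, True, False, False]
toy_reference = [True, True, False, False, False]
toy_sens, toy_spec = sensitivity_specificity(toy_predicted, toy_reference)
```

A broader rule, such as "at least one criterion", flags more patients, which tends to raise sensitivity and lower specificity, the trade-off reported above.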
Conclusions: The choice of phenotyping method may disproportionately exclude patient groups from research. Patient-reported data hold potential to improve sensitivity at a loss of specificity that may be acceptable, depending on the context. Researchers should consider including patient-reported data in EHR-driven phenotyping methods.