Background: No published literature describes how to translate information extracted from clinical progress notes into numeric scores for 3-step theory of suicide (3ST) factors. We determined which scoring option would best discriminate between patients who will attempt or die by suicide and patients with neither suicidal ideation nor attempts, and we tested hypotheses related to the 3ST.
Methods: We used terminology-driven natural language processing (NLP) to extract information from Veterans Health Administration (VHA) clinical progress notes. Counts of those extractions served as input to evaluate candidate scoring options for each 3ST factor (psychological pain, hopelessness, connectedness, capability for suicide). Logistic regression models adjusted for common demographic characteristics were used to test the 3ST hypotheses.
Results: Optimal contrasts between groups were obtained with P - A for psychological pain, hopelessness, and capability for suicide, and for connectedness, where P and A, respectively, indicate the patient-level number of extractions indicating presence and absence of the factor.
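As an illustration only, the P - A score reported above for psychological pain, hopelessness, and capability for suicide can be sketched as a simple difference of patient-level extraction counts. The class name, function name, and counts below are hypothetical and are not taken from the study; the connectedness score is deliberately left out of the sketch.

```python
from dataclasses import dataclass


@dataclass
class FactorCounts:
    """Patient-level NLP extraction counts for one 3ST factor (hypothetical container)."""
    presence: int  # P: extractions indicating the factor is present
    absence: int   # A: extractions indicating the factor is absent


def presence_minus_absence(counts: FactorCounts) -> int:
    """Return the P - A score for a single factor."""
    return counts.presence - counts.absence


# Made-up counts for one illustrative patient; not study data.
patient = {
    "psychological_pain": FactorCounts(presence=5, absence=2),
    "hopelessness": FactorCounts(presence=1, absence=4),
    "capability_for_suicide": FactorCounts(presence=3, absence=3),
}

# One score per factor: positive when presence mentions outnumber absence mentions.
scores = {factor: presence_minus_absence(c) for factor, c in patient.items()}
print(scores)
```

Under this scoring, a positive value means the notes mention the factor as present more often than as absent, a negative value means the opposite, and zero means the two counts balance.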
Limitations: Additional research is necessary to verify whether our conclusions hold in a cohort that is more reflective of the general VHA population.
Conclusion: Terminology-driven 3ST factor scores discriminate patients who attempt or die by suicide from patients without suicidal ideation or attempts. Our results corroborate the validity of the 3ST for VHA patients.
Keywords: 3‐step theory of suicide; controlled vocabulary; electronic health records; natural language processing; psychological pain; suicide; veterans.
Published 2025. This article is a U.S. Government work and is in the public domain in the USA.