Case reports unlocked: Harnessing large language models to advance research on child maltreatment

Child Abuse Negl. 2024 Dec 16:160:107202. doi: 10.1016/j.chiabu.2024.107202. Online ahead of print.

Abstract

Background: Research on child protective services (CPS) is impeded by a lack of high-quality structured data. Crucial information on cases is often documented in case files, but only in narrative form. Researchers have applied automated language processing to extract structured data from these narratives, but such efforts have been limited to classification tasks of fairly low complexity. Large language models (LLMs) may work for more challenging tasks.

Objective: We aimed to extract structured data from narrative casework reports by applying LLMs to distinguish between different subtypes of violence: child sexual abuse, child physical abuse, a child witnessing domestic violence, and a child being physically aggressive.

Methods: We developed a four-stage pipeline comprising (1) text segmentation, (2) text segment classification, and subsequent labeling of (3) casework reports and (4) cases. All CPS reports (N = 29,770) filed between 2008 and 2022 by Switzerland's largest CPS provider were collected. From these, 28,223 text segments were extracted based on pre-defined keywords. Two human reviewers annotated random samples of text segments and reports for training and validation. Model performance was compared against human-coded test data.
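
The four stages can be sketched roughly as follows. This is a minimal illustration in Python; the keyword list, label set, prompt wording, and the injected `llm` callable are assumptions for demonstration, not the study's actual implementation.

```python
# Illustrative sketch of the four-stage pipeline described in the Methods.
# Keywords, labels, and prompt wording are hypothetical placeholders.

from typing import Callable

KEYWORDS = ["abuse", "violence", "hit", "assault"]  # stand-in keyword list
LABELS = [
    "child sexual abuse",
    "child physical abuse",
    "child witnessing domestic violence",
    "child physically aggressive",
    "none of the above",
]

def segment_report(report_text: str) -> list[str]:
    """Stage 1: keep only sentences that contain a pre-defined keyword."""
    sentences = report_text.split(". ")
    return [s for s in sentences if any(k in s.lower() for k in KEYWORDS)]

def classify_segment(segment: str, llm: Callable[[str], str]) -> str:
    """Stage 2: ask an LLM (e.g. a locally hosted model) for exactly one label."""
    prompt = (
        "Assign exactly one of the following labels to the text segment: "
        + "; ".join(LABELS)
        + ".\n\nSegment: " + segment
    )
    return llm(prompt)

def label_report(report_text: str, llm: Callable[[str], str]) -> set[str]:
    """Stage 3: a report receives every violence subtype found among its segments."""
    segments = segment_report(report_text)
    return {classify_segment(s, llm) for s in segments} - {"none of the above"}

def label_case(report_texts: list[str], llm: Callable[[str], str]) -> set[str]:
    """Stage 4: a case aggregates the labels of all of its reports."""
    case_labels: set[str] = set()
    for text in report_texts:
        case_labels |= label_report(text, llm)
    return case_labels
```

Because report- and case-level labels are derived from individual segment classifications, each labeling decision can be traced back to the segments that produced it.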

Results: The best-performing LLM (Mixtral-8x7B) classified text segments with an accuracy of 87 %, outperforming the agreement between the two human reviewers (77 %). The model also labeled casework reports correctly with an accuracy of 87 %, but only when disregarding text segments that were not extracted in stage (1).
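
For illustration only, both headline figures can be understood as raw percent agreement over human-coded test segments: model accuracy is agreement with the human-coded labels, and inter-rater agreement is computed the same way between the two reviewers. The toy labels below are hypothetical and do not reflect the study's data.

```python
# Toy example: percent agreement for model accuracy and inter-rater agreement.

def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two label sequences agree."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

reviewer_1 = ["physical abuse", "sexual abuse", "witnessing DV", "none"]
reviewer_2 = ["physical abuse", "sexual abuse", "physical abuse", "none"]
human_test = ["physical abuse", "sexual abuse", "witnessing DV", "none"]
model_pred = ["physical abuse", "sexual abuse", "witnessing DV", "none"]

print(f"model accuracy:        {percent_agreement(model_pred, human_test):.2f}")
print(f"inter-rater agreement: {percent_agreement(reviewer_1, reviewer_2):.2f}")
```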

Conclusions: LLMs can replicate human coding of text documents even for highly complex tasks that require contextual information. This may considerably advance research on CPS. Transparency can be achieved by tracing labeling decisions back to individual text segments. Keyword-based text segmentation was identified as a weak point, and the potential for bias at several stages of the process requires attention.

Keywords: Child maltreatment; Domestic violence; Explainable AI; Large language models; Natural language processing; Physical abuse; Sexual abuse.