The paradigm of evidence-based medicine requires that medical decisions be made on the basis of the best available knowledge published in the literature. Existing evidence is often summarized in systematic reviews and/or meta-reviews and is rarely available in structured form. Manual compilation and aggregation are costly, and conducting a systematic review requires considerable effort. The need to aggregate evidence arises not only for clinical trials but also for pre-clinical animal studies, where evidence extraction helps translate the most promising pre-clinical therapies into clinical trials or optimize clinical trial design.
To develop methods that facilitate aggregating the evidence published in pre-clinical studies, we present a new system that automatically extracts structured knowledge from such publications and stores it in a so-called domain knowledge graph. The approach follows the paradigm of model-complete text comprehension: guided by a domain ontology, it creates a deep relational data structure that reflects the main concepts, protocol, and key findings of a study. In the domain we focus on, spinal cord injury, a single outcome of a pre-clinical study is described by up to 103 outcome parameters. Because extracting all of these variables jointly is intractable, we propose a hierarchical architecture that incrementally predicts semantic sub-structures according to a given data model in a bottom-up fashion. At the heart of our approach is a statistical inference method that relies on conditional random fields (CRFs) to infer the most likely instance of the domain model given the text of a scientific publication as input. This allows the dependencies between the different variables describing a study to be modeled in a semi-joint fashion.
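The idea of predicting sub-structures bottom-up instead of all variables at once can be illustrated with a minimal sketch under strong simplifying assumptions. This is not the authors' implementation. The slot names (`species`, `injury`, `treatment`, `measure`), candidate scores and compatibility factors below are hypothetical. Each candidate assignment is scored like an unnormalized log-linear (CRF-style) model: unary log-potentials plus factor log-potentials. Exhaustive enumeration over a few candidates, followed by keeping a small beam, stands in for real CRF inference. The best lower-level structures (experimental groups) then become candidate fillers for the next level (outcomes).

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, object]
Factor = Callable[[Assignment], float]


def best_structures(candidates: Dict[str, List[Tuple[object, float]]],
                    factors: List[Factor], beam: int = 2):
    """Return the `beam` highest-scoring joint assignments of the given slots.

    Score = sum of unary log-potentials of the chosen fillers
          + sum of factor log-potentials over the whole assignment.
    Exhaustive enumeration is only feasible because each level is small.
    """
    slots = list(candidates)
    scored = []
    for combo in product(*(candidates[s] for s in slots)):
        assignment = {slot: value for slot, (value, _) in zip(slots, combo)}
        score = sum(unary for _, unary in combo)
        score += sum(f(assignment) for f in factors)
        scored.append((score, assignment))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:beam]


# Level 1 (hypothetical): assemble an experimental group from leaf slots.
level1 = {
    "species": [("rat", 1.2), ("mouse", 0.4)],
    "injury": [("contusion", 0.9), ("transection", 0.7)],
    "treatment": [("drug A", 0.5), ("vehicle", 0.3)],
}


def rat_contusion(a: Assignment) -> float:
    # Illustrative pairwise compatibility factor between two leaf slots.
    return 0.5 if (a["species"], a["injury"]) == ("rat", "contusion") else 0.0


groups = best_structures(level1, [rat_contusion], beam=2)

# Level 2 (hypothetical): an outcome links a group to a measured variable.
# The level-1 scores of the surviving groups act as their unary scores here.
level2 = {
    "group": [(group, score) for score, group in groups],
    "measure": [("BBB score", 1.0), ("grip strength", 0.2)],
}


def measure_fits_species(a: Assignment) -> float:
    # Illustrative factor coupling a higher-level slot to a nested leaf slot.
    return 0.3 if a["measure"] == "BBB score" and a["group"]["species"] == "rat" else 0.0


best_score, best_outcome = best_structures(level2, [measure_fits_species], beam=1)[0]
```

Keeping only a small beam at each level is what makes the search tractable: the full joint space over all parameters is never enumerated. The price is that a lower-level candidate pruned early can no longer be recovered, even if it would have scored well together with higher-level slots.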
We present a comprehensive evaluation that assesses the extent to which our system captures a study in the depth required to enable the generation of new knowledge. We conclude the article by briefly describing applications of the populated knowledge graph and discussing the potential implications of our work for supporting evidence-based medicine.
Keywords: Deep knowledge graph population; Information extraction; Pre-clinical outcomes; Spinal cord injury; Structured prediction.
Copyright © 2023 Elsevier B.V. All rights reserved.