STEED: A data mining tool for automated extraction of experimental parameters and risk of bias items from in vivo publications

PLoS One. 2024 Nov 26;19(11):e0311358. doi: 10.1371/journal.pone.0311358. eCollection 2024.

Abstract

Background and methods: Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain steps in systematic reviews, such as data extraction, are labour-intensive, which hampers their feasibility, especially with the rapidly expanding body of biomedical literature. To bridge this gap, we aimed to develop a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n = 45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n = 31 publications; multiple sclerosis, n = 244 publications).

Results: Our data mining tool, STEED (STructured Extraction of Experimental Data), successfully extracted key experimental parameters such as animal models and species, as well as risk of bias items like randomization or blinding, from in vivo studies. Sensitivity and specificity were over 85% and 80%, respectively, for most items in both validation corpora. Accuracy and F1-score were above 90% and 0.9, respectively, for most items in the validation corpora. Time savings were above 99%.
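For readers less familiar with the reported performance measures, the following is a minimal sketch of how sensitivity, specificity, accuracy, and F1-score are conventionally derived from a confusion matrix for a single extracted item (e.g., randomization reported: yes/no), with human annotation as the reference. It is written in Python for illustration only and is not part of STEED, which is implemented in R; the names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
import math


@dataclass
class ConfusionCounts:
    """Counts for one item, comparing tool output against a human reader."""
    tp: int  # tool and human both say the item is reported
    fp: int  # tool says reported, human says not reported
    tn: int  # tool and human both say the item is not reported
    fn: int  # tool says not reported, human says reported


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard definitions of the four reported measures."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true positive rate (recall)
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall, which simplifies to:
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for one item across 100 publications (illustrative only).
example = ConfusionCounts(tp=40, fp=5, tn=50, fn=5)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are reported as percentages in the abstract, whereas F1-score is reported on a 0 to 1 scale; the sketch returns all four as fractions, so multiplying by 100 gives the percentage form.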

Conclusions: Our text mining tool, STEED, can extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. This enables the tool's deployment for probing a field in a research improvement context or replacing one human reader during data extraction, resulting in substantial time savings and contributing towards the automation of systematic reviews.

MeSH terms

  • Animals
  • Bias
  • Data Mining* / methods
  • Humans
  • Multiple Sclerosis
  • Publications
  • Software

Grants and funding

This work was supported by grants from the Swiss National Science Foundation (No. 407940_206504, to BVI), the UZH Alumni (to BVI), and the Intramural Research Program of NINDS. We thank all our funders for their support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.