Developing predictive precision medicine models by exploiting real-world data using machine learning methods

J Appl Stat. 2024 Feb 13;51(14):2980-3003. doi: 10.1080/02664763.2024.2315451. eCollection 2024.

Abstract

Computational Medicine encompasses the application of Statistical Machine Learning and Artificial Intelligence methods on several traditional medical approaches, including biochemical testing which is extremely valuable both for early disease prognosis and long-term individual monitoring, as it can provide important information about a person's health status. However, using Statistical Machine Learning and Artificial Intelligence algorithms to analyze biochemical test data from Electronic Health Records requires several preparatory steps, such as data manipulation and standardization. This study presents a novel approach for utilizing Electronic Health Records from large, real-world databases to develop predictive precision medicine models by exploiting Artificial Intelligence. Furthermore, to demonstrate the effectiveness of this approach, we compare the performance of various traditional Statistical Machine Learning and Deep Learning algorithms in predicting individuals' future biochemical test outcomes. Specifically, using data from a large real-world database, we exploit a longitudinal format of the data in order to predict the future values of 15 biochemical tests and identify individuals at high risk. The proposed approach and the extensive model comparison contribute to the personalized approach that modern medicine aims to achieve.

Keywords: 68T09; 92C50; Predictive precision medicine; big data; biochemical testing; electronic health records; real-world data; statistical machine learning.