Objective: To compare the ability of an artificial neural network (ANN) to predict hospital mortality with that of the Acute Physiology and Chronic Health Evaluation II (APACHE II) system and of multiple logistic regression (LR). A secondary objective was to compare the individual probabilities assigned to each patient by the different models.
Method: The variables required for calculating the APACHE II score were collected prospectively. A total of 1146 patients were randomly divided (70% and 30%) into a Development set (n = 800) and a Validation set (n = 346). Using the same variables, an LR model and an ANN (a 3-layer perceptron with 9 nodes in the hidden layer, trained with the backpropagation algorithm and bootstrap resampling) were developed in the Development set. The resulting models were then tested on the Validation set: discrimination was evaluated using the area under the ROC curve (AUC [95% CI]) and calibration using the Hosmer-Lemeshow C (HLC) test. Differences between the probabilities assigned by the models were evaluated using the Bland-Altman method.
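The three evaluation measures named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the synthetic data, the fixed random seed, and the choice of 10 risk groups for the HLC statistic are all assumptions made for illustration only.

```python
import math
import random


def auc(y_true, y_prob):
    """Area under the ROC curve, computed as the fraction of
    (death, survivor) pairs in which the death has the higher
    predicted probability (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_c(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow C statistic over equal-sized groups of
    patients sorted by predicted risk (group count is an assumption)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # expected deaths
        observed = sum(y for _, y in chunk)  # observed deaths
        denom = expected * (1 - expected / len(chunk))
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def bland_altman(p_a, p_b):
    """Mean difference (bias) and 95% limits of agreement between
    the probabilities assigned by two models to the same patients."""
    diffs = [a - b for a, b in zip(p_a, p_b)]
    bias = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, illustrative only
    # Synthetic cohort: a "true" risk, an outcome drawn from it, and
    # two hypothetical models that perturb that risk differently.
    risk = [rng.uniform(0.02, 0.9) for _ in range(1146)]
    died = [1 if rng.random() < r else 0 for r in risk]
    model_a = [min(0.99, max(0.01, r + rng.gauss(0, 0.05))) for r in risk]
    model_b = [min(0.99, max(0.01, r + rng.gauss(0, 0.10))) for r in risk]

    # Random split into Development (800) and Validation (346) sets.
    idx = list(range(1146))
    rng.shuffle(idx)
    val = idx[800:]

    y = [died[i] for i in val]
    a = [model_a[i] for i in val]
    b = [model_b[i] for i in val]
    print("AUC model A:", round(auc(y, a), 3))
    print("HLC model A:", round(hosmer_lemeshow_c(y, a), 2))
    print("Bland-Altman A vs B:", bland_altman(a, b))
```

In this sketch the models are only evaluated on the held-out 346 patients, mirroring the Development/Validation design; the 95% CI for the AUC and the p-value for the HLC statistic are omitted for brevity.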
Results: In the Validation set, APACHE II showed an AUC = 0.79 (0.75-0.84) and HLC = 11 (p = 0.329); the LR model, an AUC = 0.81 (0.76-0.85) and HLC = 29 (p = 0.0001); and the ANN, an AUC = 0.82 (0.77-0.86) and HLC = 10 (p = 0.404). The patients with the largest differences in assigned probability between LR and ANN (8% of the total) were neurological patients. The worst results were found in trauma patients, with an AUC no greater than 0.75 in any of the models. In respiratory patients, the ANN achieved the best AUC, 0.87 (0.78-0.91).
Conclusions: The ANN was able to stratify hospital mortality risk using the APACHE II system variables. The ANN tended to achieve better results than LR, since it does not require the assumptions of linearity or of independence among the variables. The probability assigned to individual patients differed between models.