Data integration by multi-tuning parameter elastic net regression

BMC Bioinformatics. 2018 Oct 10;19(1):369. doi: 10.1186/s12859-018-2401-1.

Abstract

Background: To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlations structures. Subtle but important features may be missed by shrinking all features equally.

Results: We propose an Elastic net (EN) model with separate tuning parameter penalties for each platform that is fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning penalties. We found that when the number of informative features differs among the platforms, and when there is no notable correlation between the features from different platforms, the multi-tuning parameter EN yields more predictive models. Moreover, the multi-tuning parameter EN is robust, in the sense that there is no loss of predictivity relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of multi-tuning parameter EN using real cancer datasets.

Conclusion: The proposed multi-tuning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms, compared to the traditional method where a single penalty parameter is used for all features in different platforms.

Keywords: Classification; Data integration; Elastic net.

MeSH terms

  • Data Analysis*
  • Genomics / methods*
  • Humans
  • Logistic Models
  • Software