Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

Gigascience. 2016 Feb 25;5:12. doi: 10.1186/s13742-016-0117-6. eCollection 2016.

Abstract

Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analysis of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and its hallmark will be 'team science'.

Keywords: Analytics; Big data; Cloud services; Information technology; Modeling; Processing; Visualization; Workflows.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Computational Biology / methods*
  • Delivery of Health Care / statistics & numerical data*
  • Humans
  • Models, Theoretical*
  • Neuroimaging / statistics & numerical data
  • Principal Component Analysis
  • Reproducibility of Results
  • Software*