Secure Secondary Use of Clinical Data with Cloud-based NLP Services. Towards a Highly Scalable Research Infrastructure

Methods Inf Med. 2015;54(3):276-82. doi: 10.3414/ME13-01-0133. Epub 2014 Nov 7.

Abstract

Objectives: The secondary use of clinical data offers considerable opportunities for clinical and translational research as well as for quality assurance projects. Such use requires a flexible and scalable infrastructure that complies with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements, and to evaluate it in three use cases.

Methods: The architecture provides components that allow multiple data provider sites, such as hospitals, to extract free text as well as structured data from local sources and to de-identify these data for further anonymous or pseudonymous processing. Free-text documentation is analyzed and transformed into structured information by text-mining services, which are provided within a cloud-computing environment. The newly gained annotations can then be integrated with the already available structured data items, and the resulting data sets can be uploaded to a central study portal for further analysis.
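The data flow described in the Methods can be illustrated with a minimal Python sketch. It is not the cloud4health implementation: the abstract does not specify how de-identification or text mining are performed, so the keyed-hash pseudonym, the toy masking rules, and all names (`SITE_SECRET`, `pseudonymize`, `deidentify_text`, `text_mining_service`, `build_study_record`) are assumptions made for illustration. The text-mining step in particular is a local keyword lookup standing in for a remote cloud service.

```python
import hashlib
import hmac
import re

# Assumption: a secret key that never leaves the data provider site.
SITE_SECRET = b"hypothetical-site-key"


def pseudonymize(patient_id):
    """Replace a local patient ID with a keyed hash (a pseudonym)."""
    digest = hmac.new(SITE_SECRET, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify_text(text, names):
    """Mask known names and DD.MM.YYYY dates (toy rules, not a real de-identifier)."""
    for name in names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return re.sub(r"\b\d{2}\.\d{2}\.\d{4}\b", "[DATE]", text)


def text_mining_service(text):
    """Stand-in for the cloud text-mining service: a keyword lookup, not real NLP."""
    terms = ("diabetes", "hypertension")
    return {"findings": [t for t in terms if t in text.lower()]}


def build_study_record(patient_id, name, structured, free_text):
    """De-identify locally, annotate the cleaned text, merge with structured data."""
    clean_text = deidentify_text(free_text, [name])
    annotations = text_mining_service(clean_text)
    # Only the pseudonym, structured items, and annotations leave the site.
    return {"pseudonym": pseudonymize(patient_id), **structured, **annotations}


record = build_study_record(
    "12345",
    "Erika Mustermann",
    {"age": 63},
    "Erika Mustermann, seen 03.02.2014, has diabetes.",
)
```

In this sketch only the masked text reaches the text-mining step, and the record prepared for upload to a study portal contains the pseudonym rather than the local patient ID.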

Results: Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients, provided by a university hospital and a private hospital chain, have already been processed.

Conclusions: Cloud4health has shown how existing components for secondary use of structured data can be complemented with text mining in a privacy-compliant manner. The cloud-computing paradigm allows flexible and dynamically adaptable service provision, which makes it easier for data providers to adopt the services without having to invest in their own hardware resources and software tools.

Keywords: Cloud-computing; natural language processing; privacy; secondary use; software design; text-mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cloud Computing*
  • Data Mining
  • Humans
  • Internet
  • Medical Informatics*
  • Natural Language Processing*
  • Privacy*
  • Software Design