Background: The success of multisite collaborative research relies on effective data collection, harmonization, and aggregation strategies. Data Coordination Centers (DCCs) facilitate the implementation of these strategies. A DCC can be particularly valuable for research on rare diseases, where collaboration among multiple sites is essential to amass large aggregate datasets. However, approaches to building a DCC have rarely been documented.
Methods: The Li-Fraumeni Exploration (LiFE) Consortium's DCC was created using multiple open source packages, including a LAM/G application stack (Linux, Apache, MySQL, Grails), the Pentaho Data Integration tool for Extraction-Transformation-Loading (ETL), and the Saiku-Mondrian client. This document serves as a resource for building a rare disease DCC for multi-institutional collaborative research.
Results: The primary scientific and technological objective was met: an online central repository was created into which data from all participating sites can be deposited, harmonized, aggregated, disseminated, and analyzed. The cohort now includes 2,193 participants from six contributing sites, including 1,354 individuals from families with a pathogenic or likely pathogenic variant in TP53. Data on cancer diagnoses are also available. Challenges and lessons learned are summarized.
Conclusions: The methods used mitigate challenges in developing a DCC, including those related to technical infrastructure, data harmonization, communications, and software development and applications.
Impact: These methods can serve as a framework for establishing other collaborative research efforts. Data from the consortium will serve as a valuable resource for collaborative research to improve knowledge of, and the ability to care for, individuals and families with Li-Fraumeni syndrome.
©2020 American Association for Cancer Research.