The cancer imaging archive (TICA) receives and manages an ever-increasing quantity of clinical (non-image) data containing valuable information about subjects in imaging collections. To harmonize and integrate these data, we have first cataloged the types of information occurring across public TCIA collections. We then produced mappings for these diverse instance data using ontology-based representation patterns and transformed the data into a knowledge graph in a semantic database. This repository combined the transformed instance data with relevant background knowledge from domain ontologies. The resulting repository of semantically integrated data is a rich source of information about subjects that can be queried across imaging collections. Building on this work we have implemented and deployed a REST API and a user-facing semantic cohort builder tool. This tool allows allow researchers and other users to search and identify groups of subject-level records based on non-image data that were not queryable prior to this work. The search results produced by this interface link to images, allowing users to quickly identify and view images matching the selection criteria, as well as allowing users to export the harmonized clinical data.
Keywords: data integration; data presentation; imaging informatics; informatics technology for cancer research; knowledge representation; tool development.
Creative Commons Attribution license.