Studying Privacy Aspects of Learned Knowledge Bases in the Context of Synthetic and Medical Data

Stud Health Technol Inform. 2024 Aug 30:317:261-269. doi: 10.3233/SHTI240866.

Abstract

Introduction: Retrieving comprehensible rule-based knowledge from medical data by machine learning is a beneficial task, e.g., for automating the process of creating a decision support system. While this has recently been studied by means of exception-tolerant hierarchical knowledge bases (i.e., knowledge bases in which rule-based knowledge is represented on several levels of abstraction), privacy concerns have not yet been addressed extensively in this context. However, privacy plays an important role, especially in medical applications.

Methods: When parts of the original dataset can be restored from a learned knowledge base, there may be a practically and legally relevant risk of re-identification for individuals. In this paper, we study privacy issues of exception-tolerant hierarchical knowledge bases that are learned from data. We propose approaches for detecting and eliminating privacy issues in the learned knowledge bases.
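
To illustrate why a learned rule can carry a re-identification risk, the following minimal Python sketch flags rules that are backed by fewer than k records. It is not the method proposed in the paper and does not use InteKRator: the rule representation, the function names (support, split_by_support), the threshold k, and the toy records are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical representations (assumptions, not the paper's data structures):
# a record maps attribute names to values; a rule is (premises, (target, value)).
Record = Dict[str, str]
Rule = Tuple[Dict[str, str], Tuple[str, str]]


def support(rule: Rule, records: List[Record]) -> int:
    """Count the records that satisfy both the premises and the conclusion."""
    premises, (target, value) = rule
    return sum(
        1
        for r in records
        if all(r.get(a) == v for a, v in premises.items())
        and r.get(target) == value
    )


def split_by_support(
    rules: List[Rule], records: List[Record], k: int
) -> Tuple[List[Rule], List[Rule]]:
    """Separate rules backed by at least k records from rules backed by fewer."""
    kept: List[Rule] = []
    flagged: List[Rule] = []
    for rule in rules:
        (kept if support(rule, records) >= k else flagged).append(rule)
    return kept, flagged


# Invented toy data, for illustration only.
records = [
    {"age": "old", "smoker": "yes", "risk": "high"},
    {"age": "old", "smoker": "yes", "risk": "high"},
    {"age": "old", "smoker": "no", "risk": "low"},
    {"age": "young", "smoker": "no", "risk": "low"},
    {"age": "young", "smoker": "yes", "risk": "low"},
]
rules = [
    ({}, ("risk", "low")),                                  # general rule
    ({"age": "old", "smoker": "yes"}, ("risk", "high")),    # exception
    ({"age": "young", "smoker": "yes"}, ("risk", "low")),   # very specific rule
]

kept, flagged = split_by_support(rules, records, k=2)
```

In this toy setting, the last rule matches a single record, so publishing it would reveal that individual's attribute combination. Simply dropping such rules is only one conceivable way of eliminating the issue, and it may reduce inference quality.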

Results: We present results for synthetic as well as for real-world datasets.

Conclusion: The results show that our approach effectively prevents privacy breaches while only moderately decreasing the inference quality.

Keywords: InteKRator; knowledge base; machine learning; privacy; transparency.

MeSH terms

  • Computer Security
  • Confidentiality*
  • Electronic Health Records
  • Humans
  • Knowledge Bases*
  • Machine Learning*
  • Privacy