Objectives: To examine the accuracy of matches produced by privacy-preserving record linkage (PPRL) in real-world data (RWD).
Materials and methods: We conducted a systematic literature review to identify articles, published from January 1, 2013 to June 15, 2023, that evaluated PPRL methods. Eligible studies were original research that reported quantitative metrics, such as precision and recall, for linkage of health-related data sources. Covidence software was used to manage the review process.
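For reference, precision is the share of declared links that are true links, and recall is the share of true links that were found. The minimal Python sketch below assumes, for illustration only, that both the linkage output and a gold-standard truth set are represented as sets of record-ID pairs; the IDs are hypothetical.

```python
def precision_recall(predicted: set, truth: set) -> tuple:
    """Compute precision and recall of predicted links against true links.

    Both arguments are sets of (left_id, right_id) pairs.
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: 3 predicted links, 2 correct, 4 true links in total.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```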
Results: Five studies met our inclusion criteria. Across a variety of RWD sources, tokenization and hash functions were used to hash and encrypt personally identifiable information (PII), including first and last names, dates of birth (DOBs), and Social Security Numbers (SSNs). All identified studies used deterministic matching. Combinations of tokenized or hashed PII that included "quasi-identifiers" such as names and DOBs had consistently high precision (>95%) but lower recall, likely because of misspelled or inconsistently spelled names and name changes. SSN-based combinations also showed high precision, but their recall varied because SSN data in RWD were incomplete. Studies whose algorithms declared a match when records agreed on at least one of a specified set of PII combinations achieved both high precision and high recall.
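To illustrate why deterministic matching on hashed quasi-identifiers tends to give high precision but reduced recall, here is a minimal sketch. It is not the method of any reviewed study or tokenization vendor; it assumes HMAC-SHA256 with a hypothetical shared secret key, simple name normalization (lowercasing and keeping only letters), and a single hypothetical token built from first name, last name, and DOB. A keyed hash is used rather than a plain hash because low-entropy PII such as a DOB is easy to recover by brute force from an unkeyed digest.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"shared-secret-for-illustration-only"  # hypothetical key held by data partners


def normalize_name(name: str) -> str:
    # Lowercase and keep only letters, so "O'Brien" and "OBrien" agree.
    return re.sub(r"[^a-z]", "", name.lower())


def make_token(first: str, last: str, dob: str) -> str:
    # Join normalized PII with a delimiter, then apply a keyed hash.
    payload = "|".join([normalize_name(first), normalize_name(last), dob])
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def deterministic_match(left: dict, right: dict) -> list:
    """Link record IDs whose tokens are exactly equal (maps: record ID -> token)."""
    index = {}
    for rid, token in right.items():
        index.setdefault(token, []).append(rid)
    return [(lid, rid) for lid, token in left.items() for rid in index.get(token, [])]


site_a = {
    "a1": make_token("Maria", "O'Brien", "1980-02-14"),
    "a2": make_token("John", "Smith", "1975-07-01"),
}
site_b = {
    "b1": make_token("maria", "OBrien", "1980-02-14"),  # formatting differs: still links
    "b2": make_token("Jon", "Smith", "1975-07-01"),     # misspelling: token differs, link missed
}
print(deterministic_match(site_a, site_b))  # [('a1', 'b1')]
```

Normalization absorbs formatting noise, but any genuine spelling difference yields an unrelated digest, so exact token equality rarely creates false links yet misses some true ones.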
Discussion: This systematic review indicates that PPRL methods generally link patient data with high accuracy while preserving privacy.
Conclusions: Researchers should carefully consider the completeness and stability of each PII element selected for PPRL and may wish to adopt a strategy in which patient records are matched if they agree on at least one of several PII combinations.
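The "at least one of several combinations" strategy can be sketched as follows. The three combinations (first name + last name + DOB, SSN + DOB, last name + DOB + sex), the field names, and the shared key are hypothetical choices for illustration, not the rule sets used by the reviewed studies. A combination with any missing element (for example, an absent SSN) is simply skipped rather than hashed, so incomplete fields cannot create spurious agreement.

```python
import hashlib
import hmac

SECRET_KEY = b"shared-secret-for-illustration-only"  # hypothetical shared key

# Hypothetical PII combinations; each complete combination yields one token.
COMBINATIONS = [
    ("first", "last", "dob"),
    ("ssn", "dob"),
    ("last", "dob", "sex"),
]


def tokens_for(record: dict) -> set:
    """Return (combination index, token) pairs for every complete combination."""
    tokens = set()
    for i, fields in enumerate(COMBINATIONS):
        values = [record.get(f) for f in fields]
        if any(v is None or v == "" for v in values):
            continue  # skip combinations with missing PII (e.g., absent SSN)
        payload = "|".join(str(v).strip().lower() for v in values)
        digest = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        tokens.add((i, digest))
    return tokens


def any_of_match(left: dict, right: dict) -> bool:
    # Records match if they share a token from at least one of the same combinations.
    return bool(tokens_for(left) & tokens_for(right))


rec_a = {"first": "Jon", "last": "Smith", "dob": "1975-07-01", "ssn": "123-45-6789", "sex": "M"}
rec_b = {"first": "John", "last": "Smith", "dob": "1975-07-01", "ssn": None, "sex": "M"}
print(any_of_match(rec_a, rec_b))  # True: name token differs, last name + DOB + sex agrees
```

Each added combination gives records another chance to link, which tends to raise recall; looser combinations can also admit more false links, so the choice of combinations trades recall against precision.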
Keywords: administrative claims; data anonymization; electronic health records; healthcare; personally identifiable information; privacy preserving record linkage.
© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.