Aim: Outside Germany, administrative data are increasingly linked with other data sources for research in epidemiology and health services research. In Germany, direct linkage of routine data from statutory health insurance (SHI) providers with other data sources is complicated by strict data protection requirements. The aim of this analysis was to evaluate an indirect linkage of SHI routine data with data from a hospital information system (HIS).
Methods: The dataset comprised data from 2004 to 2010 from 2 sickness funds and one HIS. In both data sources, hospitalisations were restricted to admissions to one hospital with at least one diagnosis of heart failure. Records from the 2 data sources were linked if the admission and discharge dates agreed and at least a specified percentage of the diagnoses in the HIS data matched those in the SHI data (full coding depth). Using direct linkage via the pseudonymised insurance number as the gold standard, the proposed linkage approach was evaluated by means of diagnostic test statistics (sensitivity and specificity). Furthermore, the completeness of relevant information in the HIS was described.
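The linkage rule described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the function names `diagnosis_agreement` and `is_link`, and the example codes are hypothetical, and the agreement measure is assumed to be the share of HIS diagnosis codes (compared at full coding depth) that also appear in the SHI record.

```python
from datetime import date

def diagnosis_agreement(his_dx, shi_dx):
    """Share of HIS diagnosis codes that also occur in the SHI record.

    Assumption: codes are compared as complete strings (full coding depth).
    """
    his = set(his_dx)
    if not his:
        return 0.0
    return len(his & set(shi_dx)) / len(his)

def is_link(his_rec, shi_rec, cutoff):
    """Link two hospitalisations if both dates agree and the
    diagnosis agreement reaches the chosen cut-off (e.g. 0.3 or 1.0)."""
    same_dates = (
        his_rec["admission"] == shi_rec["admission"]
        and his_rec["discharge"] == shi_rec["discharge"]
    )
    agreement = diagnosis_agreement(his_rec["diagnoses"], shi_rec["diagnoses"])
    return same_dates and agreement >= cutoff

# Invented example records (not study data)
his = {"admission": date(2008, 3, 1), "discharge": date(2008, 3, 9),
       "diagnoses": ["I50.13", "E11.90", "I10.00"]}
shi = {"admission": date(2008, 3, 1), "discharge": date(2008, 3, 9),
       "diagnoses": ["I50.13", "I10.00", "N18.3"]}

print(is_link(his, shi, cutoff=0.3))  # two of three HIS codes match
print(is_link(his, shi, cutoff=1.0))  # not all HIS codes match
```

In this sketch, a stricter cut-off rejects record pairs whose dates agree but whose diagnosis lists differ, which is the mechanism by which a higher required agreement can lower sensitivity.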
Results: The dataset contained 3 731 hospitalisations from the HIS and 8 172 hospitalisations from the SHI routine data. The sensitivity of the linkage approach was 86.7% when at least 30% of the diagnoses agreed and decreased to 41.7% when 100% agreement of the diagnoses was required. The specificity was almost 100% at all studied cut-offs of agreement. Anthropometric measures and diagnostic information were available for only a small fraction of cases in the HIS data, whereas information on health status and laboratory data was comparatively complete.
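As a reminder of how the reported test statistics are defined, the following Python sketch computes sensitivity and specificity of an indirect linkage against a gold standard. The function name and the toy inputs are hypothetical, and the unit of evaluation (one boolean per candidate record pair) is an assumption; the values below are invented and do not reproduce the study results.

```python
def sensitivity_specificity(predicted, gold):
    """Compare indirect-linkage decisions with gold-standard decisions.

    predicted, gold: equal-length lists of booleans, one entry per
    candidate record pair (True = pair belongs to the same hospitalisation).
    """
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    tn = sum(1 for p, g in zip(predicted, gold) if not p and not g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    sensitivity = tp / (tp + fn)  # share of true links that were found
    specificity = tn / (tn + fp)  # share of non-links correctly rejected
    return sensitivity, specificity

# Toy data: three true links, two non-links (invented for illustration)
gold = [True, True, True, False, False]
predicted = [True, True, False, False, False]

sens, spec = sensitivity_specificity(predicted, gold)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```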
Conclusion: For the linkage of SHI routine data with complementary data sources, indirect linkage methods can be a valuable alternative to direct linkage, which is time-consuming to plan and apply. Since the proposed approach was used in a relatively small sample and a restricted patient population, a replication using nationwide data without these restrictions would require an extension of the algorithm. Furthermore, the large administrative effort seems questionable given the comparatively high proportion of missing values for the relevant information in the HIS.
© Georg Thieme Verlag KG Stuttgart · New York.