Data quality deficiencies significantly limit the applicability of real-world data in data-driven medical research. In this study, using an oncological use case, we report and discuss common quality deficiencies in real-world medical datasets, such as missing data, class imbalances, and timeliness issues. We compiled a multi-departmental real-world dataset comprising 13,861 cancer cases diagnosed at University Hospital Cologne and examined data quality throughout the data integration process.
Keywords: Data Quality; Germany; Medical Data; Oncology; Real-World Data.