Evaluating generalizability of oncology trial results to real-world patients using machine learning-based trial emulations

Nat Med. 2025 Jan 3. doi: 10.1038/s41591-024-03352-5. Online ahead of print.

Abstract

Randomized controlled trials (RCTs) evaluating anti-cancer agents often lack generalizability to real-world oncology patients. Although restrictive eligibility criteria contribute to this issue, the role of selection bias related to prognostic risk remains unclear. In this study, we developed TrialTranslator, a framework designed to systematically evaluate the generalizability of RCTs for oncology therapies. Using a nationwide database of electronic health records from Flatiron Health, this framework emulates RCTs across three prognostic phenotypes identified through machine learning models. We applied this approach to 11 landmark RCTs that investigated anti-cancer regimens considered standard of care for the four most prevalent advanced solid malignancies. Our analyses reveal that patients in low-risk and medium-risk phenotypes exhibit survival times and treatment-associated survival benefits similar to those observed in RCTs. In contrast, patients in high-risk phenotypes show significantly shorter survival times and smaller treatment-associated survival benefits than those observed in RCTs. Our results were corroborated by a comprehensive robustness assessment, including examinations of specific patient subgroups, holdout validation and semi-synthetic data simulation. These findings suggest that the prognostic heterogeneity among real-world oncology patients plays a substantial role in the limited generalizability of RCT results. Machine learning frameworks may facilitate individual patient-level decision support and estimation of real-world treatment benefits to guide trial design.
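
The stratified emulation described above can be illustrated with a minimal Python sketch. It is not the TrialTranslator implementation. It assumes that each patient already has a predicted mortality risk from some prognostic model, that the three phenotypes are formed by tertile cutoffs (the abstract does not state the cutoffs), and that an unadjusted Kaplan-Meier median is an acceptable stand-in for the survival estimate; a real emulation would also apply trial eligibility criteria and adjust for confounding between arms. All function names and the toy records are hypothetical and invented for illustration only.

```python
from collections import defaultdict
from fractions import Fraction
from statistics import quantiles


def assign_phenotypes(risks):
    """Label each predicted risk low/medium/high using tertile cutoffs (an assumption)."""
    low_cut, high_cut = quantiles(risks, n=3)  # two cut points split the data in three
    return [
        "low" if r <= low_cut else "medium" if r <= high_cut else "high"
        for r in risks
    ]


def km_median(times, events):
    """Kaplan-Meier median: first event time where S(t) <= 0.5, else None."""
    at_risk = len(times)
    surv = Fraction(1)  # exact arithmetic avoids float edge cases at 0.5
    for t in sorted(set(times)):
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            if surv <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None  # median not reached within follow-up


def median_os_by_phenotype(patients):
    """patients: iterable of (predicted_risk, arm, months, died) tuples."""
    patients = list(patients)
    labels = assign_phenotypes([p[0] for p in patients])
    groups = defaultdict(lambda: ([], []))
    for label, (_, arm, months, died) in zip(labels, patients):
        times, events = groups[(label, arm)]
        times.append(months)
        events.append(died)
    return {key: km_median(t, e) for key, (t, e) in groups.items()}


# Invented toy records, NOT real data: (predicted risk, arm, months, died)
toy_patients = [
    (0.05, "treatment", 30, True), (0.10, "control", 20, True),
    (0.15, "treatment", 34, False), (0.20, "control", 24, True),
    (0.35, "treatment", 22, True), (0.40, "control", 15, True),
    (0.45, "treatment", 26, True), (0.50, "control", 18, True),
    (0.70, "treatment", 9, True), (0.75, "control", 7, True),
    (0.80, "treatment", 11, True), (0.90, "control", 8, True),
]

medians = median_os_by_phenotype(toy_patients)
for phenotype in ("low", "medium", "high"):
    treated = medians[(phenotype, "treatment")]
    control = medians[(phenotype, "control")]
    print(phenotype, "median OS (months):", treated, "vs", control)
```

Comparing the per-phenotype difference in medians between arms against the difference reported in the original RCT is the kind of check the framework performs; the sketch only shows the stratify-then-estimate structure.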