Development of two machine learning models to predict conversion from primary HER2-0 breast cancer to HER2-low metastases: a proof-of-concept study

F Miglietta; A Collesei; C Vernieri; T Giarratano; C A Giorgi; F Girardi; G Griguolo; M Cacciatore; A Botticelli; A Vingiani; G Fotia; F Piacentini; D Massa; F Zanghì; M Marino; G Pruneri; M Fassan; A P Dei Tos; M V Dieci; V Guarneri

doi:10.1016/j.esmoop.2024.104087

ESMO Open. 2024 Dec 19;10(1):104087. doi: 10.1016/j.esmoop.2024.104087. Online ahead of print.

Authors

F Miglietta¹, A Collesei², C Vernieri³, T Giarratano⁴, C A Giorgi⁴, F Girardi⁴, G Griguolo¹, M Cacciatore⁵, A Botticelli⁶, A Vingiani⁷, G Fotia³, F Piacentini⁸, D Massa¹, F Zanghì¹, M Marino¹, G Pruneri⁹, M Fassan¹⁰, A P Dei Tos¹¹, M V Dieci¹², V Guarneri¹

Affiliations

¹ Oncology Unit 2, Istituto Oncologico Veneto (IOV) - IRCCS, Padova, Italy; Department of Surgery, Oncology and Gastroenterology, University of Padova, Padova, Italy.
² Bioinformatics - Clinical Research Unit, Istituto Oncologico Veneto, IOV - IRCCS, Padova, Italy.
³ Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori (INT), Milan, Italy; Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy.
⁴ Oncology Unit 2, Istituto Oncologico Veneto (IOV) - IRCCS, Padova, Italy.
⁵ Pathology Unit, ULSS 9 - Treviso-Azienda ULSS 2 Marca Trevigiana, Treviso, Italy.
⁶ Department of Radiological, Oncological and Pathological Science, Policlinico Umberto I, "Sapienza" University of Rome, Rome, Italy.
⁷ Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy; Department of Advanced Diagnostics, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy.
⁸ Department of Medical and Surgical Sciences for Children and Adults, University Hospital of Modena, Modena, Italy.
⁹ Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori (INT), Milan, Italy; Department of Advanced Diagnostics, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy.
¹⁰ Istituto Oncologico Veneto (IOV) - IRCCS, Padova, Italy; Pathology Unit, Azienda Universitaria Ospedaliera di Padova, Padova, Italy.
¹¹ Pathology Unit, Azienda Universitaria Ospedaliera di Padova, Padova, Italy.
¹² Oncology Unit 2, Istituto Oncologico Veneto (IOV) - IRCCS, Padova, Italy; Department of Surgery, Oncology and Gastroenterology, University of Padova, Padova, Italy. Electronic address: mariavittoria.dieci@unipd.it.

Abstract

Background: HER2-low expression has gained clinical relevance in breast cancer (BC) because anti-HER2 antibody-drug conjugates are now available for patients with HER2-low metastatic BC. The well-documented instability of HER2-low status during disease evolution highlights the need to identify patients with HER2-0 primary BC who may develop a HER2-low phenotype at relapse. Given the need to maximize access to these treatments, we used artificial intelligence to predict this conversion.

Patients and methods: We included a large multicentric retrospective cohort of patients with BC who underwent tissue resampling at relapse. The dataset was preprocessed to address missing data, the large number of features, and target class imbalance. We then trained two models: one focused on explainability [Extreme Gradient Boosting (XGBoost)] and another aimed at maximizing performance (an ensemble of XGBoost and support vector machine).
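The abstract does not say how class imbalance was handled or how the XGBoost and support vector machine outputs were combined. As a minimal sketch under those assumptions, the plain-Python snippet below shows two common options: computing the negative-to-positive ratio that XGBoost's real `scale_pos_weight` parameter is typically set to, and a soft-voting ensemble that averages the two models' predicted conversion probabilities. The function names `scale_pos_weight_ratio` and `soft_vote`, the equal weighting, and the 0.5 threshold are hypothetical illustrations, not the authors' method.

```python
def scale_pos_weight_ratio(labels):
    """Negatives / positives: the value XGBoost's documentation suggests
    considering for its scale_pos_weight parameter on imbalanced data.
    (Assumption: the study may have used a different rebalancing strategy.)"""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive (converter) examples")
    return negatives / positives


def soft_vote(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Hypothetical soft-voting ensemble of two classifiers.

    prob_a, prob_b: per-patient predicted probabilities of HER2-low
    conversion from two models (e.g. XGBoost and an SVM with probability
    outputs). Returns (binary predictions, combined probabilities)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability lists must have equal length")
    # Weighted average of the two models' probabilities for each patient
    combined = [weight_a * pa + (1 - weight_a) * pb
                for pa, pb in zip(prob_a, prob_b)]
    # Predict conversion when the averaged probability reaches the threshold
    preds = [1 if p >= threshold else 0 for p in combined]
    return preds, combined
```

With the cohort reported in the Results (157 converters and 229 non-converters among 386 patients), `scale_pos_weight_ratio` returns 229/157, about 1.46, which would up-weight the minority converter class during training.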

Results: A total of 1200 patients were included in this study. Among 386 patients with HER2-0 primary BC and matched HER2 status at relapse, 40.7% (n = 157) converted to a HER2-low phenotype. The explainable model achieved a balanced accuracy of 58%, with a sensitivity of 53% and a specificity of 64%. The most important variables for this model were primary BC phenotype [mean Shapley value (SHAP) 0.540], primary BC histological type (SHAP 0.101), grade (SHAP 0.182), and sites of relapse (SHAP 0.008-0.213). The ensemble model had a balanced accuracy of 64%, with a sensitivity of 75% and a specificity of 53%.
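The reported metrics can be checked with simple arithmetic. For a binary task, balanced accuracy is the mean of sensitivity and specificity, and the conversion rate is converters divided by evaluable patients. The sketch below recomputes both from the abstract's figures. It also includes `mean_abs_shap`, which assumes that "mean Shapley value" refers to the common global-importance summary of mean absolute SHAP value across patients; the abstract does not define it, and all function names here are illustrative.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity (recall on converters) and
    specificity (recall on non-converters)."""
    return (sensitivity + specificity) / 2


def balanced_accuracy_from_counts(tp, fn, tn, fp):
    """Same metric computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return balanced_accuracy(sensitivity, specificity)


def mean_abs_shap(shap_values):
    """Assumed global importance of one feature: mean absolute
    SHAP value over all patients."""
    return sum(abs(v) for v in shap_values) / len(shap_values)


# Figures reported in the abstract
conversion_rate = 157 / 386                      # HER2-0 -> HER2-low converters
explainable_bacc = balanced_accuracy(0.53, 0.64)  # explainable XGBoost model
ensemble_bacc = balanced_accuracy(0.75, 0.53)     # XGBoost + SVM ensemble

print(f"conversion rate: {conversion_rate:.1%}")
print(f"explainable model balanced accuracy: {explainable_bacc:.1%}")
print(f"ensemble model balanced accuracy: {ensemble_bacc:.1%}")
```

Because the published sensitivity and specificity are themselves rounded, the recomputed balanced accuracies (58.5% and 64.0%) agree with the reported 58% and 64% within rounding.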

Conclusions: This work represents one of the first proof-of-concept applications of machine learning models to predict a phenomenon highly relevant to drug access in modern BC oncology. Starting with an explainable model and then integrating it into an ensemble approach enabled us to improve performance while maintaining transparency, explainability, and intelligibility.

Keywords: HER2; breast cancer; explainability; machine learning.