1. Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients
- Author
- Bella Pajares, Sofia Ruiz-Medina, Enrique Saez, Antonia Márquez, Laura Galvez, Begoña Jimenez, Ana Godoy, Pablo Rodriguez-Brazzarola, Maria E. Dominguez-Recio, Francisco Carabantes, Alfonso Sánchez-Muñoz, María José Bermejo, Irene López, José M. Jerez, Tamara Diaz-Redondo, Ester Villar, Héctor Mesa, Leo Franco, Nuria Ribelles, and Emilio Alba
- Subjects
- Adult; 0301 basic medicine; Cancer Research; Receptor, ErbB-2; Advanced breast; Breast Neoplasms; computer.software_genre; Machine learning; Machine Learning; Young Adult; 03 medical and health sciences; 0302 clinical medicine; Breast cancer; Antineoplastic Combined Chemotherapy Protocols; Electronic Health Records; Humans; Medicine; Aged; Natural Language Processing; Retrospective Studies; Aged, 80 and over; business.industry; HER2 negative; Area under the curve; Cancer; Middle Aged; Prognosis; medicine.disease; Metastatic breast cancer; Survival Rate; First line treatment; 030104 developmental biology; Receptors, Estrogen; Oncology; Hormone receptor; 030220 oncology & carcinogenesis; Disease Progression; Female; Artificial intelligence; Receptors, Progesterone; business; computer; Natural language processing; Follow-Up Studies
- Abstract
Background: CDK4/6 inhibitors plus endocrine therapies are the current standard of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there are no well-established clinical or molecular predictive factors for patient response. In the era of personalised oncology, new approaches for developing predictive models of response are needed.

Materials and methods: Data derived from the electronic health records (EHRs) of real-world patients with HR+/HER2-negative advanced breast cancer were used to develop predictive models for early and late progression to first-line treatment. Two machine learning approaches were used: a classic approach using a data set of features manually extracted from the reviewed patients' EHRs, and a second approach using natural language processing (NLP) of free-text clinical notes recorded during medical visits.

Results: Of the 610 patients included, 473 (77.5%) progressed on first-line treatment, and 126 (20.6%) did so within the first 6 months. A further 152 patients (24.9%) showed no disease progression before 28 months from the onset of first-line treatment. The best predictive model for early progression using the manually extracted data set achieved an area under the curve (AUC) of 0.734 (95% CI 0.687–0.782). Using the NLP free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714–0.800). The best model to predict long responders using manually extracted data obtained an AUC of 0.669 (95% CI 0.608–0.730). With NLP free-text processing, the best model attained an AUC of 0.752 (95% CI 0.705–0.799).

Conclusions: Using machine learning methods, we developed predictive models for early and late progression to first-line treatment of HR+/HER2-negative metastatic breast cancer. We also found that NLP-based machine learning models performed slightly better than predictive models based on manually extracted data.
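The abstract reports each model's AUC with a 95% confidence interval but does not say how the intervals were obtained. As a rough illustration only, the sketch below computes an AUC from predicted scores and attaches a percentile bootstrap interval. The bootstrap method, the function names `auc` and `bootstrap_auc_ci`, and the toy labels and scores are all assumptions made for this example and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Equivalent to the Mann-Whitney U statistic; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = early progression, 0 = no early progression.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]

print(auc(toy_labels, toy_scores))  # 0.75 for this toy data
print(bootstrap_auc_ci(toy_labels, toy_scores))
```

With only eight toy cases the interval is very wide. In a cohort of several hundred patients, as in the study, resampling would produce a much tighter interval.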
- Published
- 2021