1. Developing a Prediction Model for Pathologic Complete Response Following Neoadjuvant Chemotherapy in Breast Cancer: A Comparison of Model Building Approaches
- Author
- Robert B. Basmadjian, Shiying Kong, Devon J. Boyne, Tamer N. Jarada, Yuan Xu, Winson Y. Cheung, Sasha Lupichuk, May Lynn Quan, and Darren R. Brenner
- Subjects
Machine Learning, ROC Curve, Humans, Breast Neoplasms, Female, Breast, General Medicine, Neoadjuvant Therapy
- Abstract
PURPOSE Identifying the patient characteristics that best support a recommendation for neoadjuvant chemotherapy in breast cancer is an active area of clinical research. We developed and compared several approaches to building prediction models for pathologic complete response (pCR) among patients with breast cancer in Alberta.
METHODS The study included all patients with breast cancer who received neoadjuvant chemotherapy in Alberta between 2012 and 2014, identified from the Alberta Cancer Registry. Patient, tumor, and treatment data were obtained through primary chart review. pCR was defined as no residual invasive tumor in the breast or axilla at surgical excision. Two types of prediction models for pCR were built: (1) an expert model, with variables selected on the basis of oncologists' opinions, and (2) a data-driven model, with variables selected by a trained machine learning algorithm. Each model type was fit using logistic regression (LR), random forests (RF), and gradient-boosted trees (GBT). We compared the models using the area under the receiver operating characteristic curve (AUC) and the integrated calibration index (ICI), and internally validated them using bootstrap resampling.
RESULTS A total of 363 cases were included in the analyses, of which 86 experienced pCR. The RF and GBT fits yielded higher optimism-corrected AUCs than LR for both the expert models (RF: 0.70; GBT: 0.69; LR: 0.65) and the data-driven models (RF: 0.71; GBT: 0.68; LR: 0.64). The LR fit yielded the lowest ICIs for both the expert models (LR: 0.037; GBT: 0.05; RF: 0.10) and the data-driven models (LR: 0.026; GBT: 0.06; RF: 0.099).
CONCLUSION Our models demonstrated predictive ability for pCR using routinely collected clinical and demographic variables. We show that machine learning fitting methods can be used to optimize models for pCR prediction. We also show that variables beyond those selected by clinical experts do not considerably improve predictive ability and may not justify the added burden of data collection.
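The abstract reports optimism-corrected values of the area under the receiver operating characteristic curve (AUC) obtained by bootstrap resampling, but does not give the procedure. The following is a minimal sketch of one common approach (Harrell-style optimism correction), not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the NumPy-only logistic regression with a tiny ridge term, and the synthetic data, which only mimics the reported sample size of 363 and is not study data.

```python
import numpy as np


def auc(y_true, scores):
    """AUC via the Mann-Whitney rank statistic; tied scores share an average rank."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                  # highest 1-based rank in each tie group
    ranks = (last_rank - (counts - 1) / 2.0)[inv]  # average rank for every observation
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def predict(beta, X):
    """Predicted probabilities from a fitted coefficient vector (intercept first)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method.

    The tiny ridge term is only for numerical stability (an assumption of this sketch).
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))
        grad = Xb.T @ (y - p) - ridge * beta
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None]) + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Return (apparent AUC, apparent AUC minus mean bootstrap optimism)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = auc(y, predict(fit_logistic(X, y), X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample rows with replacement
        if y[idx].min() == y[idx].max():       # skip resamples with a single class
            continue
        beta_b = fit_logistic(X[idx], y[idx])  # refit on the bootstrap sample
        auc_boot = auc(y[idx], predict(beta_b, X[idx]))  # performance on that sample
        auc_orig = auc(y, predict(beta_b, X))            # performance on original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - float(np.mean(optimism))


# Synthetic illustration only: 363 rows, 3 hypothetical predictors, not study data.
rng = np.random.default_rng(42)
X = rng.normal(size=(363, 3))
true_logit = -1.3 + 0.6 * X[:, 0] + 0.4 * X[:, 1]
y = (rng.random(363) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

apparent, corrected = optimism_corrected_auc(X, y, n_boot=200, seed=0)
print(f"apparent AUC: {apparent:.3f}, optimism-corrected AUC: {corrected:.3f}")
```

The correction subtracts the average gap between each bootstrap model's performance on its own resample and on the original data. Applying the same loop to a random forest or gradient-boosted model would only require swapping the fit and predict functions.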
- Published
- 2022