An AUC-maximizing classifier for skewed and partially labeled data with an application in clinical prediction modeling.
- Author
Wang, Guanjin; Kwok, Stephen Wai Hang; Axford, Daniel; Yousufuddin, Mohammed; Sohel, Ferdous
- Abstract
Partially labeled and skewed datasets are common in many applications, including healthcare, because data collection and annotation are costly and time-consuming. However, training machine learning classifiers on such data can degrade their predictive performance. In this paper, we address this problem with a novel classifier that focuses on the Area Under the ROC Curve (AUC), which is widely recognized as a more robust performance metric for skewed datasets than metrics such as accuracy and error rate. The proposed classifier, PSVM-AUC Maximizer (PSVM-AUCMax), is based on Proximal Support Vector Machines (PSVM) and directly maximizes a new AUC-based metric in its learning objective. PSVM-AUCMax has three main merits. First, because it directly integrates maximization of the proposed AUC-based metric, PSVM-AUCMax can be proven to have enhanced generalization capability on partially labeled and skewed datasets. Second, it simplifies model selection by requiring fewer hyperparameters to tune. Third, its analytical solution retains the same form as that of traditional PSVM, preserving advantages such as fast updates in incremental learning scenarios. The efficacy of PSVM-AUCMax is demonstrated through extensive experiments on several public datasets and a healthcare case study using data collected at the Mayo Clinic in the US. In the case study, we used PSVM-AUCMax to develop a clinical prediction model for forecasting composite outcomes in hospitalized COVID-19 patients, which yielded promising results. [ABSTRACT FROM AUTHOR]
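Two claims in the abstract, that AUC is more informative than accuracy on skewed data and that a PSVM-style closed-form solution allows cheap incremental updates, can be illustrated with a minimal Python/NumPy sketch. It assumes the standard linear PSVM of Fung and Mangasarian (2001), not the paper's PSVM-AUCMax objective, which the abstract does not specify. The class `IncrementalLinearPSVM`, the helper `auc_mann_whitney`, the parameter `nu`, and the synthetic data are all hypothetical. Because the closed form depends on the data only through the sums E'E and E'De, new batches can be folded in without revisiting earlier rows.

```python
import numpy as np


class IncrementalLinearPSVM:
    """Hypothetical illustration: standard linear proximal SVM, NOT PSVM-AUCMax.

    Minimizes (nu/2)*||e - D(A w - e*gamma)||^2 + (1/2)*(w'w + gamma^2),
    whose closed-form solution is [w; gamma] = (I/nu + E'E)^{-1} E'De
    with E = [A, -e] and D = diag(y), y in {-1, +1}.
    """

    def __init__(self, n_features: int, nu: float = 1.0) -> None:
        self.nu = nu
        d = n_features + 1
        self.EtE = np.zeros((d, d))  # running sum of E'E over all batches seen
        self.EtDe = np.zeros(d)      # running sum of E'De over all batches seen
        self.w = np.zeros(n_features)
        self.gamma = 0.0

    def partial_fit(self, A: np.ndarray, y: np.ndarray) -> None:
        """Add one batch to the sufficient statistics, then re-solve."""
        E = np.hstack([A, -np.ones((A.shape[0], 1))])
        self.EtE += E.T @ E
        self.EtDe += E.T @ y  # D e equals the label vector y
        d = self.EtE.shape[0]
        z = np.linalg.solve(np.eye(d) / self.nu + self.EtE, self.EtDe)
        self.w, self.gamma = z[:-1], z[-1]

    def decision_function(self, A: np.ndarray) -> np.ndarray:
        return A @ self.w - self.gamma


def auc_mann_whitney(scores: np.ndarray, y: np.ndarray) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos, neg = scores[y == 1], scores[y == -1]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))


# Hypothetical skewed toy data: 950 negatives, 50 positives.
rng = np.random.default_rng(0)
n_neg, n_pos = 950, 50
A = np.vstack([rng.normal(0.0, 1.0, (n_neg, 2)), rng.normal(1.5, 1.0, (n_pos, 2))])
y = np.concatenate([-np.ones(n_neg), np.ones(n_pos)])
perm = rng.permutation(len(y))
A, y = A[perm], y[perm]

# A majority-class baseline gets high accuracy but chance-level AUC.
baseline_scores = np.zeros(len(y))
print("baseline accuracy:", np.mean(y == -1))                # 0.95
print("baseline AUC:", auc_mann_whitney(baseline_scores, y))  # 0.5

# Training in two batches yields the same model as training on all rows at once.
inc = IncrementalLinearPSVM(n_features=2, nu=1.0)
inc.partial_fit(A[:500], y[:500])
inc.partial_fit(A[500:], y[500:])
full = IncrementalLinearPSVM(n_features=2, nu=1.0)
full.partial_fit(A, y)
print("same solution:", np.allclose(inc.w, full.w) and np.isclose(inc.gamma, full.gamma))
print("PSVM AUC:", auc_mann_whitney(inc.decision_function(A), y))
```

On this toy data the majority-class baseline reaches 95% accuracy yet only 0.5 AUC, which is the kind of gap that motivates an AUC-centered objective. The incremental and one-shot fits agree up to floating-point rounding, since both solve the same linear system built from the same accumulated sums.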
- Published
- 2023