1. A New Classification of Benign, Premalignant, and Malignant Endometrial Tissues Using Machine Learning Applied to 1413 Candidate Variables.
- Author
- Downing MJ, Papke DJ Jr, Tyekucheva S, and Mutter GL
- Subjects
- Cohort Studies, Endometrial Hyperplasia pathology, Endometrial Neoplasms pathology, Endometrium pathology, Epithelial Cells pathology, Female, Humans, Image Processing, Computer-Assisted, Models, Statistical, Precancerous Conditions pathology, Workflow, Algorithms, Endometrial Hyperplasia classification, Endometrial Neoplasms classification, Machine Learning, Precancerous Conditions classification
- Abstract
Benign normal (NL), premalignant (endometrial intraepithelial neoplasia, EIN), and malignant (cancer, EMCA) endometria must be precisely distinguished for optimal management. EIN was objectively defined previously as a regression model incorporating manually traced histologic variables to predict clonal growth and cancer outcomes. Results from this early computational study were used to revise the subjective endometrial precancer diagnostic criteria currently in use. Here we use automated feature segmentation and updated machine learning algorithms to develop a new classification algorithm. Endometrial tissue from 148 patients was randomly separated into 72-patient training and 76-patient validation cohorts encompassing all 3 diagnostic classes. We applied image analysis software to keratin-stained endometrial tissues to automatically segment whole-slide digital images into epithelium, cells, and nuclei and extract corresponding variables. A total of 1413 variables were culled to 75 based on random forest classification performance in a 3-group (NL, EIN, EMCA) model. This algorithm correctly classifies cases with 3-class error rates of 0.04 (training set) and 0.058 (validation set), and 2-class (NL vs. EIN+EMCA) error rates of 0.016 (training set) and 0 (validation set). The 4 most heavily weighted variables are surrogates of those previously identified in manual-segmentation machine learning studies (stromal and epithelial area percentages, and normalized epithelial surface lengths). Less heavily weighted predictors include gland and lumen axis lengths and ratios, and individual cell measures. Automated image analysis and random forest classification algorithms can classify normal, premalignant, and malignant endometrial tissues. The most predictive variables overlap with those discovered independently in early models based on manual segmentation.
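The abstract reports both a 3-class error rate and a lower 2-class (NL vs. EIN+EMCA) error rate for the same classifier. A minimal Python sketch of how the two figures relate is given below. It assumes the 2-class rate is obtained by merging EIN and EMCA into one group before counting misclassifications, which the abstract does not state explicitly. The labels are invented for illustration and are not study data, and the names `error_rate`, `collapse`, and `TWO_CLASS` are hypothetical helpers.

```python
from typing import Sequence

# Assumed mapping that merges EIN and EMCA into a single "abnormal" group.
TWO_CLASS = {"NL": "NL", "EIN": "EIN+EMCA", "EMCA": "EIN+EMCA"}


def error_rate(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of cases whose predicted label differs from the true label."""
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(truth, predicted))
    return wrong / len(truth)


def collapse(labels: Sequence[str]) -> list[str]:
    """Map 3-class labels (NL, EIN, EMCA) onto the 2-class scheme."""
    return [TWO_CLASS[label] for label in labels]


# Hypothetical labels for 10 cases (illustrative only, not study data).
truth = ["NL", "NL", "NL", "EIN", "EIN", "EIN", "EIN", "EMCA", "EMCA", "EMCA"]
predicted = ["NL", "NL", "NL", "EIN", "EIN", "EMCA", "EIN", "EMCA", "EIN", "EMCA"]

three_class_error = error_rate(truth, predicted)
two_class_error = error_rate(collapse(truth), collapse(predicted))

print(f"3-class error rate: {three_class_error}")
print(f"2-class error rate: {two_class_error}")
```

In this toy example the only mistakes are confusions between EIN and EMCA, so the 3-class error rate is 0.2 while the 2-class error rate is 0. The same effect would allow a validation-set 2-class error of 0 alongside a nonzero 3-class error.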
- Published
- 2020