Evaluation of prediction and classification performances in different machine learning models for patient‐specific quality assurance of head‐and‐neck VMAT plans
- Source :
- Medical Physics. 49:727-741
- Publication Year :
- 2021
- Publisher :
- Wiley, 2021.
Abstract
- PURPOSE This study evaluates the prediction and classification performance of different machine learning models for the gamma passing rate (GPR) and selects the best model for machine learning-based patient-specific quality assurance (PSQA). METHODS Measurement-based verification of 356 head-and-neck volumetric modulated arc therapy plans was performed using a diode array phantom (Delta4 Phantom), and GPR values at 2%/2 mm with global normalization and 3%/2 mm with local normalization were calculated. Machine learning models, including ridge regression (RIDGE), random forest (RF), support vector regression (SVR), and stacked generalization (STACKING), were used to predict the GPR. Each model was trained on 260 plans, and its prediction accuracy was evaluated on the remaining 96 plans using the prediction error between the measured and predicted GPR. For the classification evaluation, a lower control limit for the measured GPR and a lower control limit for the predicted GPR (LCLp) were defined to identify whether a GPR value represents a "pass" or a "fail." LCLp values with 99% and 99.9% confidence levels were calculated as the upper prediction limits for the GPR estimated from the linear regression between the measured and predicted GPR. RESULTS Low measured GPR values tended to be overestimated. The maximum prediction errors for RIDGE, RF, SVR, and STACKING were 3.2%, 2.9%, 2.3%, and 2.2% at global 2%/2 mm and 6.3%, 6.6%, 6.1%, and 5.5% at local 3%/2 mm, respectively. At global 2%/2 mm, the sensitivity was 100% for all the machine learning models except RIDGE when using the 99% LCLp. The specificity was 76.1% for RIDGE, RF, and SVR and 66.3% for STACKING; however, the specificity decreased dramatically when the 99.9% LCLp was used. At local 3%/2 mm, only STACKING showed 100% sensitivity when using the 99% LCLp. The decrease in specificity when using the 99.9% LCLp was smaller than that at global 2%/2 mm, and the specificity for RIDGE, RF, SVR, and STACKING was 61.3%, 61.3%, 72.0%, and 66.8%, respectively. CONCLUSIONS STACKING had better prediction accuracy for low GPR values than the other machine learning models. Applying LCLp to a regression model enabled the consistent evaluation of quantitative and qualitative GPR predictions. Adjusting the confidence level of the LCLp helped improve the balance between sensitivity and specificity. We suggest that STACKING can assist the safe and efficient operation of PSQA. This article is protected by copyright. All rights reserved.
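The pass/fail classification described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation, and several details are assumptions because the abstract does not state them: the regression is taken to be predicted GPR against measured GPR, LCLp is taken to be the one-sided upper prediction limit of that fit evaluated at the measured lower control limit, a normal quantile stands in for the Student t quantile, and a plan is flagged as a "fail" when its predicted GPR falls below LCLp. All function names are hypothetical.

```python
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x.

    Returns (a, b, residual_sd, x_mean, sxx).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = (sse / (n - 2)) ** 0.5
    return a, b, residual_sd, mx, sxx


def upper_prediction_limit(measured, predicted, x0, confidence):
    """One-sided upper prediction limit of predicted GPR at measured GPR x0.

    Assumption: predicted GPR is regressed on measured GPR.
    Assumption: a normal quantile approximates the t quantile.
    """
    a, b, s, mx, sxx = fit_line(measured, predicted)
    n = len(measured)
    z = NormalDist().inv_cdf(confidence)
    se = s * (1 + 1 / n + (x0 - mx) ** 2 / sxx) ** 0.5
    return a + b * x0 + z * se


def sensitivity_specificity(measured, predicted, lcl, lcl_p):
    """Sensitivity and specificity of flagging plans with predicted < lcl_p.

    A plan truly fails when its measured GPR is below lcl (assumed rule).
    """
    tp = fn = tn = fp = 0
    for m, p in zip(measured, predicted):
        truly_fail = m < lcl
        flagged = p < lcl_p
        if truly_fail and flagged:
            tp += 1
        elif truly_fail:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Under these assumptions, raising the confidence level from 99% to 99.9% raises LCLp, so more plans are flagged: sensitivity can only stay the same or increase, while specificity can only stay the same or decrease. That matches the trade-off the abstract reports between the two confidence levels.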
- Subjects :
- Normalization (statistics)
Phantoms, Imaging
Radiotherapy Planning, Computer-Assisted
Radiotherapy Dosage
Regression analysis
General Medicine
Machine learning
Regression
Confidence interval
Random forest
Support vector machine
Gamma Rays
Linear regression
Humans
Radiotherapy, Intensity-Modulated
Artificial intelligence
Quality assurance
Mathematics
Details
- ISSN :
- 2473-4209 and 0094-2405
- Volume :
- 49
- Database :
- OpenAIRE
- Journal :
- Medical Physics
- Accession number :
- edsair.doi.dedup.....5c5199440493255f6ca833a54b3c7bc7
- Full Text :
- https://doi.org/10.1002/mp.15393