Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
- Source :
- Journal of Medical Internet Research
- Publication Year :
- 2016
Abstract
- Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors’ activity for the purposes of quality assurance, safety, and continuing professional development.
- Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom.
- Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors, collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests.
- Results: Individual algorithm performance was high (F score range .68 to .83). Interrater agreement between the algorithms and the human coder was highest for the “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05).
- Conclusions: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and interpersonal skill may be key indicators of doctors’ performance.
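The paper's code and the GMC-CQ data are not public, so the following is only a minimal sketch of the supervised text-classification setup the abstract describes: comments coded into themes serve as labeled training data, a classifier is fit, and unseen comments are assigned a theme. The naive Bayes model, the toy comments, and the labels below are all invented for illustration; the study itself trained 8 (unnamed here) algorithms and combined them into an ensemble.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-ins for coded colleague comments (real data not public).
TRAIN = [
    ("always polite and well liked by the team", "popular"),
    ("colleagues enjoy working with this doctor", "popular"),
    ("introduced a new handover system on the ward", "innovator"),
    ("pioneered an audit tool for the clinic", "innovator"),
    ("communicates clearly and listens to patients", "interpersonal"),
    ("builds rapport and explains options well", "interpersonal"),
]

def train_nb(data):
    """Fit a multinomial naive Bayes: per-class word counts and class priors."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Assign the class with the highest smoothed log-posterior."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(TRAIN)
print(predict(model, "well liked and polite"))  # → popular
```

In the study, agreement between such predictions and the human coder was then summarized per theme with recall and F scores, and checked for stability with 10-fold cross-validation.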
- Subjects :
- Health Informatics
feedback
Interpersonal communication
Machine learning
Social skills
Physicians
Surveys and Questionnaires
Medicine
Humans
Original Paper
Recall
data mining
work performance
Popularity
Inter-rater reliability
Artificial intelligence
Clinical Competence
Supervised Machine Learning
F1 score
Quality assurance
Algorithms
Details
- ISSN :
- 1438-8871 and 1439-4456
- Volume :
- 19
- Issue :
- 3
- Database :
- OpenAIRE
- Journal :
- Journal of Medical Internet Research
- Accession number :
- edsair.doi.dedup.....aab14213a0ee5d72f8b16ea16b148636