Erina Odani, Tetsu Hayashida, Masayuki Kikuchi, Aiko Nagayama, Tomoko Seki, Maiko Takahashi, Akiko Matsumoto, Takeshi Murata, Rurina Watanuki, Takamichi Yokoe, Ayako Nakashoji, Hinako Maeda, Tatsuya Onishi, Sota Asaga, Takashi Hojo, Hiromitsu Jinno, Keiichi Sotome, Akira Matsui, Akihiko Suto, Shigeru Imoto, and Yuko Kitagawa
Although the categorization of breast ultrasound findings using the Breast Imaging Reporting and Data System (BI-RADS) has become widespread worldwide, the problem of inter-observer variability remains. To maintain uniformity in diagnostic accuracy, we developed a novel artificial intelligence (AI) system that distinguishes whether a static breast ultrasound image represents BI-RADS3 or lower, or BI-RADS4a or higher, to help determine the medical management of a patient whose breast ultrasound shows abnormalities. To establish and validate the AI system, a training dataset consisting of 4,028 images containing 5,014 lesions and a test dataset consisting of 3,166 images containing 3,656 lesions were collected and annotated. By adjusting the internal parameters of the AI system, we selected a setting that maximized the area under the curve (AUC) and minimized the difference between sensitivity and specificity, achieving an AUC, sensitivity, and specificity of 0.95, 90.0%, and 88.5%, respectively. Furthermore, based on 30 images extracted from the test data, the diagnostic accuracy of 20 clinicians and the AI system was compared, and the AI system was found to be significantly superior to the clinicians (McNemar test, p < 0.001). We then conducted a trial introducing the system into clinical practice. Physicians first reviewed the images and determined whether they were BI-RADS3 or lower, or BI-RADS4a or higher. Next, they classified the same images again with reference to the AI diagnosis; at this stage, they were allowed to overturn their initial judgment. We examined whether diagnostic accuracy, sensitivity, and specificity differed before and after referring to the AI diagnosis.
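The operating-point selection described above (minimizing the difference between sensitivity and specificity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which internal parameters were adjusted, so the sketch assumes a single decision threshold on a per-image score, and the function names, scores, and labels are hypothetical toy values.

```python
# Hypothetical toy data: one AI score per image and its reference label
# (1 = BI-RADS4a or higher, 0 = BI-RADS3 or lower). Not data from the study.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]


def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(scores, labels):
    """Pick the observed score whose use as a threshold minimizes |sensitivity - specificity|."""
    def gap(threshold):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        return abs(sens - spec)

    # Assumption: only thresholds equal to an observed score are considered.
    return min(sorted(set(scores)), key=gap)


best = balanced_threshold(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, best)
print(f"threshold={best}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because the AUC does not depend on a single threshold, this sketch covers only the balancing of sensitivity against specificity, not the AUC maximization.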
Reviews by 24 physicians were evaluated: 4 Japanese Breast Cancer Society breast specialists, 5 non-specialists with experience treating more than 40 cases of breast cancer, and 15 non-specialists without such experience. The average accuracy increased from 69.3% before confirming the AI diagnosis to 73.1% after confirming it (p = 0.00548). By practice experience, the average accuracy increased from 77.1% to 79.6% for the 9 physicians who were breast specialists or who had treated 40 or more cases of breast cancer. For the 15 physicians who had treated fewer than 40 breast cancer cases, the average accuracy increased from 64.7% to 69.2%. Furthermore, sensitivity increased significantly from an average of 88.8% before reviewing the AI diagnosis to an average of 99.7% after reviewing it (p < 0.01). Specificity increased from an average of 62.4% to 63.8% after reviewing the AI diagnosis (p = 0.433). We showed that our AI system, when applied to clinical practice and used by physicians, contributes to the improvement of diagnostic accuracy. Our results indicated that our AI diagnostic system was sufficiently accurate to be used in clinical practice.

Citation Format: Erina Odani, Tetsu Hayashida, Masayuki Kikuchi, Aiko Nagayama, Tomoko Seki, Maiko Takahashi, Akiko Matsumoto, Takeshi Murata, Rurina Watanuki, Takamichi Yokoe, Ayako Nakashoji, Hinako Maeda, Tatsuya Onishi, Sota Asaga, Takashi Hojo, Hiromitsu Jinno, Keiichi Sotome, Akira Matsui, Akihiko Suto, Shigeru Imoto, Yuko Kitagawa. Establishment of the breast ultrasound support system using deep-learning system [abstract]. In: Proceedings of the 2022 San Antonio Breast Cancer Symposium; 2022 Dec 6-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2023;83(5 Suppl):Abstract nr P1-05-06.