Ugur Halici, Rishab Gargeya, Quincy Wong, Hady Ahmady Phoulady, David Tellez, Bram van Ginneken, Andrew H. Beck, Nico Karssemeijer, Jeroen van der Laak, Nassir Navab, Jonas Annuscheit, Leena Latonen, Kaisa Liimatainen, Talha Qaiser, Dayong Wang, Quirine F. Manson, Aoxiao Zhong, Shigeto Seno, Yee-Wah Tsang, Rui Venâncio, Ismael Serrano, Daniel Racoceanu, N. Stathonikos, Muhammad Shaban, Stefanie Demirci, M. Milagro Fernández-Carrobles, Babak Ehteshami Bejnordi, Matt Berseth, Mustafa Umit Oner, Geert Litjens, Kimmo Kartasalo, Hideo Matsuda, Maschenka Balkenhol, Huangjing Lin, Elia Bruni, Hao Chen, Seiryo Watanabe, A. Kalinovsky, Marcory C. R. F. van Dijk, Ami George, Nasir M. Rajpoot, Francisco Beca, Quanzheng Li, Meyke Hermsen, Mira Valkonen, Oscar Deniz, Alexei Vylegzhanin, Vitali Liauchuk, Ruqayya Awan, Mitko Veta, Korsuk Sirinukunwattana, Gloria Bueno, Peter Hufnagl, Christian Haß, Vassili Kovalev, Vitali Khvatkov, Rengul Cetin-Atalay, Humayun Irshad, Oren Kraus, Qi Dou, Pekka Ruusuvuori, Aditya Khosla, Bharti Mungal, Pheng-Ann Heng, Oscar Geessink, Paul J. van Diest, Shadi Albarqouni, Peter Bult, Yoichi Takenaka, Institut du Cerveau et de la Moëlle Epinière = Brain and Spine Institute (ICM), Institut National de la Santé et de la Recherche Médicale (INSERM), CHU Pitié-Salpêtrière [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP), Sorbonne Université (SU), Centre National de la Recherche Scientifique (CNRS), Medical Image Analysis, and Discrete Mathematics
IMPORTANCE: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.
OBJECTIVE: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting.
DESIGN, SETTING, AND PARTICIPANTS: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands, with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining, was provided to challenge participants to build algorithms. Algorithm performance was evaluated on an independent test set of 129 whole-slide images (49 with and 80 without metastases). The corresponding glass slides of the same test set were also evaluated by a panel of 11 pathologists from the Netherlands working with time constraint (WTC), who assessed the likelihood of nodal metastases for each slide in a flexible 2-hour session simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).
EXPOSURES: Deep learning algorithms submitted as part of a challenge competition, or pathologist interpretation.
MAIN OUTCOMES AND MEASURES: The presence of specific metastatic foci, and the absence vs presence of lymph node metastasis in a slide or image, assessed using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.
RESULTS: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994.
The top-performing algorithm achieved a lesion-level true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P
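The two measures reported above can be illustrated with a minimal sketch. It is not the challenge's official evaluation code, and every name in it is hypothetical. The slide-level AUC is computed empirically as the probability that a randomly chosen metastasis-positive slide receives a higher score than a randomly chosen negative slide, with ties counted as 0.5. This also covers the pathologists' 5-point confidence ratings, assuming they are mapped to the ordinal scores 1 to 5 (an assumed mapping). The lesion-level function assumes each candidate detection is a hypothetical `(score, lesion_id)` pair, with `lesion_id = None` marking a false positive on a normal slide. It reports the best true-positive fraction reachable while keeping the mean number of false positives per normal slide at or below a chosen limit.

```python
from itertools import product

# Assumed ordinal mapping of the 5-point confidence scale (not specified in the source).
CONFIDENCE_SCALE = {
    "definitely normal": 1,
    "probably normal": 2,
    "equivocal": 3,
    "probably tumor": 4,
    "definitely tumor": 5,
}


def auc_mann_whitney(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ratings_auc(ratings, labels):
    """AUC for categorical confidence ratings; labels are True for slides with metastases."""
    pos = [CONFIDENCE_SCALE[r] for r, y in zip(ratings, labels) if y]
    neg = [CONFIDENCE_SCALE[r] for r, y in zip(ratings, labels) if not y]
    return auc_mann_whitney(pos, neg)


def tpf_at_mean_fp(detections, n_lesions, n_normal_slides, max_mean_fp):
    """Best lesion-level true-positive fraction with mean FPs per normal slide <= max_mean_fp.

    detections: hypothetical list of (score, lesion_id) pairs; lesion_id None = false positive.
    """
    best = 0.0
    # Sweep every distinct score as a detection threshold, highest first.
    for t in sorted({s for s, _ in detections}, reverse=True):
        kept = [lid for s, lid in detections if s >= t]
        hit_lesions = {lid for lid in kept if lid is not None}  # each lesion counted once
        n_fp = sum(1 for lid in kept if lid is None)
        if n_fp / n_normal_slides <= max_mean_fp:
            best = max(best, len(hit_lesions) / n_lesions)
    return best
```

For example, `auc_mann_whitney([0.9, 0.8], [0.1, 0.8])` gives 0.875, because three of the four positive-negative pairs are ranked correctly and one is tied. Counting ties as half a win makes coarse ordinal ratings (which tie often) directly comparable with the continuous scores an algorithm produces.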