Back to Search Start Over

Application of a validated prostate MRI deep learning system to independent same-vendor multi-institutional data: demonstration of transferability.

Authors :
Netzer N
Eith C
Bethge O
Hielscher T
Schwab C
Stenzinger A
Gnirs R
Schlemmer HP
Maier-Hein KH
Schimmöller L
Bonekamp D
Source :
European radiology [Eur Radiol] 2023 Nov; Vol. 33 (11), pp. 7463-7476. Date of Electronic Publication: 2023 Jul 28.
Publication Year :
2023

Abstract

Objectives: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions not contributing to training of the system.<br />Materials and Methods: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2 determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis between UNETM segmentations and radiologist-delineated segmentations was performed using Dice coefficient, free-response operating characteristic (FROC), and weighted alternative (waFROC). The influence of using different diffusion sequences was analyzed in cohort A.<br />Results: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were insignificant with 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificity of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. Dice coefficient of UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.7 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar-imaging.<br />Conclusion: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training.<br />Clinical Relevance Statement: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets, indicating the potential of deploying AI models without retraining or fine-tuning, and corroborating evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment.<br />Key Points: • A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets. • Lesion detection performance and segmentation congruence was similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient. • Although the system generalized to two external institutions without re-training, achieving expected sensitivity and specificity levels using the deep learning system requires probability thresholds to be adjusted, underlining the importance of institution-specific calibration and quality control.<br /> (© 2023. German Cancer Research Center.)

Details

Language :
English
ISSN :
1432-1084
Volume :
33
Issue :
11
Database :
MEDLINE
Journal :
European radiology
Publication Type :
Academic Journal
Accession number :
37507610
Full Text :
https://doi.org/10.1007/s00330-023-09882-9