1. TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation
- Author
-
Jiang, Zekun, Cheng, Dongjie, Qin, Ziyuan, Gao, Jun, Lao, Qicheng, Ismoilovich, Abdullaev Bakhrom, Gayrat, Urazboev, Elyorbek, Yuldashov, Habibullo, Bekchanov, Tang, Defu, Wei, LinJing, Li, Kang, and Zhang, Le
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
This study presents a novel multimodal medical image zero-shot segmentation algorithm named the text-visual-prompt segment anything model (TV-SAM) without any manual annotations. The TV-SAM incorporates and integrates the large language model GPT-4, the vision language model GLIP, and the SAM to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, thereby enhancing the SAM's capability for zero-shot segmentation. Comprehensive evaluations are implemented on seven public datasets encompassing eight imaging modalities to demonstrate that TV-SAM can effectively segment unseen targets across various modalities without additional training. TV-SAM significantly outperforms SAM AUTO and GSAM, closely matching the performance of SAM BBOX with gold standard bounding box prompts and surpasses the state-of-the-art methods on specific datasets such as ISIC and WBC. The study indicates that TV-SAM serves as an effective multimodal medical image zero-shot segmentation algorithm, highlighting the significant contribution of GPT-4 to zero-shot segmentation. By integrating foundational models such as GPT-4, GLIP, and SAM, the ability to address complex problems in specialized domains can be enhanced., Comment: 13 pages, 5 figures, 4 tables, accepted by BDMA Journal
- Published
- 2024