Self-supervised endoscopy depth estimation framework with CLIP-guidance segmentation.

Authors :
Yang, Zhuoyue
Pan, Junjun
Dai, Ju
Sun, Zhen
Xiao, Yi
Source :
Biomedical Signal Processing & Control; Sep2024:Part A, Vol. 95, pN.PAG-N.PAG, 1p
Publication Year :
2024

Abstract

Depth estimation has broad potential in medical image analysis and is important for applications such as augmented-reality surgical navigation and preoperative planning. Unlike segmentation, where ground truth can be obtained through manual annotation, depth estimation in endoscopic environments is constrained by hardware, so large amounts of ground-truth depth are difficult to obtain. To address this challenge, we propose a novel framework that uses a segmentation task to improve encoder performance in a self-supervised depth estimation network. For the first time, we leverage the Contrastive Language-Image Pre-training (CLIP) method to improve the performance of endoscopy segmentation models; the depth estimation network also benefits indirectly from this training process. In addition, we design a semantic-guidance loss function to further improve performance. The proposed method is systematically evaluated on three datasets. Experiments verify that the framework helps the network achieve smaller errors. Compared with other state-of-the-art methods, our framework obtains absolute relative errors of 0.081 and 0.097 in quantitative evaluations on the SCARED and SERV-CT datasets, respectively. In qualitative experiments on real surgery datasets, the proposed method also produces better results. These experiments illustrate that the method alleviates the difficulty of improving network performance when ground-truth depth data are lacking. The visual performance of our approach illustrates its potential for clinical application. Our method helps doctors obtain depth perception and visual cues simultaneously, thereby reducing the difficulty of surgery and the pain of patients.

• A framework combining endoscopic depth estimation and segmentation is proposed.
• CLIP strategy is applied to endoscopic image segmentation tasks for the first time.
• A loss function performs domain smoothing for different physiological structures.
[ABSTRACT FROM AUTHOR]
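
The abstract reports results as absolute relative error. As a reading aid, the sketch below shows how that metric is commonly computed in the depth estimation literature: the mean of |predicted - ground truth| / ground truth over pixels with valid ground truth. This is not the authors' evaluation code. The function name `abs_rel_error`, the flat-list input, the sample depth values, and the convention that a non-positive ground-truth value marks a missing pixel are all assumptions made for illustration.

```python
from typing import Sequence


def abs_rel_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Assumption: ground-truth values <= 0 mark pixels without a measurement.
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not ratios:
        raise ValueError("no valid ground-truth pixels")
    return sum(ratios) / len(ratios)


# Hypothetical depths in millimetres; 0.0 marks a pixel with no ground truth.
predicted = [52.0, 47.5, 80.0, 33.0]
ground_truth = [50.0, 50.0, 0.0, 30.0]
score = abs_rel_error(predicted, ground_truth)
print(f"Abs Rel: {score:.4f}")
```

Lower values are better. Under this definition, a score of 0.081 would mean that predicted depths deviate from the ground truth by about 8.1% on average.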

Details

Language :
English
ISSN :
17468094
Volume :
95
Database :
Supplemental Index
Journal :
Biomedical Signal Processing & Control
Publication Type :
Academic Journal
Accession number :
177846909
Full Text :
https://doi.org/10.1016/j.bspc.2024.106410