Author: "Wang, Guankun" / Topic: computer science - computer vision and pattern recognition - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wang, Guankun"' showing total 10 results

Start Over Author "Wang, Guankun" Topic computer science - computer vision and pattern recognition

10 results on '"Wang, Guankun"'

1. CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

Author: Wang, Guankun, Xiao, Han, Gao, Huxin, Zhang, Renrui, Bai, Long, Yang, Xiaoxiao, Li, Zhen, Li, Hongsheng, and Ren, Hongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: submucosal dissection (ESD) enables rapid resection of large lesions, minimizing recurrence rates and improving long-term overall survival. Despite these advantages, ESD is technically challenging and carries high risks of complications, necessitating skilled surgeons and precise instruments. Recent advancements in Large Visual-Language Models (LVLMs) offer promising decision support and predictive planning capabilities for robotic systems, which can augment the accuracy of ESD and reduce procedural risks. However, existing datasets for multi-level fine-grained ESD surgical motion understanding are scarce and lack detailed annotations. In this paper, we design a hierarchical decomposition of ESD motion granularity and introduce a multi-level surgical motion dataset (CoPESD) for training LVLMs as the robotic \textbf{Co}-\textbf{P}ilot of \textbf{E}ndoscopic \textbf{S}ubmucosal \textbf{D}issection. CoPESD includes 17,679 images with 32,699 bounding boxes and 88,395 multi-level motions, from over 35 hours of ESD videos for both robot-assisted and conventional surgeries. CoPESD enables granular analysis of ESD motions, focusing on the complex task of submucosal dissection. Extensive experiments on the LVLMs demonstrate the effectiveness of CoPESD in training LVLMs to predict following surgical robotic motions. As the first multimodal ESD motion dataset, CoPESD supports advanced research in ESD instruction-following and surgical automation. The dataset is available at \href{https://github.com/gkw0010/CoPESD}{https://github.com/gkw0010/CoPESD.}}
Published: 2024

2. Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

Author: Bai, Long, Wang, Guankun, Islam, Mobarakol, Seenivasan, Lalithkumar, Wang, An, and Ren, Hongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicate the regions of interest corresponding to the given questions results in incomplete comprehension of the surgical scene. To tackle this, we propose the surgical visual question localized-answering (VQLA) for precise and context-aware responses to specific queries regarding surgical images. Furthermore, to address the strong demand for safety in surgical scenarios and potential corruptions in image acquisition and transmission, we propose a novel approach called Calibrated Co-Attention Gated Vision-Language (C$^2$G-ViL) embedding to integrate and align multimodal information effectively. Additionally, we leverage the adversarial sample-based contrastive learning strategy to boost our performance and robustness. We also extend our EndoVis-18-VQLA and EndoVis-17-VQLA datasets to broaden the scope and application of our data. Extensive experiments on the aforementioned datasets demonstrate the remarkable performance and robustness of our solution. Our solution can effectively combat real-world image corruption. Thus, our proposed approach can serve as an effective tool for assisting surgical education, patient care, and enhancing surgical outcomes., Comment: Accepted by Information Fusion. Code and data availability: https://github.com/longbai1006/Surgical-VQLAPlus
Published: 2024

3. Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

Author: Wang, Guankun, Bai, Long, Nah, Wan Jun, Wang, Jie, Zhang, Zhaoxi, Chen, Zhen, Wu, Jinlin, Islam, Mobarakol, Liu, Hongbin, and Ren, Hongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Robotics, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recognizing long-range dependencies and aligning multimodal information. In this paper, we introduce Surgical-LVLM, a novel personalized large vision-language model tailored for complex surgical scenarios. Leveraging the pre-trained large vision-language model and specialized Visual Perception LoRA (VP-LoRA) blocks, our model excels in understanding complex visual-language tasks within surgical contexts. In addressing the visual grounding task, we propose the Token-Interaction (TIT) module, which strengthens the interaction between the grounding module and the language responses of the Large Visual Language Model (LVLM) after projecting them into the latent space. We demonstrate the effectiveness of Surgical-LVLM on several benchmarks, including EndoVis-17-VQLA, EndoVis-18-VQLA, and a newly introduced EndoVis Conversations dataset, which sets new performance standards. Our work contributes to advancing the field of automated surgical mentorship by providing a context-aware solution.
Published: 2024

4. EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

Author: Tan, Qiaozhi, Bai, Long, Wang, Guankun, Islam, Mobarakol, and Ren, Hongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to identify and classify out-of-distribution (OOD) data, such as undefined categories or anatomical landmarks. To address this issue, we propose the Endoscopy Out-of-Distribution (EndoOOD) framework, which aims to effectively handle the OOD detection challenge in WCE diagnosis. The proposed framework focuses on improving the robustness and reliability of WCE diagnostic capabilities by incorporating uncertainty-aware mixup training and long-tailed in-distribution (ID) data calibration techniques. Additionally, virtual-logit matching is employed to accurately distinguish between OOD and ID data while minimizing information loss. To assess the performance of our proposed solution, we conduct evaluations and comparisons with 12 state-of-the-art (SOTA) methods using two publicly available datasets. The results demonstrate the effectiveness of the proposed framework in enhancing diagnostic accuracy and supporting clinical decision-making., Comment: To appear in IEEE ISBI 2024
Published: 2024

5. OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery

Author: Bai, Long, Wang, Guankun, Wang, Jie, Yang, Xiaoxiao, Gao, Huxin, Liang, Xin, Wang, An, Islam, Mobarakol, and Ren, Hongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from classes unseen during training phases. To tackle this problem, we introduce an innovative Open-Set Surgical Activity Recognition (OSSAR) framework. Our solution leverages the hyperspherical reciprocal point strategy to enhance the distinction between known and unknown classes in the feature space. Additionally, we address the issue of over-confidence in the closed set by refining model calibration, avoiding misclassification of unknown classes as known ones. To support our assertions, we establish an open-set surgical activity benchmark utilizing the public JIGSAWS dataset. Besides, we also collect a novel dataset on endoscopic submucosal dissection for surgical activity tasks. Extensive comparisons and ablation experiments on these datasets demonstrate the significant outperformance of our method over existing state-of-the-art approaches. Our proposed solution can effectively address the challenges of real-world surgical scenarios. Our code is publicly accessible at https://github.com/longbai1006/OSSAR., Comment: To appear in IEEE ICRA 2024
Published: 2024

6. Perception Reinforcement Using Auxiliary Learning Feature Fusion: A Modified Yolov8 for Head Detection

Author: Chen, Jiezhou, Wang, Guankun, Liu, Weixiang, Zhong, Xiaopin, Tian, Yibin, and Wu, ZongZe
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Head detection provides distribution information of pedestrian, which is crucial for scene statistical analysis, traffic management, and risk assessment and early warning. However, scene complexity and large-scale variation in the real world make accurate detection more difficult. Therefore, we present a modified Yolov8 which improves head detection performance through reinforcing target perception. An Auxiliary Learning Feature Fusion (ALFF) module comprised of LSTM and convolutional blocks is used as the auxiliary task to help the model perceive targets. In addition, we introduce Noise Calibration into Distribution Focal Loss to facilitate model fitting and improve the accuracy of detection. Considering the requirements of high accuracy and speed for the head detection task, our method is adapted with two kinds of backbone, namely Yolov8n and Yolov8m. The results demonstrate the superior performance of our approach in improving detection accuracy and robustness.
Published: 2023

7. Rethinking Exemplars for Continual Semantic Segmentation in Endoscopy Scenes: Entropy-based Mini-Batch Pseudo-Replay

Author: Wang, Guankun, Bai, Long, Wu, Yanan, Chen, Tong, and Ren, Hongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Endoscopy is a widely used technique for the early detection of diseases or robotic-assisted minimally invasive surgery (RMIS). Numerous deep learning (DL)-based research works have been developed for automated diagnosis or processing of endoscopic view. However, existing DL models may suffer from catastrophic forgetting. When new target classes are introduced over time or cross institutions, the performance of old classes may suffer severe degradation. More seriously, data privacy and storage issues may lead to the unavailability of old data when updating the model. Therefore, it is necessary to develop a continual learning (CL) methodology to solve the problem of catastrophic forgetting in endoscopic image segmentation. To tackle this, we propose a Endoscopy Continual Semantic Segmentation (EndoCSS) framework that does not involve the storage and privacy issues of exemplar data. The framework includes a mini-batch pseudo-replay (MB-PR) mechanism and a self-adaptive noisy cross-entropy (SAN-CE) loss. The MB-PR strategy circumvents privacy and storage issues by generating pseudo-replay images through a generative model. Meanwhile, the MB-PR strategy can also correct the model deviation to the replay data and current training data, which is aroused by the significant difference in the amount of current and replay images. Therefore, the model can perform effective representation learning on both new and old tasks. SAN-CE loss can help model fitting by adjusting the model's output logits, and also improve the robustness of training. Extensive continual semantic segmentation (CSS) experiments on public datasets demonstrate that our method can robustly and effectively address the catastrophic forgetting brought by class increment in endoscopy scenes. The results show that our framework holds excellent potential for real-world deployment in a streaming learning manner., Comment: Accepted by Computers in Biology and Medicine
Published: 2023

8. Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

Author: Wang, Guankun, Ren, Tian-Ao, Lai, Jiewen, Bai, Long, and Ren, Hongliang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual environment to assist the training process. Specifically, this work introduces a virtual dataset generated by the Simulation Open Framework Architecture (SOFA) framework to overcome the limited availability of actual endoscopic images. We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer techniques to address discrepancies between datasets. Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models, improving segmentation accuracy and training stability. In the practical application, the trained segmentation model holds great promise for robot-assisted intubation surgery and intelligent surgical navigation., Comment: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms). arXiv admin note: text overlap with arXiv:2305.10883
Published: 2023

9. Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

Author: Wang, Guankun, Ren, Tian-Ao, Lai, Jiewen, Bai, Long, and Ren, Hongliang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant datasets domain differences; while ArtFlow is introduced to reduce the discrepancies between datasets further. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF with the state-of-the-art domain adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability., Comment: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release
Published: 2023

10. Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks

Author: Zhong, Xiaopin, Wang, Guankun, Liu, Weixiang, Wu, Zongze, and Deng, Yuanlong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: As a fundamental computer vision task, crowd counting plays an important role in public safety. Currently, deep learning based head detection is a promising method for crowd counting. However, the highly concerned object detection networks cannot be well applied to this problem for three reasons: (1) Existing loss functions fail to address sample imbalance in highly dense and complex scenes; (2) Canonical object detectors lack spatial coherence in loss calculation, disregarding the relationship between object location and background region; (3) Most of the head detection datasets are only annotated with the center points, i.e. without bounding boxes. To overcome these issues, we propose a novel Mask Focal Loss (MFL) based on heatmap via the Gaussian kernel. MFL provides a unifying framework for the loss functions based on both heatmap and binary feature map ground truths. Additionally, we introduce GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation and comparison. Extensive experimental results demonstrate the superior performance of our MFL across various detectors and datasets, and it can reduce MAE and RMSE by up to 47.03% and 61.99%, respectively. Therefore, our work presents a strong foundation for advancing crowd counting methods based on density estimation., Comment: The manuscript is accepted by Multimedia Tools and Applications
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

10 results on '"Wang, Guankun"'

1. CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

2. Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

3. Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

4. EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

5. OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery

6. Perception Reinforcement Using Auxiliary Learning Feature Fusion: A Modified Yolov8 for Head Detection

7. Rethinking Exemplars for Continual Semantic Segmentation in Endoscopy Scenes: Entropy-based Mini-Batch Pseudo-Replay

8. Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

9. Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

10. Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

10 results on '"Wang, Guankun"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources