Author: "Zhang, Xiaoman" / Topic: computer vision and pattern recognition (cs.cv) - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang, Xiaoman"' showing total 10 results

1. PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

Author: Zhang, Xiaoman, Wu, Chaoyi, Zhao, Ziheng, Lin, Weixiong, Zhang, Ya, Wang, Yanfeng, and Xie, Weidi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images with vital clinic-relevant information. Firstly, we reframe the problem of MedVQA as a generation task that naturally follows human-machine interaction, and we propose a generative model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. Secondly, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs of 149k images that cover various modalities or diseases. Thirdly, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a test set that has undergone manual verification and is significantly more challenging: even the best models struggle to solve it.
Published: 2023

2. K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging

Author: Wu, Chaoyi, Zhang, Xiaoman, Wang, Yanfeng, Zhang, Ya, and Xie, Weidi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework that enables training visual representations with the guidance of medical domain knowledge. In particular, we make the following contributions: First, to explicitly incorporate experts' knowledge, we propose to learn a neural representation for the medical knowledge graph via contrastive learning, implicitly establishing relations between different medical concepts. Second, while training the visual encoder, we keep the parameters of the knowledge encoder frozen and propose to learn a set of prompt vectors for efficient adaptation. Third, we adopt a Transformer-based disease-query module for cross-modal fusion, which naturally enables explainable diagnosis results via cross attention. To validate the effectiveness of our proposed framework, we conduct thorough experiments on three x-ray imaging datasets across different anatomical structures, showing that our model is able to exploit the implicit relations between diseases/findings and is thus beneficial for problems commonly encountered in the medical domain, namely long-tailed and zero-shot recognition, which conventional methods either struggle with or completely fail to handle.
Published: 2023

3. MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

Author: Wu, Chaoyi, Zhang, Xiaoman, Zhang, Ya, Wang, Yanfeng, and Xie, Weidi
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing, Computation and Language (cs.CL)
Abstract: In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and evaluate on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model has demonstrated strong performance compared with prior methods on disease classification and grounding.
Published: 2023

4. Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

Author: Zhang, Xiaoman, Wu, Chaoyi, Zhang, Ya, Wang, Yanfeng, and Xie, Weidi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully-supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.
Published: 2023
Full Text: View/download PDF

5. PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Author: Lin, Weixiong, Zhao, Ziheng, Zhang, Xiaoman, Wu, Chaoyi, Zhang, Ya, Wang, Yanfeng, and Xie, Weidi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computation and Language (cs.CL), Computer Science - Multimedia, Machine Learning (cs.LG), Multimedia (cs.MM)
Abstract: Foundation models trained on large-scale datasets have seen a recent surge in CV and NLP. In contrast, development in the biomedical domain lags far behind due to data scarcity. To address this issue, we build and release PMC-OA, a biomedical dataset with 1.6M image-caption pairs collected from PubMedCentral's OpenAccess subset, which is 8 times larger than before. PMC-OA covers diverse modalities or diseases, with the majority of the image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption. After pretraining a CLIP-style model on PMC-OA, our model named PMC-CLIP achieves state-of-the-art results on various downstream tasks, including image-text retrieval on ROCO, MedMNIST image classification, and Medical VQA, e.g., +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.
Comment: 10 pages, 3 figures
Published: 2023
Full Text: View/download PDF

6. MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

Author: Feng, Shixiang, Zhou, Yuhang, Zhang, Xiaoman, Zhang, Ya, and Wang, Yanfeng
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Annotating multiple organs in 3D medical images is time-consuming and costly. Meanwhile, there exist many single-organ datasets with one specific organ annotated. This paper investigates how to learn a multi-organ segmentation model leveraging a set of binary-labeled datasets. A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network. Considering that each teacher focuses on different organs, a region-based supervision method, consisting of logits-wise supervision and feature-wise supervision, is proposed. Each teacher supervises the student in two regions, the organ region where the teacher is considered as an expert and the background region where all teachers agree. Extensive experiments on three public single-organ datasets and a multi-organ dataset have demonstrated the effectiveness of the proposed MS-KD framework.
Published: 2021

7. Uncertainty-aware Incremental Learning for Multi-organ Segmentation

Author: Zhou, Yuhang, Zhang, Xiaoman, Feng, Shixiang, Zhang, Ya, and Wang, Yanfeng
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Most existing approaches to training a unified multi-organ segmentation model from several single-organ datasets require simultaneous access to multiple datasets during training. In real scenarios, due to privacy and ethics concerns, the training data of the organs of interest may not be publicly available. To this end, we investigate a data-free incremental organ segmentation scenario and propose a novel incremental training framework to solve it. We use the pretrained model instead of its own training data for privacy protection. Specifically, given a pretrained $K$ organ segmentation model and a new single-organ dataset, we train a unified $K+1$ organ segmentation model without accessing any data belonging to the previous training stages. Our approach consists of two parts: the background label alignment strategy and the uncertainty-aware guidance strategy. The first part is used for knowledge transfer from the pretrained model to the training model. The second part is used to extract the uncertainty information from the pretrained model to guide the whole knowledge transfer process. By combining these two strategies, more reliable information is extracted from the pretrained model without the original training data. Experiments on multiple publicly available pretrained models and a multi-organ dataset MOBA have demonstrated the effectiveness of our framework.
Published: 2021

8. Self-supervised Tumor Segmentation through Layer Decomposition

Author: Zhang, Xiaoman, Xie, Weidi, Huang, Chaoqin, Wang, Yanfeng, Zhang, Ya, Chen, Xin, and Tian, Qi
Subjects: FOS: Computer and information sciences, ComputingMethodologies_PATTERNRECOGNITION, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Abstract: In this paper, we target self-supervised representation learning for zero-shot tumor segmentation. We make the following contributions: First, we advocate a zero-shot setting, where models from pre-training should be directly applicable for the downstream task, without using any manual annotations. Second, we take inspiration from "layer-decomposition", and innovate on the training regime with simulated tumor data. Third, we conduct extensive ablation studies to analyse the critical components in data simulation, and validate the necessity of different proxy tasks. We demonstrate that, with sufficient texture randomization in simulation, a model trained on synthetic data can effortlessly generalise to segment real tumor data. Fourth, our approach achieves superior results for zero-shot tumor segmentation on different downstream datasets, BraTS2018 for brain tumor segmentation and LiTS2017 for liver tumor segmentation. When evaluating the model transferability for tumor segmentation under a low-annotation regime, the proposed approach also outperforms all existing self-supervised approaches, opening up the usage of self-supervised learning in practical scenarios.
Comment: Project webpage: https://xiaoman-zhang.github.io/Layer-Decomposition/
Published: 2021
Full Text: View/download PDF

9. SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation

Author: Zhang, Xiaoman, Feng, Shixiang, Zhou, Yuhang, Zhang, Ya, and Wang, Yanfeng
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Automatic and accurate tumor segmentation on medical images is in high demand to assist physicians with diagnosis and treatment. However, it is difficult to obtain the massive amounts of annotated training data required by deep-learning models, as the manual delineation process is often tedious and requires expertise. Although self-supervised learning (SSL) schemes have been widely adopted to address this problem, most SSL methods focus only on global structure information, ignoring the key distinguishing features of tumor regions: local intensity variation and large size distribution. In this paper, we propose Scale-Aware Restoration (SAR), an SSL method for 3D tumor segmentation. Specifically, a novel proxy task, i.e., scale discrimination, is formulated to pre-train the 3D neural network in combination with the self-restoration task. Thus, the pre-trained model learns multi-level local representations through multi-scale inputs. Moreover, an adversarial learning module is further introduced to learn modality-invariant representations from multiple unlabeled source datasets. We demonstrate the effectiveness of our methods on two downstream tasks: i) brain tumor segmentation, ii) pancreas tumor segmentation. Compared with the state-of-the-art 3D SSL methods, our proposed approach can significantly improve the segmentation accuracy. In addition, we analyze its advantages from multiple perspectives such as data efficiency, performance, and convergence speed.
Comment: Accepted by MICCAI 2021
Published: 2020
Full Text: View/download PDF

10. A Deep Framework for Bone Age Assessment based on Finger Joint Localization

Author: Zhang, Xiaoman, Zhao, Ziyuan, Chen, Cen, Peng, Songyou, Wu, Min, Cheng, Zhongyao, Teo, Singee, Zhang, Le, and Zeng, Zeng
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Bone age assessment is an important clinical trial to measure children's skeletal maturity and to diagnose growth disorders. Conventional approaches such as the Tanner-Whitehouse (TW) and Greulich and Pyle (GP) methods may not perform well due to their large inter-observer and intra-observer variations. In this paper, we propose a finger joint localization strategy to filter out most non-informative parts of images. When it is combined with the conventional full image-based deep network, we observe much-improved performance. Our approach utilizes full-hand and specific joint images for skeletal maturity prediction. In this study, we apply a deep neural network and explore predicting skeletal bone age from the combined joint images to improve accuracy compared with using whole-hand images.
Comment: Some changes will be made
Published: 2019
Full Text: View/download PDF
