
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax.

Authors :: Huemann Z
Tie X
Hu J
Bradshaw TJ
Source :: Journal of imaging informatics in medicine [J Imaging Inform Med] 2024 Aug; Vol. 37 (4), pp. 1652-1663. Date of Electronic Publication: 2024 Mar 14.
Publication Year :: 2024
Abstract: Radiology narrative reports often describe characteristics of a patient's disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net extracts language features from physician-generated free-form radiology reports using a pre-trained language model. We then introduced cross-attention between the language features and the intermediate embeddings of an encoder-decoder convolutional neural network to enable language guidance for image analysis. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716±0.016, which was similar to the degree of inter-reader variability (0.712±0.044) computed on a subset of the data. It outperformed vision-only models (Swin UNETR: 0.670±0.015, ResNet50 U-Net: 0.677±0.015, GLoRIA: 0.686±0.014, and nnUNet: 0.694±0.016) and a competing vision-language model (LAVT: 0.706±0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net's segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design. (© 2024. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.)
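
Note on the metric: the Dice scores quoted in the abstract measure overlap between a predicted segmentation mask and a reference mask. The sketch below illustrates the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), on toy binary masks. It is not the authors' evaluation code, which this record does not include. The function name `dice_score`, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_truth = [int(bool(v)) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labelled positive in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_truth))
    total = sum(flat_pred) + sum(flat_truth)

    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical masks, not data from the paper).
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
score = dice_score(pred, truth)
print(round(score, 3))
```

In this toy case each mask has three positive pixels and two of them coincide, so the score is 2·2 / (3 + 3).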

Subjects :: Humans
Algorithms
Radiography, Thoracic
Natural Language Processing
Pneumothorax diagnostic imaging
Neural Networks, Computer

Details

Language :: English
ISSN :: 2948-2933
Volume :: 37
Issue :: 4
Database :: MEDLINE
Journal :: Journal of imaging informatics in medicine
Publication Type :: Academic Journal
Accession number :: 38485899
Full Text :: https://doi.org/10.1007/s10278-024-01051-8
