Multimodal Food Image Classification with Large Language Models.
- Source :
- Electronics (2079-9292); Nov 2024, Vol. 13 Issue 22, p4552, 10p
- Publication Year :
- 2024
Abstract
- In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model's ability to extract discriminative features beyond what can be achieved with visual features alone. [ABSTRACT FROM AUTHOR]
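The fusion step the abstract describes can be illustrated with a minimal sketch. The abstract does not say which modality supplies the queries, how many attention heads are used, or what the feature dimensions are, so everything below is an assumption for illustration: image tokens act as queries, text tokens as keys and values, a single head is used, and all weights are random rather than learned. The names `cross_attention_fuse`, `w_q`, `w_k`, `w_v`, and `w_cls` are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(img_tokens, txt_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image tokens attend to text tokens.

    img_tokens: (n_img, d) visual features (assumed to come from a
                transformer-based image encoder).
    txt_tokens: (n_txt, d) encoded tokens of the LLM-generated description.
    Returns the fused tokens (n_img, d) and the attention map (n_img, n_txt).
    """
    q = img_tokens @ w_q                      # queries from the visual side
    k = txt_tokens @ w_k                      # keys from the textual side
    v = txt_tokens @ w_v                      # values from the textual side
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    attn = softmax(scores, axis=-1)           # each image token's weights over text tokens
    fused = img_tokens + attn @ v             # residual connection (an assumption)
    return fused, attn


# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
d, n_img, n_txt, n_classes = 16, 9, 5, 3
img_tokens = rng.normal(size=(n_img, d))
txt_tokens = rng.normal(size=(n_txt, d))
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
w_cls = rng.normal(scale=d ** -0.5, size=(d, n_classes))

fused, attn = cross_attention_fuse(img_tokens, txt_tokens, w_q, w_k, w_v)

# Mean-pool the fused tokens and apply a linear classification head.
logits = fused.mean(axis=0) @ w_cls
probs = softmax(logits)
print(fused.shape, attn.shape, probs.shape)
```

Each row of `attn` sums to 1, so every image token receives a convex combination of the text values on top of its own visual feature. The system described in the abstract may differ in attention direction, number of heads, and how the fused features are pooled.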
Details
- Language :
- English
- ISSN :
- 2079-9292
- Volume :
- 13
- Issue :
- 22
- Database :
- Complementary Index
- Journal :
- Electronics (2079-9292)
- Publication Type :
- Academic Journal
- Accession number :
- 181168373
- Full Text :
- https://doi.org/10.3390/electronics13224552