
Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design

Authors:
Hu, Yongquan 'Owen'
Tang, Jingyu
Gong, Xinya
Zhou, Zhongyi
Zhang, Shuning
Elvitigala, Don Samitha
Mueller, Florian 'Floyd'
Hu, Wen
Quigley, Aaron J.
Publication Year:
2025

Abstract

The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextual information (i.e., context awareness). Despite such advancements, there is a significant gap in comprehensive reviews examining these advances, especially from a multimodal data perspective, which is crucial for refining system design. This paper addresses a key aspect of this gap by conducting a systematic survey of data modality-driven Vision-based Multimodal Interfaces (VMIs). VMIs are essential for integrating multimodal data, enabling more precise interpretation of user intentions and complex interactions across physical and digital environments. Unlike previous task- or scenario-driven surveys, this study highlights the critical role of the visual modality in processing contextual information and facilitating multimodal interaction. Adopting a design framework moving from the whole to the details and back, it classifies VMIs across dimensions, providing insights for developing effective, context-aware systems.

Comment: 31 pages including appendices

Details

Database:
arXiv
Publication Type:
Report
Accession number:
edsarx.2501.13443
Document Type:
Working Paper
Full Text:
https://doi.org/10.1145/3706598.3714161