Audio-visual saliency prediction for movie viewing in immersive environments: Dataset and benchmarks.

Authors :: Chen, Zhao
Zhang, Kao
Cai, Hao
Ding, Xiaoying
Jiang, Chenxi
Chen, Zhenzhong
Source :: Journal of Visual Communication & Image Representation. Apr 2024, Vol. 100, pN.PAG-N.PAG. 1p.
Publication Year :: 2024
Abstract: This paper presents an eye-tracking dataset for movie viewing in immersive environments. The dataset contains 256 movie clips at 2K QHD resolution, with corresponding movie genre labels from IMDb (Internet Movie Database). Eye movements were recorded with an integrated eye tracker while participants watched the movies on a VR headset, so the dataset provides audio-visual cues for studying human visual attention in this setting. To provide benchmarks for saliency prediction for movie viewing in immersive environments, fifteen computational models are evaluated on the dataset, including a newly developed multi-stream audio-visual saliency prediction model based on deep neural networks, named MSAV. Detailed quantitative and qualitative comparisons and analyses are also provided. The dataset and benchmarks could help facilitate studies of visual saliency prediction for movie viewing in immersive environments. [ABSTRACT FROM AUTHOR]

Subjects :: *SHARED virtual environments
*VIRTUAL reality
*EYE tracking
*VISUAL communication
*EYE movements
*ARTIFICIAL neural networks
*DIGITAL image processing
