See the Sound, Hear the Pixels
- Source :
- WACV
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Most events in the real world produce a sound that accompanies the corresponding visual scene. Humans have an inherent ability to map audio content to visual scenes automatically, which leads to an effortless and enhanced understanding of the underlying event. This raises an interesting question: can this natural correspondence between video and audio, which has been little explored so far, be learned by a machine and modeled jointly to localize the sound source in a visual scene? In this paper, we propose a novel algorithm that localizes the sound source in unconstrained videos using efficient fusion and attention mechanisms. Two novel blocks, the Audio Visual Fusion Block (AVFB) and the Segment-Wise Attention Block (SWAB), have been developed for this purpose. Quantitative and qualitative evaluations show that the same algorithm, with minor modifications, can perform sound localization under three different types of learning: supervised, weakly supervised, and unsupervised. A novel Audio Visual Triplet Gram Matrix Loss (AVTGML) is proposed as the loss function for learning the localization in an unsupervised way. Our empirical evaluations demonstrate a significant increase in performance over existing state-of-the-art methods, supporting the effectiveness of the proposed approach.
- Subjects :
- Sound localization
Pixel
Event (computing)
Computer science
Speech recognition
02 engineering and technology
Function (mathematics)
010501 environmental sciences
01 natural sciences
0202 electrical engineering, electronic engineering, information engineering
Natural (music)
020201 artificial intelligence & image processing
Artificial intelligence
Sound (geography)
0105 earth and related environmental sciences
Gramian matrix
Block (data storage)
Details
- Database :
- OpenAIRE
- Journal :
- 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
- Accession number :
- edsair.doi...........2fb15b2de36caacd4c86b63ef8c90ff5
- Full Text :
- https://doi.org/10.1109/wacv45572.2020.9093616