
MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding

Authors:
Ching-Yung Lin
Vishal Anand
Ziyin Wang
Boshen Jin
Raksha Ramesh
Xiaoxiao Lei
Source:
ACM Multimedia
Publication Year:
2021
Publisher:
ACM, 2021.

Abstract

The natural language processing community has recently shown strong interest in auto-regressive [4, 13] and span-prediction based language models [7], while knowledge graphs are often used for common-sense reasoning and fact-checking models. In this paper, we present an equivalence representation of span-prediction based language models and knowledge graphs, to better leverage recent developments in language modelling for multi-modal problems. Our method performed well, especially on sentiment understanding for multi-modal inputs, and uncovered potential bias in naturally occurring videos when compared with interaction understanding on movie data. We also release a dataset of an auto-generated questionnaire with ground truths, whose labels span 120 relationships, 99 sentiments, and 116 interactions, among other labels, for finer-grained comparison of models in the community.
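
The equivalence the abstract describes can be pictured as linearizing knowledge-graph edges into masked sentences that a span-prediction language model completes: filling the masked span is then the same task as completing the graph edge. The Python sketch below shows one such mapping; the function name linearize_triple, the <mask> token, and the choice of which slot to mask are illustrative assumptions, not the paper's implementation.

    # Minimal sketch: cast a knowledge-graph triple as a span-prediction
    # input. All names here are illustrative, not taken from the paper.
    MASK = "<mask>"

    def linearize_triple(subject, relation, obj, masked_slot="relation"):
        """Turn a (subject, relation, object) triple into a masked sentence.

        The masked slot becomes the span a span-prediction language model
        is trained to recover, so predicting the span is equivalent to
        completing the corresponding edge in the knowledge graph.
        """
        slots = {"subject": subject, "relation": relation, "object": obj}
        target = slots[masked_slot]          # span the model must predict
        slots[masked_slot] = MASK            # hide that slot in the text
        text = f"{slots['subject']} {slots['relation']} {slots['object']}."
        return text, target

    # Example: predicting an interaction label between two characters.
    text, target = linearize_triple("Alice", "argues with", "Bob")
    # text   -> "Alice <mask> Bob."
    # target -> "argues with"

Under this framing, the relationship, sentiment, and interaction labels mentioned in the abstract would each correspond to a family of maskable relation spans, which is one plausible way a single span-prediction model could cover all three label sets.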

Details

Database:
OpenAIRE
Journal:
Proceedings of the 29th ACM International Conference on Multimedia
Accession number:
edsair.doi...........77cb063c6944d5cbe2fad4076dfe58f5
Full Text:
https://doi.org/10.1145/3474085.3479220