
MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding

Authors:
Ching-Yung Lin
Vishal Anand
Ziyin Wang
Boshen Jin
Raksha Ramesh
Xiaoxiao Lei
Source:
ACM Multimedia
Publication Year:
2021
Publisher:
ACM, 2021.

Abstract

The natural language processing community has recently shown strong interest in auto-regressive [4, 13] and span-prediction based language models [7], while knowledge graphs are often used for common-sense reasoning and fact-checking models. In this paper, we present an equivalence representation of span-prediction based language models and knowledge graphs, to better leverage recent developments in language modelling for multi-modal problems. Our method performed well, especially on sentiment understanding for multi-modal inputs, and uncovered potential bias in naturally occurring videos when compared with interaction understanding on movie data. We also release a dataset of an auto-generated questionnaire with ground truths, whose labels span 120 relationships, 99 sentiments, and 116 interactions, among other labels, for finer-grained comparison of models in the community.
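
The equivalence the abstract describes can be pictured as linearizing knowledge-graph edges into masked sentences that a span-prediction language model completes: filling the masked span is then the same task as completing the graph edge. The Python sketch below shows one such mapping; the function name linearize_triple, the <mask> token, and the choice of which slot to mask are illustrative assumptions, not the paper's implementation.

    # Minimal sketch: cast a knowledge-graph triple as a span-prediction
    # input. All names here are illustrative, not taken from the paper.
    MASK = "<mask>"

    def linearize_triple(subject, relation, obj, masked_slot="relation"):
        """Turn a (subject, relation, object) triple into a masked sentence.

        The masked slot becomes the span a span-prediction language model
        is trained to recover, so predicting the span is equivalent to
        completing the corresponding edge in the knowledge graph.
        """
        slots = {"subject": subject, "relation": relation, "object": obj}
        target = slots[masked_slot]          # span the model must predict
        slots[masked_slot] = MASK            # hide that slot in the text
        text = f"{slots['subject']} {slots['relation']} {slots['object']}."
        return text, target

    # Example: predicting an interaction label between two characters.
    text, target = linearize_triple("Alice", "argues with", "Bob")
    # text   -> "Alice <mask> Bob."
    # target -> "argues with"

Under this framing, the relationship, sentiment, and interaction labels mentioned in the abstract would each correspond to a family of maskable relation spans, which is one plausible way a single span-prediction model could cover all three label sets.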

Details

Database:
OpenAIRE
Journal:
Proceedings of the 29th ACM International Conference on Multimedia
Accession number:
edsair.doi...........77cb063c6944d5cbe2fad4076dfe58f5
Full Text:
https://doi.org/10.1145/3474085.3479220