MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding
- Source :
- ACM Multimedia
- Publication Year :
- 2021
- Publisher :
- ACM, 2021.
- Abstract :
- The natural language processing community has recently shown major interest in auto-regressive [4, 13] and span-prediction based language models [7], while knowledge graphs are often referenced for common-sense reasoning and fact-checking models. In this paper, we present an equivalence representation of span-prediction based language models and knowledge graphs to better leverage recent developments in language modelling for multi-modal problem statements. Our method performed well, especially on sentiment understanding for multi-modal inputs, and uncovered potential bias in naturally occurring videos when compared with movie-data interaction understanding. We also release a dataset of an auto-generated questionnaire with ground truths whose labels span 120 relationships, 99 sentiments, and 116 interactions, among other labels, for finer-grained analysis of model comparisons in the community.
- Subjects :
- Computer science
Problem statement
Speaker diarisation
Knowledge graph
Leverage (statistics)
Artificial intelligence
Language model
Equivalence (formal languages)
Representation (mathematics)
Natural language processing
Transformer (machine learning model)
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 29th ACM International Conference on Multimedia
- Accession number :
- edsair.doi...........77cb063c6944d5cbe2fad4076dfe58f5
- Full Text :
- https://doi.org/10.1145/3474085.3479220