Order-Constrained Representation Learning for Instructional Video Prediction.

Authors :
Li, Muheng
Chen, Lei
Lu, Jiwen
Feng, Jianjiang
Zhou, Jie
Source :
IEEE Transactions on Circuits & Systems for Video Technology. Aug2022, Vol. 32 Issue 8, p5438-5452. 15p.
Publication Year :
2022

Abstract

In this paper, we propose a weakly-supervised approach called Order-Constrained Representation Learning (OCRL) to predict future actions in instructional videos from incompletely observed action steps. Most conventional methods predict actions from partially observed video frames and mainly study low-level semantics such as motion consistency. Unlike performing a single action, completing a task in an instructional video usually requires several action steps and a longer period of time. Motivated by the fact that the order of action steps is key to learning task semantics, we develop a new form of contrastive loss, called StepNCE, to integrate the semantic information shared between step order and task semantics within the framework of the memory bank-based momentum-updating algorithm. Specifically, we learn video representations from trimmed video clips whose step order has been rearranged, based on the proposed task-consistency rule and order-consistency rule. Our StepNCE loss can be used to pre-train a video feature encoder, which is then fine-tuned to carry out the instructional video prediction task. Our approach digs deeper into the sequential logic between the different action steps of a given task, which can lift video understanding methods to a new semantic level. We evaluate our method on five popular instructional video and action prediction datasets: COIN, CrossTask, UT-Interaction, BIT-Interaction, and ActivityNet v1.2, and the results show that our approach improves over conventional prediction methods. [ABSTRACT FROM AUTHOR]
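
The abstract names two generic building blocks: a noise-contrastive (NCE-style) loss and a memory bank with a momentum-updated encoder. As a rough illustration only, the sketch below shows a standard InfoNCE loss together with a momentum update and a first-in, first-out memory bank. It is not the authors' StepNCE: the abstract does not specify how the task-consistency and order-consistency rules select positives and negatives, so that choice is left to the caller. All function names, the temperature of 0.07, and the momentum of 0.999 are assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one query (hypothetical helper).

    query: (D,), positive: (D,), negatives: (K, D).
    Which clips count as positive or negative is decided by the caller;
    the paper's consistency rules are NOT implemented here.
    """
    q = l2_normalize(query)
    pos = l2_normalize(positive)
    neg = l2_normalize(negatives)
    # Index 0 holds the positive similarity, the rest are negatives.
    logits = np.concatenate(([q @ pos], neg @ q)) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average of encoder parameters (assumed m)."""
    return m * key_params + (1.0 - m) * query_params


def enqueue(bank, new_keys):
    """FIFO memory bank: drop the oldest rows, append the new keys.

    Assumes len(new_keys) <= len(bank) so the bank size stays fixed.
    """
    k = len(new_keys)
    return np.concatenate((bank[k:], new_keys), axis=0)
```

In a setup of this kind, the keys stored in the bank would typically come from the momentum-updated encoder, while gradients flow only through the query encoder.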

Details

Language :
English
ISSN :
10518215
Volume :
32
Issue :
8
Database :
Academic Search Index
Journal :
IEEE Transactions on Circuits & Systems for Video Technology
Publication Type :
Academic Journal
Accession number :
158333578
Full Text :
https://doi.org/10.1109/TCSVT.2022.3149329