Temporal Aggregate Representations for Long-Range Video Understanding

Authors :
Sener, Fadime
Singhania, Dipika
Yao, Angela
Publication Year :
2020

Abstract

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state of the art in both next action and dense anticipation with simple techniques such as max-pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on Breakfast, 50Salads, and EPIC-Kitchens datasets, where we achieve state-of-the-art results. With minimal modifications, our model can also be extended for video segmentation and action recognition.

Comment: ECCV 2020, European Conference on Computer Vision

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2006.00830
Document Type :
Working Paper