First Person Action Recognition via Two-stream ConvNet with Long-term Fusion Pooling.

Authors :: Kwon, Heeseung
Kim, Yeonho
Lee, Jin S.
Cho, Minsu
Source :: Pattern Recognition Letters. Sep2018, Vol. 112, p161-167. 7p.
Publication Year :: 2018
Highlights:
• We propose a novel two-stream ConvNet model for first person action recognition.
• Our model effectively captures the temporal structure of actions.
• We introduce novel long-term pooling operators for appearance and motion information.
• We analyze the effect of the proposed architecture with extensive experiments.
• We achieve the state-of-the-art performance for first person action recognition.

Abstract: First person action recognition is an active research area with increasingly popular wearable devices. Action classification for first person video (FPV) is more challenging than conventional action classification due to strong egocentric motions, frequent changes of viewpoints, and diverse global motion patterns. To tackle these challenges, we introduce a two-stream convolutional neural network that improves action recognition via long-term fusion pooling operators. The proposed method effectively captures the temporal structure of actions by leveraging a series of frame-wise features of both appearance and motion in actions. Our experiments validate the effect of the feature pooling operators, and show that the proposed method achieves state-of-the-art performance on standard action datasets. [ABSTRACT FROM AUTHOR]
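
Illustrative note: the abstract describes pooling a series of frame-wise appearance and motion features over time, but this record does not specify the operators. The sketch below is therefore only a generic illustration of segment-wise temporal pooling followed by stream fusion, not the authors' method. The function names (`pool_segment`, `long_term_pool`, `fuse_streams`), the choice of max pooling for appearance and mean pooling for motion, and the concatenation step are all assumptions made for the example.

```python
from statistics import fmean


def pool_segment(frames, op):
    """Pool per-frame feature vectors into one vector, one dimension at a time."""
    return [op(dim_values) for dim_values in zip(*frames)]


def long_term_pool(frames, num_segments, op=max):
    """Split the frame sequence into contiguous segments, pool each segment,
    and concatenate the results in temporal order (hypothetical operator)."""
    n = len(frames)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")
    pooled = []
    for s in range(num_segments):
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        pooled.extend(pool_segment(frames[start:end], op))
    return pooled


def fuse_streams(appearance, motion, num_segments=2):
    """Assumed fusion: max-pool appearance, mean-pool motion, then concatenate."""
    return (long_term_pool(appearance, num_segments, max)
            + long_term_pool(motion, num_segments, fmean))


# Toy frame-wise features for a 4-frame clip (values are made up).
appearance = [[1.0, 0.0], [3.0, 2.0], [0.0, 5.0], [2.0, 1.0]]
motion = [[1.0], [3.0], [2.0], [4.0]]
video_descriptor = fuse_streams(appearance, motion, num_segments=2)
```

Pooling each segment separately, rather than the whole clip at once, keeps a coarse ordering of early versus late frames in the final descriptor, which is one simple way to retain some temporal structure.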

Subjects :: *ARTIFICIAL neural networks
*IMAGE databases
*CAMERAS
*VIDEO surveillance
*IMAGE segmentation
