1. Not all temporal shift modules are profitable.
- Author
- Zhang, Youshan; Li, Yong; Guo, Shaozhe; and Liang, Qiming
- Subjects
CONVOLUTIONAL neural networks, ARTIFICIAL intelligence, DATA mining, VIOLENCE, VIDEO surveillance
- Abstract
With the increasing coverage of video surveillance systems in modern society, the demand for artificial intelligence algorithms that can replace humans in violent behavior recognition has also grown. By shifting some channels along the temporal dimension, the temporal shift module (TSM) can achieve the performance of a three-dimensional convolutional neural network (CNN) with the complexity of a two-dimensional CNN, extracting temporal and spatial information at the same time. Our intuition is that too many temporal shift modules may fuse too much action information into each frame, which weakens the CNN's capability for spatiotemporal information extraction. To verify this conjecture, we adjusted the network structure based on TSM, proposed a partial TSM, selected the optimal model through experiments, and verified the performance of the algorithm on multiple datasets and our expanded datasets. The proposed optimal model not only reduced hardware memory usage but also achieved higher accuracy on multiple datasets with 77.3% of the running time. Meanwhile, we achieved state-of-the-art performance of 91% on the RWF-2000 dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2022
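The channel shifting that the abstract describes can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names `temporal_shift` and `partial_tsm_forward`, the 1/8 shift fraction, the zero padding at clip boundaries, and the per-block `use_shift` flags are all assumptions chosen for illustration. The record does not state which blocks the partial TSM actually shifts.

```python
import numpy as np


def temporal_shift(x, shift_fraction=0.125):
    """Shift a fraction of channels along the time axis.

    x has shape (T, C, H, W): frames, channels, height, width.
    shift_fraction is an assumed value, not taken from the record.
    """
    fold = int(x.shape[1] * shift_fraction)
    out = np.zeros_like(x)
    # First group of channels: each frame receives features from the next frame.
    out[:-1, :fold] = x[1:, :fold]
    # Second group: each frame receives features from the previous frame.
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]
    # Remaining channels stay in place and keep purely spatial information.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


def partial_tsm_forward(x, blocks, use_shift):
    """Run x through a list of blocks, shifting only where use_shift is True.

    blocks is a list of callables standing in for 2D CNN blocks (hypothetical);
    use_shift is a list of booleans of the same length.
    """
    for block, shift in zip(blocks, use_shift):
        if shift:
            x = temporal_shift(x)
        x = block(x)
    return x
```

Setting every entry of `use_shift` to `True` corresponds to shifting before every block, while setting some entries to `False` gives a partial variant in which fewer blocks mix information across frames.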