Deep learning in video anomaly detection and its applications
- Author
- Zhao, Yuxuan
- Abstract
With the popularization of city monitoring systems, the volume of surveillance video has grown rapidly. Traditional video analytics requires professionals to monitor footage constantly to spot abnormal events, which is a tedious and time-consuming task. Research on automatic video anomaly detection is therefore of great practical significance, since a feasible detection technique can reduce the large amount of human effort spent monitoring videos. This thesis presents several novel deep learning methods for video anomaly detection, together with a potential system for applying these methods and extending the video sources. Video anomaly detection is the problem of detecting and classifying anomalies in videos, where an anomaly is an unusual event or emergency that deviates from what is standard, normal, and expected. The core of video anomaly detection is the extraction of spatial and temporal features. The methods proposed in this thesis are all based on the two-stream structure, in which the two streams take different inputs: the first stream receives sampled frames, while the second takes optical flow as its input. The traditional two-stream model extracts spatial features only from the first stream and captures all temporal features from the optical-flow stream. Our models introduce several significant improvements. For spatial features, since convolutional neural networks (CNNs) have been shown to perform well, we keep the convolutional structure but replace the basic CNN with an advanced one (DenseNet). For temporal features, the proposed methods extract them from both streams: in the first stream, 3D convolution (C3D) and Long Short-Term Memory (LSTM) handle a sequence of frames, while in the second stream we adopt the DenseNet structure to improve performance.
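The two-stream idea described above can be illustrated with a minimal late-fusion sketch: each stream produces class scores, and the final prediction averages their probabilities. The function names, the softmax-averaging fusion rule, and the example logits are illustrative assumptions, not the thesis's exact architecture:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_stream(spatial_logits, temporal_logits, w_spatial=0.5):
    # Late fusion: weighted average of per-class probabilities from
    # the frame (spatial) stream and the optical-flow (temporal) stream.
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [w_spatial * a + (1.0 - w_spatial) * b
            for a, b in zip(p_spatial, p_temporal)]

# Hypothetical 3-class logits emitted by each stream for one clip.
scores = fuse_two_stream([2.0, 0.5, 0.1], [0.2, 1.8, 0.3])
predicted_class = scores.index(max(scores))
```

Because each stream outputs a proper probability distribution, their weighted average is also a distribution, so the fused scores can be ranked or thresholded directly.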
These modifications make the whole model considerably more complex, so the training process becomes expensive. A clip-based video processing method is therefore designed to improve training efficiency and reduce the computational burden. Experiments validate the performance of the proposed two-stream methods against several well-known video-oriented deep learning models. UCF-101 is used to evaluate the general performance of the models, while the FIRESENSE dataset and UCF-Crime are used to test performance on video anomaly detection tasks. We also collected a merged dataset to simulate anomaly detection. The proposed models perform well on all of these datasets.
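The clip-based processing mentioned above can be sketched as follows: instead of feeding every frame to the network, the video is divided into a fixed number of equal segments and one representative frame is drawn per segment. The centre-frame heuristic and the function name are assumptions for illustration, not the thesis's exact sampling procedure:

```python
def clip_based_sample(num_frames, num_clips):
    # Split a video of num_frames frames into num_clips equal segments
    # and take the centre frame index of each segment, so the model
    # sees a short, evenly spread clip rather than the whole video.
    if num_clips > num_frames:
        raise ValueError("cannot draw more clips than available frames")
    segment = num_frames / num_clips
    return [int(segment * i + segment / 2) for i in range(num_clips)]

# E.g. a 300-frame surveillance video reduced to 8 representative frames.
indices = clip_based_sample(300, 8)
```

Sampling a fixed number of frames per video keeps the input size constant regardless of video length, which is what lets training cost stay bounded as surveillance footage grows.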
- Published
- 2021