Malicious face manipulation threatens social security and stability, making the accurate detection of face-tampered video an important problem. To address the poor real-time performance of existing video manipulation detection models, this paper proposes a face manipulation video detection model based on an ensemble-learning dual-stream recurrent neural network that incorporates the voting mechanism of ensemble learning. The model first receives a small number of consecutive frames, extracts spatial features with a convolutional neural network, and introduces central difference convolution to enhance tampering artifacts in the spatial domain. It then computes differences between consecutive frames to enhance tampering artifacts in the temporal domain and extracts temporal features with a convolutional neural network. Next, the spatial and temporal feature vectors of the two streams are concatenated and passed to a recurrent neural network for further feature extraction. During this stage, the frame-by-frame feature information is retained as input to auxiliary frame-level classifiers, while the final output of the recurrent neural network serves as input to a video-level discriminator. Finally, the model applies the ensemble voting mechanism to combine the outputs of the multiple auxiliary frame-level discriminators and the video-level discriminator, introducing a weight hyperparameter γ to balance their relative importance and thereby improve detection accuracy. Experimental results on the FaceForensics++ dataset show that the proposed model improves the average accuracy by 0.4% and 1.0% compared with mainstream detection models. At the same time, the proposed model requires only a few consecutive frames for manipulation detection, which improves its real-time performance.
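
To make two of the components named in the abstract concrete, the sketch below (not the authors' code) shows a central difference convolution layer in its commonly used form and a hypothetical γ-weighted vote that blends the auxiliary frame-level scores with the video-level discriminator score; the exact fusion formula, the θ value, and all shapes are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Vanilla convolution minus a theta-weighted response of the summed kernel
    applied to the center pixel, a standard way to realize central difference
    convolution (theta=0.7 is an assumed default, not from the paper)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out_normal = self.conv(x)
        # Collapse each kernel spatially into a 1x1 kernel acting on the center pixel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(x, kernel_sum, stride=self.conv.stride, padding=0)
        return out_normal - self.theta * out_center

def fuse_scores(frame_scores: torch.Tensor, video_score: torch.Tensor, gamma: float = 0.5):
    """Hypothetical gamma-weighted vote: average the auxiliary frame-level scores
    and blend them with the video-level discriminator score."""
    return gamma * frame_scores.mean() + (1.0 - gamma) * video_score

# Illustrative usage on random data.
x = torch.randn(1, 3, 64, 64)                  # one RGB frame
feat = CentralDifferenceConv2d(3, 8)(x)        # spatial-stream feature map
frame_scores = torch.sigmoid(torch.randn(5))   # scores from 5 consecutive frames
video_score = torch.sigmoid(torch.randn(()))   # score from the video-level head
print(feat.shape, fuse_scores(frame_scores, video_score, gamma=0.6))
```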