
ViXNet: Vision Transformer with Xception Network for deepfakes based video and image forgery detection.

Authors :
Ganguly, Shreyan
Ganguly, Aditya
Mohiuddin, Sk
Malakar, Samir
Sarkar, Ram
Source :
Expert Systems with Applications. Dec 2022, Vol. 210, Article 118423.
Publication Year :
2022

Abstract

With the advent of generative image technologies, there has been huge growth in facial manipulation techniques that allow people to easily modify media such as videos and images by replacing the identity or facial expression of a target person with another person's face. Colloquially, these manipulated videos and images are termed "deepfakes". As a result, every piece of digital media content now comes with a question: is this authentic? Hence, there is an unprecedented need for a competent deepfakes detection method. The rapid evolution of forging methods makes this a very challenging task, so generalization of detection methods is also of utmost importance. However, the generalization strength of prevailing deepfakes detection methods is not satisfactory: these models perform well when trained and tested on the same dataset but fail to perform satisfactorily when trained on one dataset and tested on another. Most modern deep learning aided deepfakes detection techniques look for a consistent pattern among the leftover artifacts in specific facial regions of the target face rather than the entire face. To this end, we propose a Vision Transformer with Xception Network (ViXNet) to learn the consistency of these almost imperceptible artifacts left by deepfaking methods over the entire facial region. ViXNet comprises two branches: one learns inconsistencies among local face regions by combining a patch-wise self-attention module with a vision transformer, while the other generates global spatial features using a deep convolutional neural network. To assess the performance of ViXNet, we evaluate it under two experimental setups, intra-dataset and inter-dataset, on two standard deepfakes video datasets, namely FaceForensics++ and Celeb-DF (V2), and one deepfakes image dataset called Deepfakes. We attain 98.57% (83.60%), 99.26% (74.78%), and 98.93% (75.13%) AUC scores under the intra-dataset (inter-dataset) setup on FaceForensics++, Celeb-DF (V2), and Deepfakes, respectively. Additionally, we evaluate ViXNet on the Deepfake Detection Challenge (DFDC) dataset, obtaining an 86.32% AUC score and a 79.06% F1-score. The performance of the proposed model is comparable to state-of-the-art methods, and the obtained results confirm its robustness and generalization ability.

• Proposed a deep learning based model for deepfake image/video detection.
• It has a patch-wise self-attention module which learns local image artifacts.
• It consists of a vision transformer which learns correlations among masked patches.
• Xception-based global image features are stacked with patch-based local features.
• The model achieves good results on standard video forgery detection datasets.

[ABSTRACT FROM AUTHOR]
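The two-branch design described in the abstract can be pictured with a short sketch. Below is a minimal PyTorch sketch, assuming a plain transformer encoder over image patches in place of the paper's patch-wise self-attention + vision transformer branch, and a small stand-in CNN in place of the Xception backbone. All class names, dimensions, and the pooling/fusion choices here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-branch ViXNet-style model (assumption: PyTorch).
# The patch branch and the global CNN branch below are simplified stand-ins;
# the paper's actual patch-wise self-attention module and Xception backbone
# are not reproduced here.
import torch
import torch.nn as nn

class PatchBranch(nn.Module):
    """Splits the face image into patches and applies transformer self-attention."""
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patch embedding
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))          # learned positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.embed(x).flatten(2).transpose(1, 2)  # (B, n_patches, dim)
        x = self.encoder(x + self.pos)
        return x.mean(dim=1)                          # pooled local-artifact features

class GlobalBranch(nn.Module):
    """Stand-in CNN for the Xception backbone extracting global spatial features."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x):
        return self.net(x)

class ViXNetSketch(nn.Module):
    """Stacks local (patch/ViT) and global (CNN) features for real/fake scoring."""
    def __init__(self, dim=256):
        super().__init__()
        self.local_branch = PatchBranch(dim=dim)
        self.global_branch = GlobalBranch(dim=dim)
        self.head = nn.Linear(2 * dim, 1)  # fused features -> fake-probability logit

    def forward(self, x):
        feats = torch.cat([self.local_branch(x), self.global_branch(x)], dim=1)
        return self.head(feats)

if __name__ == "__main__":
    model = ViXNetSketch()
    logits = model(torch.randn(2, 3, 224, 224))  # two face crops
    print(logits.shape)  # torch.Size([2, 1])
```

A sigmoid over the output logit gives a per-image fake probability, from which AUC and F1 can be computed; the intra-dataset and inter-dataset setups mentioned in the abstract differ only in whether the test images come from the training dataset or from a different one.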

Details

Language :
English
ISSN :
0957-4174
Volume :
210
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
159432413
Full Text :
https://doi.org/10.1016/j.eswa.2022.118423