The malicious use of deepfake videos seriously threatens information security and causes great harm to society. Deepfake videos are now mainly generated with deep learning methods and are difficult to recognize with the naked eye, so accurate and efficient deepfake video detection techniques are of great significance. Most existing detection methods classify videos by analyzing the discriminative information in one specific feature domain, from either a local or a global perspective. Detection methods based on a single type of feature have certain limitations in practical applications. In this paper, we propose a deepfake detection method that comprehensively analyzes forged face features: it integrates features from the spatial domain, noise domain, and frequency domain, and uses the Inception Transformer to dynamically learn a mix of global and local information. We evaluate the proposed method on the DFDC, Celeb-DF, and FaceForensics++ benchmark datasets. Extensive experiments verify the effectiveness and good generalization of the proposed method. Compared with the best-performing models, the proposed method has a small number of parameters and does not use pre-training, distillation, or ensembling, yet still achieves competitive performance. Ablation experiments evaluate the role of each component.
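As a rough illustration of what combining spatial, noise, and frequency views of a face crop can look like, the sketch below builds three aligned feature maps from a grayscale image. It is not the feature extraction used in this paper, which the abstract does not specify. The function names, the 3x3 mean-filter residual used as the noise view, and the log-magnitude FFT used as the frequency view are all assumptions chosen for simplicity.

```python
import numpy as np


def noise_residual(img):
    """Assumed noise view: image minus its 3x3 local mean (a simple high-pass)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Sum the nine shifted copies of the image to form a 3x3 box filter.
    local_mean = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return img - local_mean


def log_spectrum(img):
    """Assumed frequency view: log-magnitude of the centered 2D FFT."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spectrum))


def three_domain_stack(img):
    """Stack spatial, noise, and frequency views into a (3, H, W) array."""
    img = np.asarray(img, dtype=np.float64)
    return np.stack([img, noise_residual(img), log_spectrum(img)], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.random((64, 64))  # stand-in for a grayscale face crop
    features = three_domain_stack(face)
    print(features.shape)
```

In a full detector, a stack like this (or learned features per domain) would be passed to the backbone network. Here it only shows that the three views share the same spatial layout and can be concatenated channel-wise.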
Copyright: © 2024 Ding et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.