With the advancement of deep forgery techniques, particularly propelled by generative adversarial networks (GANs), identifying deepfake faces has become increasingly challenging. Although existing forgery detection methods can identify tampering details within manipulated images, their effectiveness significantly diminishes in complex scenes, especially in low-quality images subjected to compression. To address this issue, we proposed a novel deep face forgery video detection model named Two-Stream Feature Domain Fusion Network (TSFF-Net). This model comprises spatial and frequency domain feature extraction branches, a feature extraction layer, and a Transformer layer. In the feature extraction module, we utilize the Scharr operator to extract edge features from facial images, while also integrating frequency domain information from these images. This combination enhances the model's ability to detect low-quality deepfake videos. Experimental results demonstrate the superiority of our method, achieving detection accuracies of 97.7%, 91.0%, 98.9%, and 90.0% on the FaceForensics++ dataset for Deepfake, Face2Face, FaceSwap, and NeuralTextures forgeries, respectively. Additionally, our model exhibits promising results in cross-dataset experiments.. The code used in this study is available at: https://github.com/hwZHc/TSFF-Net.git.
Copyright: © 2024 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.