Deepfake detection based on cross-domain local characteristic analysis with multi-domain transformer

Muhammad Ahmad Amin, Yongjian Hu, Chang-Tsun Li, Beibei Liu

Alexandria Engineering Journal (2024)

Abstract
Deepfake videos present a significant challenge in the current media landscape. While current deepfake detection methods demonstrate satisfactory performance, there is still room for improvement in their ability to generalize to and detect unseen scenarios, particularly those involving imperceptible cues. This paper introduces a novel multi-modal deepfake detection model named the SpectraVisionFusion Transformer (SVFT), which incorporates spatial- and frequency-domain statistical artifacts to improve generalization performance. The SVFT framework uses two different backbone encoder models to exploit both spatial- and frequency-domain cues in video sequences, along with a decoder and a classifier for joint cross-attention and classification, respectively. The spatial-domain branch uses a convolutional transformer-based encoder to analyze facial visual features, whereas the frequency-domain branch employs a language transformer encoder. Additionally, we introduce a weighted feature embedding fusion mechanism that integrates spectral statistical feature embeddings with visual cues to achieve a more comprehensive and balanced spatial-frequency feature representation. By jointly analyzing these modalities, our model exhibits improved detection and generalization capabilities in unseen scenarios. The proposed SVFT model achieves 92.57% and 80.63% accuracy in extensive cross-manipulation and cross-dataset evaluations, respectively, surpassing traditional and single-domain approaches.
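
The abstract describes the SVFT pipeline only at a high level. The snippet below is a minimal PyTorch sketch of one plausible reading of the weighted feature-embedding fusion followed by cross-attention decoding and classification. The module names, tensor shapes, the sigmoid-gated scalar mixing weight, and the query/key/value assignment are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): one possible reading of SVFT-style
# weighted spatial-frequency embedding fusion plus cross-attention classification.
# All module names, dimensions, and the mixing rule are assumptions.
import torch
import torch.nn as nn


class WeightedEmbeddingFusion(nn.Module):
    """Blend spatial and frequency token embeddings with a learnable weight."""

    def __init__(self, dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # assumed scalar mixing weight
        self.proj = nn.Linear(dim, dim)

    def forward(self, spatial: torch.Tensor, frequency: torch.Tensor) -> torch.Tensor:
        # spatial, frequency: (batch, tokens, dim); weighted sum, then projection
        w = torch.sigmoid(self.alpha)
        return self.proj(w * spatial + (1.0 - w) * frequency)


class CrossAttentionClassifier(nn.Module):
    """Cross-attend fused tokens against the frequency branch, then classify real/fake."""

    def __init__(self, dim: int = 256, heads: int = 8, num_classes: int = 2):
        super().__init__()
        self.fusion = WeightedEmbeddingFusion(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, spatial: torch.Tensor, frequency: torch.Tensor) -> torch.Tensor:
        fused = self.fusion(spatial, frequency)
        # Queries come from the fused embeddings; keys/values from the frequency branch.
        attended, _ = self.cross_attn(fused, frequency, frequency)
        pooled = self.norm(fused + attended).mean(dim=1)  # mean-pool over tokens
        return self.head(pooled)


if __name__ == "__main__":
    # Toy shapes: 4 clips, 197 tokens per branch, 256-dim embeddings.
    spatial = torch.randn(4, 197, 256)    # stand-in for convolutional-transformer features
    frequency = torch.randn(4, 197, 256)  # stand-in for spectral-statistics features
    logits = CrossAttentionClassifier()(spatial, frequency)
    print(logits.shape)  # torch.Size([4, 2])
```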
Keywords
Deepfake detection, Multi-domain transformer, Generalization performance, Spectral anomalies, Spatial-frequency domains