Contribution of Timbre and Shimmer Features to Deepfake Speech Detection

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)(2022)

Abstract
Advanced deep-learning techniques can generate natural-sounding synthetic voices that may closely resemble a particular person's voice. Misuse of such technologies is therefore of great concern, and researchers have focused on detecting these malicious synthetic voices, known as "deepfake speech." Although many feature-extraction and classification methods have been proposed, the accuracy of deepfake detection remains unreliable. In addition, most current features are computed in the frequency domain. To this end, we conducted experiments to investigate the contribution of two acoustic features to the detection of deepfake speech signals. These features, timbre and shimmer, represent our auditory perception in the time domain. We show that eight timbre components and four shimmer components contribute significantly to discriminating deepfake speech from genuine speech. We also propose a method for detecting deepfake speech based on these timbre and shimmer features, and evaluated it on a dataset from the Audio Deep Synthesis Detection Challenge (ADD 2022). The results suggest that combining the eight timbre components and four shimmer components with a simple multilayer-perceptron classifier can potentially detect deepfake speech effectively.
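As background for the shimmer features the abstract refers to, the sketch below shows the standard "local shimmer" measure: the mean absolute difference between the peak amplitudes of consecutive glottal cycles, relative to the mean cycle amplitude. This is a generic textbook definition, not the paper's exact four-component implementation, and the `peak_amps` input (one amplitude per detected pitch period) is assumed to come from a separate pitch-tracking step.

```python
def local_shimmer(peak_amps):
    """Local shimmer: mean absolute difference between the amplitudes of
    consecutive glottal cycles, relative to the mean cycle amplitude.
    `peak_amps` is a sequence of per-period peak amplitudes (assumed to
    come from an upstream pitch tracker)."""
    a = [float(x) for x in peak_amps]
    if len(a) < 2:
        return 0.0  # shimmer is undefined for fewer than two cycles
    diffs = [abs(a[i + 1] - a[i]) for i in range(len(a) - 1)]
    return (sum(diffs) / len(diffs)) / (sum(a) / len(a))

# Perfectly steady amplitudes give zero shimmer;
# cycle-to-cycle amplitude perturbation raises it.
print(local_shimmer([1.0, 1.0, 1.0]))                 # 0.0
print(round(local_shimmer([1.0, 1.1, 0.9, 1.05]), 3))  # 0.148
```

Higher values indicate more cycle-to-cycle amplitude instability, which is one of the time-domain cues the paper exploits to separate synthetic from genuine speech.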
Keywords
acoustic features, ADD 2022, Audio Deep Synthesis Detection Challenge, auditory perception, deep learning, deepfake speech detection, deepfake speech signals, feature classification, feature extractions, multilayer perceptron neural networks, natural voice generation, shimmer feature, synthetic voice generation, timbre feature