Are you Really Alone? Detecting the use of Speech Separation Techniques on Audio Recordings

2023 IEEE International Workshop on Information Forensics and Security (WIFS), 2023

Abstract
The pervasive influence of digital media has brought about new challenges in verifying the authenticity and integrity of audio recordings. The ease of editing and altering audio has raised concerns regarding the potential malicious use of speech separation techniques, where multiple speakers' voices can be extracted from a mixed recording. In light of these emerging threats, the need for robust forensic detectors that can identify the presence of speech separation forgeries becomes increasingly crucial. In this paper, we propose a novel forensic detector designed to discern between original single-speaker speech recordings and those obtained using speech separation techniques applied to audio recordings containing multiple speakers. Leveraging the power of Convolutional Neural Networks (CNNs), we explore the efficacy of different Short-Time Fourier Transform (STFT) representations in tackling the task. While many conventional approaches in the literature employ the audio spectrogram (i.e., the STFT magnitude) as input for CNNs, our study explores the use of the STFT real and imaginary parts, as well as the STFT magnitude and phase. In doing so, we ensure the preservation of all essential information embedded within the speech signal. Results show that the proposed signal representation improves over the sole use of the spectrogram. Moreover, the proposed approach is able to generalize to datasets and speech separation techniques never seen in training. Finally, our proposed detector shows promising results on preliminary experiments performed on synthetically generated audio tracks.
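The input representations compared in the abstract can be illustrated with a minimal sketch. It assumes a naive NumPy STFT with a Hann window; the frame length, hop size, sampling rate, and the function names `stft` and `stft_representations` are hypothetical choices for illustration, not the settings or code used in the paper. It only shows how the magnitude-only spectrogram differs from the two-channel real/imaginary and magnitude/phase inputs.

```python
import numpy as np

def stft(signal, frame_len=512, hop=128):
    """Naive STFT: Hann-windowed frames followed by a real FFT.

    frame_len and hop are illustrative assumptions, not the paper's values.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Transpose so the result has shape (frequency bins, time frames).
    return np.fft.rfft(frames, axis=1).T

def stft_representations(signal, frame_len=512, hop=128):
    """Build channel-first arrays that a CNN could take as input."""
    spec = stft(signal, frame_len, hop)
    magnitude = np.abs(spec)
    phase = np.angle(spec)
    return {
        # Conventional spectrogram: one channel, phase is discarded.
        "magnitude": magnitude[np.newaxis],
        # Two channels: real and imaginary parts of the STFT.
        "real_imag": np.stack([spec.real, spec.imag]),
        # Two channels: magnitude and phase of the STFT.
        "mag_phase": np.stack([magnitude, phase]),
    }

# Stand-in for one second of speech at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
reps = stft_representations(audio)
for name, array in reps.items():
    print(name, array.shape)
```

Both two-channel variants carry the full complex STFT, so either one can be converted into the other, whereas the magnitude-only input cannot recover the phase.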
Keywords
Forensics, Audio, Speech, Speech Separation