ASSD: Synthetic Speech Detection in the AAC Compressed Domain

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract
Modern text-to-speech methods have made synthetic human speech signals very easy to generate. When these signals are shared on social media, they are often compressed using the Advanced Audio Coding (AAC) standard. Our goal is to study whether a small set of coding metadata contained in the AAC compressed bit stream is sufficient to detect synthetic speech. This would avoid decompressing the speech signals before analysis. We call our proposed method AAC Synthetic Speech Detection (ASSD). ASSD extracts information from the AAC compressed bit stream without decompressing the speech signal and analyzes that information using a transformer neural network. In our experiments, we compressed the ASVspoof2019 dataset according to the AAC standard at different data rates. We compared the performance of ASSD to a time-domain-based method and a spectrogram-based method for synthetic speech detection. We evaluated ASSD on approximately 71k compressed speech signals. The results show that our proposed method typically requires only 1000 bits per speech block/frame from the AAC compressed bit stream to detect synthetic speech. This is far fewer bits than other reported methods require. Our method also achieved a detection accuracy 9.7 percentage points higher than existing methods.
Keywords
Synthetic speech detection, speech forensics, compressed speech, metadata, transformer
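
The compression step and the per-frame bit budget mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name an encoder, so the use of ffmpeg, the data rates (32, 64, 128 kbit/s), the file names, the 16 kHz sampling rate, and the 1024-sample AAC frame length are all assumptions made here for illustration. The helper names `aac_encode_command` and `metadata_bits_per_second` are hypothetical.

```python
# Illustrative sketch only: encoder choice, data rates, sampling rate and
# frame length are assumptions, not details taken from the paper.

AAC_FRAME_SAMPLES = 1024  # assumption: AAC-LC frame length of 1024 samples


def aac_encode_command(src, dst, bitrate_kbps):
    """Build an ffmpeg command line that re-encodes `src` to AAC at a target bit rate."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-i", src,                   # input audio file
        "-c:a", "aac",               # ffmpeg's native AAC encoder
        "-b:a", f"{bitrate_kbps}k",  # target audio bit rate
        dst,
    ]


def metadata_bits_per_second(bits_per_frame, sample_rate_hz,
                             frame_samples=AAC_FRAME_SAMPLES):
    """Convert a per-frame bit budget into bits read per second of audio."""
    frames_per_second = sample_rate_hz / frame_samples
    return bits_per_frame * frames_per_second


if __name__ == "__main__":
    # Hypothetical data rates; the abstract only says "different data rates".
    for kbps in (32, 64, 128):
        print(" ".join(aac_encode_command("utt.flac", f"utt_{kbps}k.m4a", kbps)))

    # 1000 bits per frame (from the abstract) at an assumed 16 kHz sampling rate.
    print(metadata_bits_per_second(1000, 16000))
```

Under these assumptions, reading 1000 bits per frame corresponds to about 15.6 kbit of metadata per second of speech, which gives a rough sense of how little of the bit stream a detector of this kind would need to inspect.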