Adversarial Attacks on Speech Separation Systems

ICMLA (2022)

Abstract
Speech separation is a special form of blind source separation whose objective is to decouple two or more sources so that each is distinct. The need for this capability grows as speech-activated devices become more common in everyday life. These systems, however, are susceptible to malicious actors. In this work, we repurpose proven adversarial attacks and leverage them against a combined speech separation and speech recognition system. The attack adds adversarial noise to a mixture of two voices so that the two outputs of the speech separation system are transcribed similarly by the speech recognition system, even though a listener hears clear differences between them. Against ConvTasNet, the degradation in separation quality remains low at 0.34 decibels, so the speech recognition system still operates on the separated outputs. Against automatic speech recognition, the attack achieves a 64.07% word error rate (WER) on Wav2Vec2, compared to 4.22% for unmodified samples. Against Speech2Text, the WER is 84.55%, compared to 10% for unmodified samples. Measured against the target transcript, the attack achieves a 24.77% character error rate (CER), down from 113% CER, indicating relatively high similarity between the target transcription and the resulting transcription.
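The WER and CER figures in the abstract are standard edit-distance ratios: the Levenshtein distance between a reference transcript and a hypothesis, normalized by the reference length, computed over words for WER and over characters for CER. A minimal sketch (not the authors' code, which is not included here):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over two token sequences.
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def wer(reference, hypothesis):
    # Word error rate: edit distance over word tokens / reference word count.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character error rate: the same computation over characters.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions are counted, these rates can exceed 100% when the hypothesis is much longer than the reference, which is why the abstract can report a 113% CER for unattacked samples relative to the target transcript.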
Keywords
machine learning, speech recognition, speech synthesis