Attention-based scaling adaptation for target speech extraction

arxiv(2021)

引用 7|浏览14
暂无评分
摘要
The target speech extraction has attracted widespread attention in recent years, however, the research of improving the target speaker clues is still limited. In this work, we focus on investigating the dynamic interaction between different mixtures and the target speaker to exploit the discriminative target speaker clues. We propose a special attention mechanism in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way. Experimental results on the spatialized reverberant WSJ0 2-mix dataset demonstrate that the proposed method improves the performance of the target speech extraction significantly. Furthermore, we find that under the same network configurations, the ASA in a single-channel condition can achieve competitive performance gains as that achieved from two-channel mixtures with inter-microphone phase difference (IPD) features.
更多
查看译文
关键词
Target speech extraction,time-domain,attention,adaptation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要