A Pitch-aware Speaker Extraction Serial Network

APSIPA(2020)

Abstract
Although deep learning performs well in monaural speaker extraction, extracting the target speaker from a same-gender mixture (male-male or female-female) remains challenging. Pitch tracking, on the other hand, has been shown to be effective for same-gender speech separation. In this study, we propose a pitch-aware speaker extraction serial network (PSESNet) to improve extraction performance. We design a serial system and compare it with multi-task learning. Rather than feeding the target speaker's pitch information to the extraction network as input, we use it to optimize the loss function. The extraction part uses SpeakerBeam-FE (SBF) with magnitude and temporal spectrum approximation loss (MTSAL) and speaker embedding concatenation. After the spectrogram of the target speaker is extracted, it is passed to a subsequent stage that predicts pitch information, which is used for further optimization. Experimental results show that the serial system performs better than multi-task learning and that the proposed method improves performance in both same-gender and opposite-gender conditions. On average, PSESNet achieves 4.7% and 3.8% relative improvements in signal-to-distortion ratio (SDR) on the WSJ0 dataset over the SBF-MTSAL-Concat baseline under the closed and open conditions, respectively.
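The serial idea described in the abstract, where pitch enters through the loss rather than through the input, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`mse`, `toy_pitch_predictor`, `serial_loss`), the use of mean squared error for both terms, the weighted-sum form, and the `pitch_weight` value. The actual system uses SBF with MTSAL for extraction, and its pitch predictor and loss weighting are not specified in the abstract.

```python
from typing import Callable, List, Sequence

Spectrogram = Sequence[Sequence[float]]  # frames x frequency bins


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def toy_pitch_predictor(spec: Spectrogram) -> List[float]:
    """Hypothetical stand-in for a pitch network: strongest bin per frame."""
    return [float(max(range(len(frame)), key=frame.__getitem__)) for frame in spec]


def serial_loss(
    est_spec: Spectrogram,
    ref_spec: Spectrogram,
    ref_pitch: Sequence[float],
    pitch_predictor: Callable[[Spectrogram], List[float]] = toy_pitch_predictor,
    pitch_weight: float = 0.1,  # assumed weighting, not from the paper
) -> float:
    """Extraction loss plus a pitch loss computed on the extracted spectrogram.

    The pitch predictor consumes the extractor's output (serial connection),
    so pitch information only influences training through the loss.
    """
    flat_est = [v for frame in est_spec for v in frame]
    flat_ref = [v for frame in ref_spec for v in frame]
    extraction_loss = mse(flat_est, flat_ref)
    pitch_loss = mse(pitch_predictor(est_spec), ref_pitch)
    return extraction_loss + pitch_weight * pitch_loss


# Usage: the first frame is extracted incorrectly, so both terms are non-zero.
reference = [[0.0, 1.0], [1.0, 0.0]]
estimate = [[1.0, 0.0], [1.0, 0.0]]
loss = serial_loss(estimate, reference, ref_pitch=[1.0, 0.0])
```

In a multi-task variant, by contrast, the pitch branch would typically share layers with the extractor instead of consuming its output. The sketch only shows the serial connection.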
Keywords
extraction performance, serial system, multitask learning, target speaker, extraction network, extraction part, speaker embedding concatenation, pitch information, pitch-aware speaker extraction serial network, monaural speaker extraction, pitch tracking, magnitude and temporal spectrum approximation loss, SpeakerBeam-FE