Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network

2022 30th European Signal Processing Conference (EUSIPCO) (2022)

Abstract
Quantifying the relationship between speakers' physiological structure and acoustic speech signals, taking the properties of resonance and antiresonance into account, can help us extract effective speaker discriminative information (SDI) from speech signals. The conventional quantification method, based on the F-ratio, considers the power of acoustic speech in each frequency band independently. We propose a novel frequency-wise attentional neural network to learn the nonlinear combined effect of the frequency components on speaker identity. The learned results indicate that the antiresonance frequency induced by the nasal cavity is another essential factor for speaker discrimination, one that the F-ratio method could not reveal. To further evaluate our findings, we designed a non-uniform subband processing strategy based on the learned results for speaker feature extraction and carried out automatic speaker verification (ASV) experiments. The ASV results confirmed that further emphasizing the spectral structure around the antiresonance frequency region can enhance speaker discrimination.
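To make the band-independent nature of the F-ratio baseline concrete, the sketch below computes one common form of the F-ratio per frequency band: the variance of the speaker means divided by the average within-speaker variance. The abstract does not give the exact formulation used in the paper, so this definition, the function name `f_ratio`, and the input layout (a dict mapping each speaker to a list of frames of per-band energies) are assumptions made for illustration only.

```python
from statistics import fmean


def f_ratio(band_energies):
    """Per-band F-ratio (assumed textbook form, not taken from the paper).

    band_energies: dict mapping speaker id -> list of frames,
    where each frame is a list of per-band (log) energies.
    Returns one F-ratio value per frequency band.
    """
    speakers = list(band_energies)
    n_bands = len(band_energies[speakers[0]][0])
    ratios = []
    for k in range(n_bands):
        # Mean energy of band k for each speaker.
        spk_means = [fmean(frame[k] for frame in band_energies[s])
                     for s in speakers]
        global_mean = fmean(spk_means)
        # Between-speaker variance: spread of the speaker means.
        between = fmean((m - global_mean) ** 2 for m in spk_means)
        # Within-speaker variance: averaged over speakers.
        within = fmean(
            fmean((frame[k] - m) ** 2 for frame in band_energies[s])
            for s, m in zip(speakers, spk_means)
        )
        # Assumes within > 0; a constant band would need special handling.
        ratios.append(between / within)
    return ratios


# Toy data: band 0 is identical across speakers, band 1 separates them.
toy = {
    "spk_a": [[1.0, 5.0], [3.0, 5.5]],
    "spk_b": [[1.0, 9.0], [3.0, 9.5]],
}
scores = f_ratio(toy)
```

Each band is scored without reference to any other band, which is the limitation the abstract points to: a measure of this kind cannot capture how frequency components combine nonlinearly to convey speaker identity.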
Keywords
physiological feature, non-uniform filterbank, frequency-wise attention, data-driven feature