Automatic Prosody Evaluation of L2 English Read Speech in Reference to Accent Dictionary with Transformer Encoder

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
Automatic prosody evaluation models for second language (L2) read speech fall into two categories: reference-based and reference-free. Reference-based models refer to native speakers' recordings of the uttered text, while reference-free models do not. Conventional reference-free models do not even take the uttered text into account. We propose an automatic prosody evaluation model that takes the uttered text into account by estimating native speakers' prosodic patterns with a Transformer encoder. The Transformer encoder used in FastSpeech 2 estimates a sequence of native speakers' prosodic features at the phoneme-segment level, and a subsequent neural network module evaluates an L2 learner's utterance by comparing its sequence of prosodic features with the estimated sequence for native speakers' utterances. We evaluated the model by Spearman's correlation between objective and subjective scores on L2 English sentences read aloud by Japanese university students. The experimental results indicated that our model achieved a higher subjective-objective score correlation than a reference-free model, and even exceeded the inter-rater score correlation.
Keywords
automatic prosody evaluation, second language (L2) speech, reference-free, Transformer encoder
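
The abstract reports Spearman's correlation between objective (model) and subjective (human rater) scores. As an illustration only, the following minimal sketch computes that statistic with the standard library. The function names and the score values are hypothetical and are not taken from the paper; the paper's actual evaluation data and tooling are not described in the abstract.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical scores for five utterances (illustrative values only).
model_scores = [0.2, 0.5, 0.9, 0.4, 0.7]  # objective scores from a model
rater_scores = [1, 3, 4, 2, 5]            # subjective scores from a rater

print(round(spearman(model_scores, rater_scores), 3))  # prints 0.9
```

Because the statistic depends only on ranks, it compares how the model and the raters order the utterances, so the two score scales do not need to match.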