Multi-condition Training and System Combination for Automatic MOS Prediction

2022 IEEE 11th Global Conference on Consumer Electronics (GCCE)(2022)

Abstract
The quality of text-to-speech (TTS) synthesis and voice conversion (VC) has improved considerably. The mean opinion score (MOS) is widely used to evaluate that quality, but obtaining it requires listening tests with human raters, so automatic MOS prediction can reduce development costs. The VoiceMOS Challenge is a shared task for automatic MOS prediction in which participants predict the MOS of synthesized speech taken from past Blizzard Challenges and Voice Conversion Challenges. The challenge has a main track and an out-of-domain (OOD) track, and we found that automatic MOS prediction systems perform poorly on the OOD track. To address this, we introduce multi-condition training. Experiments show that multi-condition training significantly improved MOS prediction performance. Visualizing the embedding vectors of the MOS prediction models shows that multi-condition training increased the margin between samples from the two tracks.
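The abstract does not say how the training data are assembled. As a minimal sketch, assuming that multi-condition training here means training a single MOS predictor on labeled utterances pooled from both the main and OOD tracks, the data side could look like the Python below. The `Utterance` record, the `ood_repeat` oversampling knob, and the file names are hypothetical illustrations, not the paper's actual setup; the model itself is left out.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Utterance:
    wav_path: str   # hypothetical path to the synthesized audio
    mos: float      # listener-averaged MOS label on a 1-5 scale
    condition: str  # condition (track) the sample comes from: "main" or "ood"


def build_multicondition_set(main: List[Utterance],
                             ood: List[Utterance],
                             ood_repeat: int = 1) -> List[Utterance]:
    """Pool labeled utterances from both tracks into one training set.

    `ood_repeat` is a hypothetical knob that repeats the (smaller) OOD subset
    so it is not swamped by main-track data; 1 means plain pooling.
    """
    if ood_repeat < 1:
        raise ValueError("ood_repeat must be at least 1")
    return list(main) + list(ood) * ood_repeat


def shuffled_batches(data: List[Utterance], batch_size: int,
                     seed: int = 0) -> Iterator[List[Utterance]]:
    """Yield shuffled mini-batches so a batch can mix both conditions."""
    order = list(data)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    main_set = [Utterance(f"main_{i}.wav", 3.0, "main") for i in range(8)]
    ood_set = [Utterance(f"ood_{i}.wav", 2.5, "ood") for i in range(2)]
    train_set = build_multicondition_set(main_set, ood_set, ood_repeat=3)
    print(len(train_set), sum(u.condition == "ood" for u in train_set))  # 14 6
    print([len(b) for b in shuffled_batches(train_set, batch_size=4)])  # [4, 4, 4, 2]
```

Keeping the `condition` field on each sample leaves room for a model that also consumes the track as an input, but whether the paper does so is not stated in the abstract.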
Keywords
MOS prediction, SSL-MOS, LDNet, multi-condition training, system combination